UM EECS 489 Winter 2007

Programming Assignment 1: Simplified Web Server

DUE DATE (extended): Friday 2/9/2007 11:59PM EST

The testing script and grading guideline are available as HttpTest.py and Grading Guideline.
Please use the turnin program to turn in this assignment. Use "pa1" for the assignment name. Only one member of your group needs to turn in your code. Here is a link to some files that we have provided you to get started. (It is meant to be incomplete! In fact, it will not compile.) You do not have to use the sample code for the server, but you should use the test script provided as well as the sample HTML files to serve. 

Resources

Here is a good tutorial on pthreads. Be sure to use man pages for details.

Preamble

Please read the instructions very carefully. For this assignment, you are to write a simple Web server that only serves files. You will be writing two different versions of the server using two different I/O models: (1) select()-based I/O multiplexing, (2) thread-based blocking I/O. This project is to be done in groups of two students. If you have to work in groups of three or alone, please contact us.

Functionality

The simplified Web server should be able to handle HTTP GET requests. The request format is the following, which can be seen in the test script: "GET <path> HTTP/1.X\r\n\r\n", where X is either 0 or 1, <path> is the path of the file on the web server (/index.htm for example), \r is the escape sequence for carriage return, and \n is the escape sequence for newline. (Here, assume that there is only a single space between the words.)

If the request is successful, the server should return "HTTP/1.X 200 OK\r\n\r\n" directly followed by the content of the file. Note that X here should match the X in the original client request. The server should not close the connection after serving the file. Instead, the server should support persistent HTTP connections, i.e., it should allow the client to send another HTTP request and serve future requests. If the file is not found, the server should return "HTTP/1.X 404 Not Found\r\n\r\n" and then close the connection.

For each HTTP request, the server should also write out a log entry with the current time, the client's IP address/port, the name of the file, and whether or not the request was successful. For more on logging, see the error handling section.
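The provided parse_request() in SelectServer.h already handles request parsing, so the following is only an illustrative sketch of the request and response formats described above. The helper names parse_get() and build_status() are hypothetical and not part of the provided code. The sketch assumes the whole request is already in a NUL-terminated buffer; over TCP a request may arrive across several reads, so a real server would keep buffering until it sees "\r\n\r\n".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "GET <path> HTTP/1.X\r\n\r\n", copies <path>
 * into path, and stores X (0 or 1) in *minor.
 * Returns 0 on success, -1 if the request is malformed. */
int parse_get(const char *req, char *path, size_t path_len, int *minor)
{
    const char *p, *sp;
    size_t len;

    if (strncmp(req, "GET ", 4) != 0)
        return -1;
    p = req + 4;
    sp = strchr(p, ' ');                 /* single space after the path */
    if (sp == NULL || sp == p)
        return -1;
    len = (size_t)(sp - p);
    if (len >= path_len)                 /* path too long for the buffer */
        return -1;
    if (strncmp(sp, " HTTP/1.", 8) != 0)
        return -1;
    if ((sp[8] != '0' && sp[8] != '1') || strncmp(sp + 9, "\r\n\r\n", 4) != 0)
        return -1;
    memcpy(path, p, len);
    path[len] = '\0';
    *minor = sp[8] - '0';
    return 0;
}

/* Hypothetical helper: builds the status line, echoing the client's
 * minor version. Returns the length written (as snprintf does). */
int build_status(char *buf, size_t buf_len, int minor, int found)
{
    return snprintf(buf, buf_len, "HTTP/1.%d %s\r\n\r\n", minor,
                    found ? "200 OK" : "404 Not Found");
}
```

After a 200 response the file contents follow immediately and the connection stays open; after a 404 the server closes it.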

To prevent clients from accessing files above the working directory where the server is run, you should call chroot(). Please use the man pages to understand how it works.
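A minimal sketch of confining the server to its working directory, assuming a Linux/glibc system. Two details are easy to miss: chroot() usually requires superuser privileges (it fails with EPERM otherwise), and it does not change the current working directory, so the sketch follows it with chdir("/"). Because paths are resolved against the new root afterwards, it is also sensible to open the log file before calling chroot(). The function name confine_to_cwd() is hypothetical.

```c
#define _DEFAULT_SOURCE   /* glibc: make unistd.h declare chroot() */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: makes the current directory the process's root.
 * Errors are reported to the given log stream. Returns 0 or -1. */
int confine_to_cwd(FILE *log)
{
    if (chroot(".") != 0) {
        fprintf(log, "chroot failed: errno %d (%s)\n", errno, strerror(errno));
        return -1;
    }
    /* chroot() leaves the working directory unchanged; move into the new root. */
    if (chdir("/") != 0) {
        fprintf(log, "chdir failed: errno %d (%s)\n", errno, strerror(errno));
        return -1;
    }
    return 0;
}
```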

IO models

You are to write two versions of the Web server using different I/O models. Please do not simply divide the assignment by having each partner work on one version.

In the first model, you will use select() to serve all clients in a single thread. This is the I/O multiplexing model. Because select() blocks on multiple file descriptors at once, there is no need to create additional threads. In the second model, you will use pthreads instead, serving each client with its own thread.
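One iteration of a select()-based loop might look like the sketch below. It assumes client sockets are kept in a fixed array where -1 marks a free slot; wait_for_events() and drop_client() are hypothetical names. For brevity it only watches for readability; a server sending large files would also pass a write set for clients with pending data. Note that an fd_set can only hold descriptors below FD_SETSIZE (1024 on Linux with glibc), which is close to the 1000-client limit once the listening socket, the log file, and open files are counted.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_CLIENTS 1000   /* upper limit allowed by the assignment */

/* Waits until the listening socket or any client socket is readable, or
 * until timeout_sec elapses. On return, *ready holds the descriptors that
 * select() marked readable. Returns select()'s result: the number of ready
 * descriptors, 0 on timeout, or -1 on error (errno set). */
int wait_for_events(int listen_fd, const int clients[], int nclients,
                    long timeout_sec, fd_set *ready)
{
    struct timeval tv;
    int i, maxfd = listen_fd;

    FD_ZERO(ready);
    FD_SET(listen_fd, ready);
    for (i = 0; i < nclients; i++) {
        if (clients[i] < 0)
            continue;                 /* free slot */
        FD_SET(clients[i], ready);
        if (clients[i] > maxfd)
            maxfd = clients[i];
    }
    /* select() may modify both the set and tv, so rebuild them every call. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    return select(maxfd + 1, ready, NULL, NULL, &tv);
}

/* Closes a client socket and frees its slot for reuse. */
void drop_client(int clients[], int i)
{
    close(clients[i]);
    clients[i] = -1;
}
```

After it returns, the caller checks FD_ISSET(listen_fd, ready) to accept() a new client and FD_ISSET(clients[i], ready) to read from each existing one. The finite timeout also gives the loop a chance to sweep for idle clients.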

Clearly there are resource limitations for your web server. One is the maximum number of simultaneous clients, which is limited by the maximum number of open socket descriptors in both models, and additionally by the maximum number of threads in the second model. For this assignment, your server will not need to handle more than 1000 simultaneous clients, so you may hard-code this as an upper limit. (HINT: File descriptors for open files also take up resources. To serve many clients, you may need to restrict the number of simultaneous open file descriptors. This can be done by reading part of a file, closing it, then opening and reading the next chunk when ready. Semaphores may be used to keep track of the number of open file descriptors in the thread model.)
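The hint above can be sketched for the thread model as follows: each thread reads its file one chunk at a time, opening and closing the file around every read, and a counting semaphore caps how many files are open at once. MAX_OPEN_FILES and read_chunk() are hypothetical; the limit of 64 is an arbitrary example value. Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_OPEN_FILES 64   /* hypothetical limit; tune for your system */

static sem_t file_slots;    /* counts free file-descriptor slots */

/* Call once at startup, before creating any client threads. */
int init_file_slots(void)
{
    return sem_init(&file_slots, 0, MAX_OPEN_FILES);
}

/* Reads up to len bytes of path starting at *offset, keeping the file open
 * only for this call, and advances *offset. Returns the number of bytes
 * read (0 at end of file) or -1 with errno set. */
ssize_t read_chunk(const char *path, off_t *offset, char *buf, size_t len)
{
    int fd, saved;
    ssize_t n;

    while (sem_wait(&file_slots) != 0)   /* block until a slot is free */
        if (errno != EINTR)
            return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        saved = errno;                   /* sem_post() must not clobber it */
        sem_post(&file_slots);
        errno = saved;
        return -1;
    }
    n = pread(fd, buf, len, *offset);    /* read at offset without lseek() */
    saved = errno;
    close(fd);
    sem_post(&file_slots);
    errno = saved;
    if (n > 0)
        *offset += n;
    return n;
}
```

A client thread would call read_chunk() repeatedly, sending each chunk before reading the next, until it returns 0. The select() version can apply the same reopen-at-offset idea with a plain counter instead of a semaphore, since only one thread touches it.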

Error handling

An important part of this assignment is error handling. You must be able to handle all errors associated with socket and file operations. Any function that has the potential to return an error value should be checked, and an error message should be written to the log file when an error value is returned. In addition, you should always output an error number for socket and file operations if one is available. (The error number can be obtained from the errno variable, which is declared by errno.h. Include that header rather than declaring errno yourself as an extern int: errno may be defined as a macro, and in multithreaded programs each thread has its own errno.) For socket operations, you should also print out the client's IP address and port if they are available, and for file operations you should print out the file name if it is available. If the error is server-side, such as a failed memory allocation, then you should also call exit() to terminate the server. If the error is client-side, for example if the client unexpectedly closes the connection without receiving the entire file, the server should log the error, then close the client connection and free up any state associated with the session.
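One client-side error deserves special care: by default, writing to a socket whose peer has closed raises SIGPIPE, which terminates the entire server rather than just the one session. A minimal sketch, assuming blocking sockets as in the thread model, ignores SIGPIPE so that send() fails with EPIPE instead, and retries partial writes. The names ignore_sigpipe() and send_all() are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Call once at startup: with SIGPIPE ignored, writing to a closed peer
 * makes send() fail with EPIPE instead of killing the process. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Sends all len bytes, retrying after partial writes and interrupted calls.
 * Returns 0 on success or -1 with errno set; on -1 the caller should log
 * the error with the client's IP and port, close the socket, and free the
 * session's state. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal; try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In the select() version, sockets are typically non-blocking, so a send() failing with EAGAIN or EWOULDBLOCK means "try again when writable" rather than a client error.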

In addition to the above errors, the server should disconnect clients that are idle (do not send or receive any data) for two minutes or longer and free up state associated with the connection.
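A minimal sketch of idle tracking, assuming each connection records the time of its last successful send or receive. The struct conn layout and the helper names are hypothetical; a real connection record would hold more state (request buffer, file name, offset, and so on). In the select() model, the loop can use a finite select() timeout and sweep all connections with conn_is_idle() on each pass.

```c
#include <time.h>

#define IDLE_LIMIT_SEC 120   /* two minutes, per the assignment */

/* Hypothetical per-connection record (only the fields needed here). */
struct conn {
    int fd;                  /* client socket, -1 if the slot is free */
    time_t last_activity;    /* time of the last successful send or receive */
};

/* Call after every successful send() or recv() on the connection. */
void conn_touch(struct conn *c, time_t now)
{
    c->last_activity = now;
}

/* Returns 1 if the connection has been idle for two minutes or longer. */
int conn_is_idle(const struct conn *c, time_t now)
{
    return difftime(now, c->last_activity) >= IDLE_LIMIT_SEC;
}
```

When conn_is_idle() returns 1, the server logs the "too slow" message, closes the socket, and frees the connection's state.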

The Web server should output information to the log in the following format (use '##' to precede each log message and put a newline at the end of each log message so we can tell where they begin and end):

 ##[unix time stamp] [error number] Accepting a new connection from client [IP address]:[port].
 ##[unix time stamp] [error number] client [IP address]:[port] prematurely disconnected.
 ##[unix time stamp] [error number] Unable to open file [filename] requested by client [IP address]:[port].
 ##[unix time stamp] [error number] client [IP address]:[port] is too slow, closing connection.
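A sketch of a logging helper that produces entries in this format, assuming the square brackets are placeholders (so an entry reads like "##1170000000 104 client 10.0.0.1:5000 prematurely disconnected."). The name log_msg() is hypothetical. flockfile() keeps one entry's pieces together when several threads share the log stream, and fflush() ensures the entry is written even if the server calls exit() shortly afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Writes "##<unix time> <err> <message>\n" to log. err is the errno value
 * to report (0 if none); fmt and the remaining arguments follow printf. */
void log_msg(FILE *log, int err, const char *fmt, ...)
{
    va_list ap;

    flockfile(log);                       /* keep the entry in one piece */
    fprintf(log, "##%ld %d ", (long)time(NULL), err);
    va_start(ap, fmt);
    vfprintf(log, fmt, ap);
    va_end(ap);
    fputc('\n', log);
    fflush(log);
    funlockfile(log);
}
```

For example, log_msg(log, errno, "Unable to open file %s requested by client %s:%d.", name, ip, port) records a failed open.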

Provided code

In this assignment, you are free to write your code from scratch. You are required to program in C or C++, and your code should compile using the standard gcc or g++ compiler. The code provided in the zip file is for your convenience. You are not required to use it, but may find it useful, especially the HTTP header parsing function. You do not have to do any parsing beyond what is done in the parse_request() function defined in SelectServer.h.

HttpTest.py: this is the test client for testing your Web server. You may need to modify the server IP address and port information at the beginning of the file to use it. It is written in Python, but its comments document its functionality in detail. Please look over the six test cases and try to understand what they are doing. If you are interested in learning more about Python, a tutorial is available at http://docs.python.org/tut/tut.html. To run the test file, use the following syntax:

python HttpTest.py <Test Number>

Test Number should be the number of the test case that you wish to run.

File1KB.htm: This is a 1KB html file that is used in some of the HttpTest.py test cases.

File500KB.htm: This is a 500KB html file that is used in some of the HttpTest.py test cases.

The checksum and size of these files are stored in HttpTest.py, and the script displays an error if it does not receive the whole file or if the checksum does not match.

SelectServer.h: contains useful structure definitions and a function for parsing client HTTP requests.

DISCLAIMER: Do not use the client test code on a live web server that you are not running yourself. It is meant to stress-test a web server and could severely degrade performance of a live web server potentially causing denial of service.

Server command line interface and testing

Your server should display the usage if it is run without any command line arguments. You should allow the user to specify the local port to listen on as well as the name of the log file on the command line. The suggested command line interface is the following:
webServer -p [port number] -l [log file name]

The port number denotes the port on which the web server listens for connections. Normally this would be port 80 for a web server, but you will need to use a port of 1024 or higher if you do not have root privileges. All required output should be written to the log file. If the user specifies "stdout" or "stderr" for the log file, then you should print output to stdout or stderr, respectively.
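The suggested command line can be parsed with the POSIX getopt() function. In this sketch, parse_args() and usage() are hypothetical names, and opening a named log file in append mode is an assumption rather than a requirement. Running the server with no arguments makes parse_args() fail, so the caller prints the usage message.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Parses "webServer -p [port number] -l [log file name]". On success stores
 * the port and an open log stream and returns 0; otherwise returns -1.
 * The names "stdout" and "stderr" select the standard streams. */
int parse_args(int argc, char *argv[], int *port, FILE **log)
{
    const char *logname = NULL;
    char *end;
    long p = -1;
    int opt;

    while ((opt = getopt(argc, argv, "p:l:")) != -1) {
        switch (opt) {
        case 'p':
            p = strtol(optarg, &end, 10);
            if (*end != '\0' || p < 1 || p > 65535)
                return -1;               /* not a valid port number */
            break;
        case 'l':
            logname = optarg;
            break;
        default:                         /* unknown option or missing value */
            return -1;
        }
    }
    if (p < 0 || logname == NULL)
        return -1;                       /* both options are required */
    if (strcmp(logname, "stdout") == 0)
        *log = stdout;
    else if (strcmp(logname, "stderr") == 0)
        *log = stderr;
    else if ((*log = fopen(logname, "a")) == NULL)
        return -1;
    *port = (int)p;
    return 0;
}

void usage(const char *prog)
{
    fprintf(stderr, "usage: %s -p [port number] -l [log file name]\n", prog);
}
```

In main(), a typical pattern is: if parse_args() returns -1, call usage(argv[0]) and exit with a nonzero status.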

You are highly encouraged to use the client code provided to test your Web server. Also, please use your own web browser to test your server to see if you can successfully download and display a file.

Design document

Please first compose a design document (one to two pages in text or PDF format), which will also serve as a README file for your project. In the document, include the uniqnames of your project members and describe the architecture of your web server for both I/O models. This should include what states each connection can be in, what information is stored for each state, and how different error conditions are handled.