
4 Implementation

In this section, we first present the capabilities of the current version of httperf, then discuss some of the more subtle implementation issues discovered so far, and finally mention some possible future directions for httperf.

The HTTP core engine in httperf currently supports both HTTP/1.0 and HTTP/1.1. Among the more interesting features of this engine are support for persistent connections, request pipelining, and the "chunked" transfer-encoding [2, 4]. Higher-level HTTP processing is enabled by the fact that the engine exposes each reply header-line and all of the reply body to the other parts of httperf by signalling appropriate events. For example, when one of the workload generators required simple cookie support, the necessary changes were implemented and tested in a matter of hours.
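
To illustrate the event-oriented design, the sketch below shows a hypothetical event-callback interface in C together with a minimal Set-Cookie handler. The event names, signatures, and handler are illustrative assumptions only and do not reflect httperf's actual internals.

    /* Hypothetical sketch of an event-callback interface in the spirit of the
     * design described above; names and signatures are illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>   /* strncasecmp() */

    typedef enum { EV_REPLY_HDR_LINE, EV_REPLY_BODY, EV_NUM_EVENTS } event_t;
    typedef void (*event_handler_t)(const char *data, size_t len);

    static event_handler_t handlers[EV_NUM_EVENTS];

    static void event_register(event_t ev, event_handler_t h)
    {
        handlers[ev] = h;               /* one handler per event keeps the sketch small */
    }

    static void event_signal(event_t ev, const char *data, size_t len)
    {
        if (handlers[ev])
            handlers[ev](data, len);    /* the engine would call this per header line, etc. */
    }

    /* Example consumer: notice Set-Cookie headers in the reply. */
    static void on_header_line(const char *line, size_t len)
    {
        if (len >= 11 && strncasecmp(line, "Set-Cookie:", 11) == 0)
            printf("cookie header seen: %.*s\n", (int) len, line);
    }

    int main(void)
    {
        event_register(EV_REPLY_HDR_LINE, on_header_line);
        const char *hdr = "Set-Cookie: id=42";
        event_signal(EV_REPLY_HDR_LINE, hdr, strlen(hdr));
        return 0;
    }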

The current version of httperf supports two kinds of workload generators: request generators and URL generators.

Request Generation:
Request generators initiate HTTP calls at the appropriate times. At present, there are two such generators. The first generates new connections deterministically at a fixed rate; each connection is then used to perform a command-line-specified number of pipelined HTTP calls. By default, the number of pipelined calls per connection is one, which yields HTTP/1.0-like behavior in the sense that each connection is used for a single call and is closed afterwards.

The second request generator creates sessions deterministically and at a fixed rate. Each session consists of a specified number of call-bursts that are spaced out by the command-line-specified user think-time. Each call-burst consists of a fixed number of calls. Call-bursts mimic the typical browser behavior where a user clicks on a link, which causes the browser to first request the selected HTML page and then the objects embedded in it. The fixed-rate scheduling that both generators rely on is sketched below.
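
The sketch below illustrates the deterministic fixed-rate scheduling used by both request generators: the start time of each new connection or session is simply advanced by a fixed inter-arrival interval. The rate and variable names are illustrative and not taken from httperf's source.

    /* Minimal sketch of deterministic fixed-rate scheduling (illustrative only). */
    #include <stdio.h>

    int main(void)
    {
        const double rate = 150.0;     /* connections or sessions per second */
        const int    num  = 10;        /* how many to schedule in this example */
        double next_time  = 0.0;       /* ideal start time of the next one */

        for (int i = 0; i < num; i++) {
            printf("start #%d at t=%.4f s\n", i, next_time);
            next_time += 1.0 / rate;   /* fixed, deterministic inter-arrival time */
        }
        return 0;
    }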

URL Generation:
URL generators create the desired sequence of URLs that should be accessed on the server. The most primitive generator simply generates the same command-line-specified URL over and over again.

The second generator walks through a fixed set of URLs at a given rate. With this generator, the web pages are assumed to be organized as a 10-ary directory tree (each directory contains up to ten files or sub-directories) on the server, as sketched below. This generator is useful, for example, to induce a specific file buffer cache miss rate on the server under test.
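
As an illustration, the sketch below maps an index to a path in such a 10-ary tree. The /dirN/fileM.html naming scheme is an assumption made for the example and not necessarily the layout httperf expects.

    /* Illustrative sketch: generate the idx-th URL of a 10-ary directory tree
     * of the given depth (each directory holds up to ten entries). */
    #include <stdio.h>

    static void make_url(unsigned idx, int depth, char *buf, size_t len)
    {
        size_t   off = 0;
        unsigned div = 1;

        for (int i = 1; i < depth; i++)
            div *= 10;                              /* 10^(depth-1) */
        for (int level = 0; level < depth - 1; level++) {
            off += snprintf(buf + off, len - off, "/dir%u", (idx / div) % 10);
            div /= 10;
        }
        snprintf(buf + off, len - off, "/file%u.html", idx % 10);
    }

    int main(void)
    {
        char url[64];
        for (unsigned i = 0; i < 12; i++) {
            make_url(i, 3, url, sizeof url);
            printf("%s\n", url);                    /* e.g. /dir0/dir1/file1.html */
        }
        return 0;
    }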

As far as statistics collectors are concerned, httperf always collects and prints the basic information shown in Figure 1. The only other statistics collector at this time is one that collects session-related information. It measures quantities similar to the basic connection statistics, the main difference being that the unit of measurement is the session instead of the connection.

We now proceed to discuss some of the implementation issues that conspire to make it difficult to write a robust, high-performance test tool.

4.1 Scheduling Granularity

The process scheduling granularity of today's OSes is in the millisecond range. Some support one millisecond, but most use a timer tick of around 10 milliseconds. This often severely limits the accuracy with which a given workload can be generated. For example, with a timer tick of 10 milliseconds, deterministically generating a rate of 150 requests per second would have to be implemented by sending one request during even-numbered timer ticks and two requests during odd-numbered ticks. While the average rate is achieved, the bursts sent during the odd-numbered ticks could cause server-queue overflows that in turn could severely affect the observed behavior. This is not to say that measuring web servers with bursty traffic is a bad idea (quite the opposite is true); the problem here is that the burstiness is introduced by the OS, not because the tester requested it.
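
The sketch below makes this effect concrete: with a 100 Hz tick (10 millisecond granularity) and a target of 150 requests per second, a tick-driven generator accumulates 1.5 requests' worth of "credit" per tick and is forced to alternate between sending one and two requests. The accumulator scheme is shown only to illustrate the burst pattern; it is not httperf's approach.

    /* Illustration of tick-induced burstiness at 150 requests/s on a 10 ms tick. */
    #include <stdio.h>

    int main(void)
    {
        const double ticks_per_sec = 100.0;   /* 10 ms scheduling granularity */
        const double target_rate   = 150.0;   /* requests per second */
        double credit = 0.0;

        for (int tick = 0; tick < 10; tick++) {
            credit += target_rate / ticks_per_sec;  /* earn 1.5 requests per tick */
            int to_send = (int) credit;             /* can only send whole requests */
            credit -= to_send;
            printf("tick %d: send %d request(s)\n", tick, to_send);
        }
        return 0;
    }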

To avoid depending on OS scheduling granularity, httperf executes in a tight loop that checks for network I/O activity via select() and keeps track of real time via gettimeofday(). This means that httperf consumes all available CPU cycles (on a multiprocessor client, only one CPU will be kept busy in this way). This approach works fine because the only other important activity is the asynchronous receiving and processing of network packets. Since this activity executes as a (soft-) interrupt handler, no scheduling problem arises. However, executing in a tight loop does imply that only one httperf process can run per client machine (per client CPU, to be more precise). It also means that care should be taken to avoid unnecessary background tasks on the client machine while a test is in progress.
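
A minimal sketch of this style of busy-wait event loop is shown below, polling for I/O with a zero-timeout select() and tracking real time with gettimeofday(). Error handling and the actual I/O and timer work are omitted; this is not httperf's actual main loop.

    /* Sketch of a busy-wait event loop: spin on select() with a zero timeout
     * and track real time with gettimeofday(). */
    #include <sys/select.h>
    #include <sys/time.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    static void event_loop(int maxfd, const fd_set *watched, double stop_at)
    {
        while (now() < stop_at) {
            fd_set rdfds = *watched;
            struct timeval zero = { 0, 0 };          /* do not block: keep spinning */
            int n = select(maxfd + 1, &rdfds, NULL, NULL, &zero);
            if (n > 0) {
                /* ... process readable sockets, parse replies, etc. ... */
            }
            /* ... issue any calls whose scheduled start time has passed ... */
        }
    }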

4.2 Limited Number of Ephemeral Ports

Many TCP implementations restrict the TCP ports available to sockets that are not bound to a specific local address to the so-called ephemeral ports [7]. Ephemeral ports are typically in the range from 1,024 to 5,000, i.e., there are fewer than 4,000 of them. This has the unfortunate effect that even moderate request rates may cause a test client to quickly run out of port numbers. For example, assuming a TIME_WAIT state duration of one minute, the maximum sustainable rate would be about 66 requests per second (roughly 3,977 ports divided by 60 seconds).

To work around this problem, httperf can optionally maintain its own bitmap of ports that it believes to be available. This solution is not ideal because the bitmap is not guaranteed to be accurate; that is, a port may be unavailable even though httperf believes otherwise, which causes additional system calls that could ordinarily be avoided. It is also suboptimal because httperf ends up duplicating information that the OS kernel has to maintain anyway. While not optimal, the solution works well in practice.

A subtle issue in managing the bitmap is the order in which ports are allocated. In a first implementation, httperf reused the most recently freed port number as soon as possible (in order to minimize the number of ports consumed by httperf). This worked well as long as both the client and server machines were UNIX-based. Unfortunately, a TCP incompatibility between UNIX and NT breaks this solution. Briefly, the problem is that UNIX TCP implementations allow pre-empting the TIME_WAIT state if a new SYN segment arrives, whereas NT disallows such pre-emption. As a result, a UNIX client may consider it legitimate to reuse a given port at a time when NT still considers the old connection to be in the TIME_WAIT state. Thus, when the UNIX client attempts to create a new connection with the reused port number, NT responds with a TCP RESET segment that causes the connection attempt to fail. In the case of httperf, this had the effect of dramatically reducing the apparent throughput the NT server could sustain (half the packets failed with a "connection reset by peer" error). This problem is avoided in the current version of httperf by allocating ports in strict round-robin fashion, as sketched below.
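
The sketch below shows one way such a bitmap with strict round-robin allocation could look. The port range and data structures are illustrative assumptions rather than httperf's actual code.

    /* Sketch of a client-side port bitmap with strict round-robin allocation,
     * so that a just-released port is reused as late as possible. */
    #include <stdint.h>

    #define PORT_MIN  1024
    #define PORT_MAX  5000
    #define NUM_PORTS (PORT_MAX - PORT_MIN + 1)

    static uint8_t port_busy[NUM_PORTS];   /* 1 = believed to be in use */
    static int     next_port;              /* round-robin scan position */

    static int port_alloc(void)            /* returns a port, or -1 if none free */
    {
        for (int i = 0; i < NUM_PORTS; i++) {
            int idx = (next_port + i) % NUM_PORTS;
            if (!port_busy[idx]) {
                port_busy[idx] = 1;
                next_port = (idx + 1) % NUM_PORTS;
                return PORT_MIN + idx;
            }
        }
        return -1;                          /* all ports believed busy */
    }

    static void port_free(int port)
    {
        port_busy[port - PORT_MIN] = 0;
    }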

4.3 Slow System Calls

A final issue with implementing httperf is that even on modern systems, some OS operations are relatively slow when dealing with several thousand TCP control blocks. The use of hash tables to look up TCP control blocks for incoming network traffic is standard nowadays. However, it turns out that at least some BSD-derived systems still perform linear control-block searches for the bind() and connect() system calls. This is unfortunate because, in the case of httperf, these linear searches can easily consume 80% or more of its total execution time. This, once again, can severely limit the maximum load that a client can generate.

Fortunately, this is an issue only when running a test that causes httperf to close the TCP connection; as long as the server closes the connection, no problem occurs. Nevertheless, it would be better to avoid the problem altogether. Short of fixing the OS, the only workaround we have found so far is to change httperf so it closes connections by sending a RESET instead of going through the normal connection shutdown handshake. This workaround may be acceptable for certain cases, but should not be used in general. The reason is that closing a connection via a RESET may cause data corruption in future TCP connections or, more likely, can lead to needlessly tying up server resources. Also, a RESET artificially lowers the cost of closing a connection, which could lead to overestimating a server's capacity. With these reservations in mind, we observe in passing that at least one popular web browser (IE 4.01) appears to close connections in this manner.
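
On BSD-style sockets, one common way to implement such a RESET-based close is to enable SO_LINGER with a zero timeout before calling close(), which on most TCP stacks aborts the connection with a RST instead of performing the FIN handshake. The sketch below shows this technique; it is not necessarily the exact mechanism httperf uses.

    /* Close a TCP socket by sending a RESET: SO_LINGER with a zero timeout
     * makes close() abort the connection instead of doing the FIN handshake. */
    #include <sys/socket.h>
    #include <unistd.h>

    static int close_with_reset(int sock)
    {
        struct linger lg;

        lg.l_onoff  = 1;    /* enable lingering ...           */
        lg.l_linger = 0;    /* ... with a zero timeout => RST */
        if (setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
            return -1;
        return close(sock);
    }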

4.4 Future Directions

In its current form, httperf is already useful for performing several web server measurement tasks, but its development has by no means come to a halt. Indeed, there are several features that are likely to be added. For example, we believe it would be useful to add a workload generator that attempts to mimic the real-world traffic patterns observed by web servers. To a first approximation, this could be done by implementing a SPECweb-like workload generator. Another obvious and useful extension would be to modify httperf to allow log-file-based URL generation. Both of these extensions can be realized easily thanks to the event-oriented structure of httperf.

Another fruitful direction would be to modify httperf to make it easier to run tests with multiple clients. At present, it is the tester's responsibility to start httperf on each client machine and to collect and summarize the per-client results. A daemon-based approach where a single command line would control multiple clients could be a first step in this direction.

