The two main design goals of httperf were (a) predictable and good performance and (b) ease of extensibility. Good performance is achieved by implementing the tool in C and paying attention to the performance-critical execution paths. Predictability is improved by relying as little as possible on the underlying OS. For example, httperf is designed to run as a single-threaded process using non-blocking I/O to communicate with the server, with one process per client machine. With this approach, CPU scheduling is trivial for the OS, which minimizes the risk of excessive context switching and poor scheduling decisions. Another example is timeout management: rather than depending on OS mechanisms, httperf implements its own specialized and lightweight timer-management facility that avoids expensive system calls and POSIX signal delivery wherever possible.
Based on experiences with an earlier test tool, it was clear that httperf would undergo fairly extensive changes during its lifetime. To accommodate this need for change, httperf is logically divided into three different parts: the core HTTP engine, workload generation, and statistics collection. The HTTP engine handles all communication with the server and as such takes care of connection management, HTTP request generation, and reply handling. Workload generation is responsible for initiating appropriate HTTP calls at the appropriate times so that a particular workload is induced on the server. The third part, statistics collection, is responsible for measuring various quantities and producing relevant performance statistics. Interactions between these three parts occur through a simple yet general event-signalling mechanism. The idea is that whenever something interesting occurs inside httperf, an event is signalled. Parties interested in observing a particular event can register a handler for that event; these handlers are invoked whenever the event is signalled. For example, the basic statistics collector measures the time it takes to establish a TCP connection by registering handlers for the two events that signal the initiation and establishment of a connection, respectively. Similarly, a workload generator responsible for generating a particular URL access pattern can register a handler for the event indicating the creation of a new call. Whenever this handler is invoked, the URL generator can insert the appropriate URL into the call without having to concern itself with the other aspects of call creation and handling.
As alluded to earlier, an important design issue is how to sustain an offered load that exceeds the capacity of the web server. The problem is that once the offered rate exceeds the server's capacity, the client starts accumulating resources at a rate proportional to the difference between the offered and sustained rates. Since each client has only a finite amount of resources available, sooner or later the client runs out of resources and becomes unable to generate any new requests. For example, suppose that each httperf process can have at most 2,000 TCP connections open at any given time. If the difference between the offered and sustained rates is 100 requests per second, a test could last at most 20 seconds. Since web server tests usually require minutes to reach a stable state, such short test durations are unacceptable. To solve this problem, httperf times out calls that have been waiting for a server response for too long; the length of this timeout can be selected through command-line options.
With this timeout approach, the amount of client resources used by httperf is bounded by the timeout value. In the worst case, where the server does not respond at all, httperf never uses more resources than it consumes during a single timeout interval. For example, if connections are initiated at a rate of 100 per second and the timeout is 5 seconds, at most 500 connections are in use at any given time.
It is interesting to consider just what limits the offered load a client can sustain. Apart from the obvious limit that the client's CPU imposes, there is a surprising variety of resources that can become the first-order bottleneck. It is important to keep these limits in mind so as to avoid the pitfall of mistaking client performance limits for server performance limits. The three most important client bottlenecks we have identified so far are described below.
The above list of potential client performance bottlenecks is of course by no means exhaustive. For example, older OSes often exhibit poor performance when faced with several hundred concurrent TCP connections. Since it is often difficult to predict the exact rate at which a client will become the performance bottleneck, it is essential to verify empirically that the observed performance is indeed a reflection of the server's capacity and not of the client's. A safe way to achieve this is to vary the number of test clients and make sure that the observed performance is independent of the number of client machines participating in the test.
Conceptually, measuring throughput is simple: issue a certain number of requests, count the number of replies received, and divide that count by the time it took to complete the test. Unfortunately, this approach has two problems. First, to get a quantitative idea of the robustness of a particular measurement, it is necessary to run the same test several times; since each run is likely to take several minutes, a fair amount of time has to be spent to obtain just a single data point. Equally important, computing only one throughput estimate for the entire test hides variations that may occur at time scales shorter than that of the entire test. For these reasons, httperf samples the reply throughput once every five seconds. The throughput samples can optionally be printed in addition to the usual statistics, which allows observing throughput during all phases of a test. Also, with a sample period of five seconds, running a test for at least three minutes yields enough throughput samples that confidence intervals can be computed without making assumptions about the distribution of the samples [3].