The two main design goals of httperf were (a) predictable and good performance and (b) ease of extensibility. Good performance is achieved by implementing the tool in C and paying attention to the performance-critical execution paths. Predictability is improved by relying as little as possible on the underlying OS. For example, httperf is designed to run as a single-threaded process using non-blocking I/O to communicate with the server, with one process per client machine. With this approach, CPU scheduling is trivial for the OS, which minimizes the risk of excessive context switching and poor scheduling decisions. Another example is timeout management: rather than depending on OS mechanisms, httperf implements its own specialized and lightweight timer-management facility that avoids expensive system calls and POSIX signal delivery wherever possible.
Based on experiences with an earlier test tool, it was clear that httperf would undergo fairly extensive changes during its lifetime. To accommodate this need for change, httperf is logically divided into three different parts: the core HTTP engine, workload generation, and statistics collection. The HTTP engine handles all communication with the server and as such takes care of connection management, HTTP request generation, and reply handling. Workload generation is responsible for initiating appropriate HTTP calls at the appropriate times so that a particular workload is induced on the server. The third part, statistics collection, is responsible for measuring various quantities and producing relevant performance statistics. Interactions between these three parts occur through a simple yet general event-signalling mechanism. The idea is that whenever something interesting occurs inside httperf, an event is signalled. Parties interested in observing a particular event can register a handler for that event; these handlers are invoked whenever the event is signalled. For example, the basic statistics collector measures the time it takes to establish a TCP connection by registering handlers for the two events that signal the initiation and establishment of a connection, respectively. Similarly, a workload generator responsible for generating a particular URL access pattern can register a handler for the event indicating the creation of a new call. Whenever this handler is invoked, the URL generator can insert the appropriate URL into the call without having to concern itself with the other aspects of call creation and handling.
As alluded to earlier, an important design issue is how to sustain an offered load that exceeds the capacity of the web server. The problem is that once the offered rate exceeds the server's capacity, the client starts accumulating resources at a rate proportional to the difference between the offered and sustained rates. Since each client has only a finite amount of resources available, sooner or later the client runs out of resources and becomes unable to generate any new requests. For example, suppose that each httperf process can have at most 2,000 TCP connections open at any given time. If the difference between the offered and sustained rates is 100 requests per second, a test could last at most 20 seconds. Since web server tests usually require minutes to reach a stable state, such short test durations are unacceptable. To solve this problem, httperf times out calls that have been waiting for a server response for too long; the length of this timeout can be selected through command-line options.
With this timeout approach, the amount of client resources used by httperf is bounded by the timeout value. In the worst case, where the server does not respond at all, httperf never uses more resources than it consumes during a single timeout interval. For example, if connections are initiated at a rate of 100 per second and the timeout is 5 seconds, at most 500 connections are in use at any given time.
It is interesting to consider just what limits the offered load a client can sustain. Apart from the obvious limit that the client's CPU imposes, there is a surprising variety of resources that can become the first-order bottleneck. It is important to keep these limits in mind so as to avoid the pitfall of mistaking client performance limits for server performance limits. The three most important client bottlenecks we have identified so far are described below.
The above list of potential client performance bottlenecks is of course by no means exhaustive. For example, older OSes often exhibit poor performance when faced with several hundred concurrent TCP connections. Since it is often difficult to predict the exact rate at which a client will become the performance bottleneck, it is essential to verify empirically that the observed performance is indeed a reflection of the server's capacity and not of the client's. A safe way to achieve this is to vary the number of test clients and make sure that the observed performance is independent of the number of client machines participating in the test.
Conceptually, measuring throughput is simple: issue a certain number of requests, count the number of replies received, and divide that count by the time it took to complete the test. Unfortunately, this approach has two problems. First, to get a quantitative idea of the robustness of a particular measurement, it is necessary to run the same test several times; since each run is likely to take several minutes, a fair amount of time has to be spent to obtain just a single data point. Equally important, computing only one throughput estimate for the entire test hides variations that may occur at time scales shorter than that of the entire test. For these reasons, httperf samples the reply throughput once every five seconds. The throughput samples can optionally be printed in addition to the usual statistics, which allows observing throughput during all phases of a test. Also, with a sample period of five seconds, running a test for at least three minutes yields enough throughput samples that confidence intervals can be computed without making assumptions about the distribution of the samples [3].