Apache Webserver Benchmarks

On April 6, 2002, Apache 2.0.35 was announced as the first general availability release of the much anticipated Apache 2 webserver. It offers many new features, and represents essentially a complete rewrite of the codebase.

Curious about its performance against the most recent Apache from the venerable 1.3 series (Apache 1.3.24), I downloaded, built, and benchmarked them against each other.

Initial Results

All servers are tested "out of the box," meaning I built them freshly from source and made minimal changes to their configurations, with no tuning of parameters for performance. The naïve benchmark uses ab (the ApacheBench tool that ships with Apache) to throw load at a server at different levels of concurrency, serving up a 1k text file.
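A single run at one level of concurrency looks roughly like this (the hostname and filename here are placeholders, not the ones actually used; -c sets the concurrency level and -t caps the run at 30 seconds):

```shell
#!/bin/sh
# Sketch of one benchmark run. Building the command into a variable and
# echoing it makes this a dry run; eval "$CMD" would execute it for real.
CONCURRENCY=10
URL="http://testbox.example.com/test_1k.txt"   # placeholder 1k text file
CMD="ab -c $CONCURRENCY -t 30 $URL"
echo "$CMD"
```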

The benchmark machines have dual Pentium III processors running at 700MHz each, with 1GB of RAM, running Solaris x86. A separate machine with identical specs was used to put the load on the webserver. The two machines are connected by a fast local network.

Before each test, I restart Apache to ensure that memory leaks and other persistent process problems are eliminated; I also zero out any logs so any cumulative effects from large logfile writing are minimized. I also run a short warmup load of 15 requests/second before the start of each test. Each level of concurrency runs for 30 seconds, and there is no delay between increases in the level of concurrency.
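The per-test procedure could be scripted along these lines. The hostnames, paths, and exact concurrency steps are assumptions on my part, the commands are collected and echoed as a dry run, and ab approximates the warmup with a short low-concurrency burst rather than a strict 15 requests/second:

```shell
#!/bin/sh
# Dry-run sketch of the benchmark harness: restart Apache, zero the logs,
# warm up briefly, then run ab for 30 seconds at each concurrency level.
APACHECTL=/usr/local/apache/bin/apachectl      # assumed install path
ACCESS_LOG=/usr/local/apache/logs/access_log   # assumed log path
URL="http://testbox.example.com/test_1k.txt"   # placeholder URL

PLAN=""
for C in 1 2 5 10 25 50 100 150; do            # assumed concurrency steps
    PLAN="$PLAN
ssh testbox $APACHECTL restart
ssh testbox cp /dev/null $ACCESS_LOG
ab -c 15 -t 5 $URL
ab -c $C -t 30 $URL"
done
echo "$PLAN"
```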

Just for fun, I threw thttpd in the mix. thttpd is a high-performance server which runs as a single process, doing a tight select() loop over the incoming sockets. thttpd is blazingly fast, and its concurrency method is used by the commercial Zeus webserver.

Here is the first set of benchmark results I got:

View data table | Download data table as CSV

As you can see, Apache 2 represents a 30% improvement in server throughput as measured by requests/second, right out of the box!

Note that while thttpd still beats both Apaches by a good measure, the test is not entirely fair: the out-of-the-box Apache configuration has far more functionality than thttpd, which does little more out of the box than serve static files from a single directory. I would probably never use thttpd on a production service, because I typically want the configuration flexibility that Apache excels at.

More Detailed Results

Apache 2 runs in two different configurations under Unix—a preforking mode, which forks individual processes just like Apache 1.3 does; and a new, hybrid, "worker" mode which forks multithreaded processes, with a fixed number of threads per child. I included both configurations of Apache 2 in the next benchmark, ran the test up to 150 simultaneous requests, and added latency measurements (average latency per request).

View data table | Download data table as CSV

The throughput curves are interesting. All of the Apache variants jump up to their maximum levels relatively quickly, while thttpd scales up a bit more slowly. Surprisingly, the worker MPM (the hybrid multiprocess/threaded mode) for Apache 2 comes out slower than the preforking variant. It's still a little faster than Apache 1.3.24, which is nice.

I'm pretty sure the burp at 15-16 simultaneous requests for the prefork MPM is an anomaly. It didn't show up in some other tests I ran (including the first set above).

All three Apaches plateau so quickly that it's hard to see how they scale up. Here's a zoomed-in view of the low-concurrency portion of the data:

You can see that all three Apaches have the same basic curve.

Using Processor Binding

An astute co-worker hypothesized that the dual-processor nature of our test box was thrashing the threading performance. He suggested I use the Solaris pbind command to bind processes to a single processor and benchmark that. I did this for the three Apache variants:
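On Solaris, pbind takes a processor id and a list of pids (pbind -b processor_id pid ...). Here is a dry-run sketch; the pids are placeholders, since on the live box you'd collect the real Apache children with something like pgrep httpd:

```shell
#!/bin/sh
# Dry run: bind each Apache child to processor 0. The pids below are
# placeholders; replace the list with `pgrep httpd` to bind real children.
CPU=0
CMDS=""
for PID in 1234 1235 1236; do
    CMDS="$CMDS
pbind -b $CPU $PID"
done
echo "$CMDS"
```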

View data table | Download data table as CSV

Now, for higher concurrency levels the worker MPM beats out the preforking one hands-down. However, what's up with that impulse at around 50 simultaneous requests? It turns out that by default Apache with the worker MPM forks 2 children, with 25 threads each. So at around 50 simultaneous requests, Apache forks a third child, and those 75 total threads are capable of handling all 150 simultaneous requests from the load-generating machine.

I figured that prespawning 3 children would eliminate the bump at 50 simultaneous requests, so I ran a benchmark with this configuration as well. But as you can see above, for some reason this mode still fails to match the performance of the run where Apache grew from 2 children to 3.
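For reference, the prespawn experiment is a one-directive change in httpd.conf. This is a sketch of the worker MPM section with StartServers raised from 2 to 3; the other values are the Apache 2.0 shipped defaults as I remember them, so treat them as approximate:

```apache
# httpd.conf fragment, worker MPM (Apache 2.0)
<IfModule worker.c>
StartServers         3    # prespawn 3 children instead of the default 2
MaxClients         150    # upper bound on simultaneous requests
MinSpareThreads     25
MaxSpareThreads     75
ThreadsPerChild     25    # 3 children x 25 threads = 75 threads at startup
MaxRequestsPerChild  0
</IfModule>
```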

Things to Improve

Here is my short list of things to improve.

Use Better Benchmarking Methodology

I should switch to a better benchmarking tool (like Flood or Siege), and run a couple other benchmarks as order-of-magnitude sanity tests (for example, http_load).

Real websites are usually composed of more than one file. The test should include a long list of URLs to visit, in random order. The URLs should return results which vary in size.

I should run each level of concurrency faster, and sanity check the results for certain levels of concurrency by running checks from multiple machines and summing the results.

I should plot maximum latency along with average latency, because in many situations (especially with parallelized, load balanced servers where you can just add machines for easy capacity upgrade) peak latency is more important than average latency.

For thttpd, I should enable cgi-bin parsing and other features that put it on par with Apache (in this test, I tried not to touch the configurations that came out of the box).

I should up the number of simultaneous users until I see a marked drop in requests/second served.

Improve Reporting of Results

I should autogenerate the graphs using gnuplot or GD to eliminate the manual graph generation step. This would also make it easy to add other benchmarking tools in an automated way to compare results. The throughput and latency graphs could be superimposed on a graph with two sets of y-axes.
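Autogenerating a throughput graph could be as simple as emitting a gnuplot script from one of the CSV files. The filename and column layout here are assumptions (concurrency in column 1, requests/second in column 2):

```shell
#!/bin/sh
# Sketch: build a gnuplot script for one throughput curve from a CSV.
# Save the output to throughput.gp and run `gnuplot throughput.gp`.
CSV=apache2-prefork.csv     # placeholder data file
SCRIPT=$(cat <<EOF
set datafile separator ","
set terminal png
set output "throughput.png"
set xlabel "simultaneous requests"
set ylabel "requests/second"
plot "$CSV" using 1:2 with lines title "Apache 2.0.35 prefork"
EOF
)
echo "$SCRIPT"
```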

The processor bound results should be separated out so that the worker MPM with an additional child forked is shown separately. It's kinda confusing that it's all together to start with.

I should add links to the different Apache MPM documentation pages and to a description of the select() based server algorithm, and links to past benchmarks.

This page probably has some typos and grammatical errors.

Other Benchmarks I'd Like to Run

I should expand this to include a few other platforms and try the test on a single-processor box to confirm the processor bound trials.

I'd like to include results for server-side includes to see the impact of the new filter architecture on performance. Doing some basic CGI comparisons would also be nice. It's probably too early to do any mod_perl testing on Apache 2, but it'd be great to compare, say, an Apache::Registry page on Apache 1.3.24 to see how it stacks up against static page performance.

I should probably either drop thttpd (for focus) or add a bunch of other webservers (for interest).

Andrew Ho (andrew@zeuscat.com)