Drupal

BarCamp San Francisco

When we arrived at <a href="http://barcamp.org/">BarCamp San Francisco</a>, <a href="/album/san-francisco-2006/harry-slaughter">Harry Slaughter</a> was doing a presentation on <a href="http://drupal.org/">Drupal</a>.

Jeff, Boris and me

Jeff Robbins (Lullabot), Boris Mann (Bryght) and myself after our Drupal meeting with <a href="http://spikesource.com/">SpikeSource</a>.

Drupal's database interaction

I used XDebug to profile the behavior of Drupal, and to study its interaction with the database server. I aggregated the profile information of 100 requests to the main page using the "Apache, mod_php, PHP4, APC" configuration from my previous benchmark experiments. More information about my experimental setup is available in that benchmark post. XDebug generates a trace file with all the profile information, which I visualized using KCacheGrind.

Drupal has a page cache mechanism which stores dynamically generated web pages in the database. By caching a web page, Drupal does not have to create the page each time it is requested. Only pages requested by anonymous visitors (users that have not logged on) are cached. Once users have logged on, caching is disabled for them since the pages are personalized in various ways. Because this represents two different modes of operation, I investigated Drupal's behavior with and without page caching.

With page caching

Critical functions (with page caching)

This figure shows how much time is spent in each function when page caching is enabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

We observe that more than 45% (14.37% + 14.18% + 8.9% + 5.54% + 2.9% + ...) of the execution time is spent in database-related functions. Sending the SQL queries to the database server and waiting for the results takes 14% of the total execution time. Preparing the queries in Drupal (e.g. database prefixing and sanitizing the input to prevent SQL injection) takes more than 31% of the total execution time. We should look into optimizing the functions that prepare the queries (db_query(), _db_query() and _db_query_callback()).

The figure above shows that PHP's mysql_query() function is called 1,401 times over the 100 requests. This means that we need about 14 SQL queries to serve a cached page. This is where these SQL queries come from:

Queries (with page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 cached pages. The first column shows how much time is spent in the calls to <code>db_query()</code>. The second column shows how many times each function queried the database and the last column shows the functions' names and source files.
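The arithmetic behind the numbers quoted for the cached case can be checked in a few lines. The figures are the ones stated in the text; the variable names are only for illustration, and the list of percentages is partial (the text ends it with "..."), so the sum is a lower bound.

```python
# Figures quoted in the text for the run with page caching enabled.
db_shares = [14.37, 14.18, 8.9, 5.54, 2.9]  # percent of execution time per database function
mysql_query_calls = 1401                     # calls to mysql_query() over the whole run
requests = 100                               # profiled requests to the main page

db_total = sum(db_shares)                    # lower bound: the list in the text is partial
queries_per_request = mysql_query_calls / requests

print(f"database-related share: at least {db_total:.2f}%")
print(f"queries per cached page: {queries_per_request:.2f}")
```

The sum comes to just under 46%, which is consistent with "more than 45%", and 1,401 calls over 100 requests is where the figure of 14 queries per cached page comes from.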

Without page caching

Critical functions (without page caching)

This figure shows how much time is spent in each function when page caching is disabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

When the page cache is disabled, we see that on average we need 144 SQL queries to generate the main page. That is a lot. We also observe that 25% of the total execution time is spent in database related functions: 13% of the total execution time is spent executing queries, and 12% of the total execution time is spent preparing the queries. This is where the SQL queries come from:

Queries (without page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 pages. The first column shows how much time is spent in the calls to <code>db_query()</code>. The second column shows how many times each function queried the database and the last column shows the functions' names and source files.

Drupal road trip to San Francisco

About six years ago I started working on Drupal. Drupal, at that time, was an experimental platform that helped me explore new web technologies from my student dorm. Contrast this with the present. Today, there are hundreds of people contributing to the project, building and relying on that foundation, and hundreds of thousands of people downloading it. What started as a hobby project is now starting to get on the radar of some of the bigger projects and players ... It is no longer the casual hobby project it used to be.

It is fair to say that Drupal's growth makes for some interesting questions, both for me personally, and for the Drupal community at large. It makes me feel increasingly responsible, and that certainly adds some pressure. How to help run this thing as it continues to grow? Do we need a Drupal Foundation or not? How should I deal with my growing sense of responsibility? Or how to deal with being labeled an anti-Bill Gates?

I'm particularly interested to hear what other projects and people in a similar position do, or have done. So in an effort to make some connections and relationships with leaders in the FOSS community, I'll be taking a "Drupal road trip" to the San Francisco Bay Area from June 25th to June 30th.

Jeff Robbins from Lullabot is helping me contact people and set up my schedule. We set up personal meetings with some of the smartest people in the FOSS and internet community:

  • Tim O'Reilly (Founder and CEO of O'Reilly & Associates)
  • Chris DiBona (Open Source Programs Manager at Google)
  • Mitch Kapor (Co-founder of Lotus-1-2-3, founder of the Open Source Applications Foundation, co-founder of the Electronic Frontier Foundation, chair of the Mozilla Foundation)
  • Jeffrey Veen (Google / Measure Map / Adaptive Path)
  • Channel Wheeler and Bradley Greenwood (Yahoo!)
  • Janice Fraser (CEO of Adaptive Path)
  • Guido van Rossum (Founder of the Python project, Google)
  • Larry M. Augustin (CEO of VA Linux)
  • Anders Tjernlund (VP of Support Services at SpikeSource)
  • Brian Behlendorf (co-founder of the Apache Foundation)

If you would like to connect us with others in the Bay Area that should be on this list, please contact us.

These meetings will give me a chance to talk to these people about what is happening with Drupal and the Drupal community, and get a chance to promote all the great work we've done together. We're going to try and make sure that Drupal can be a little more connected with the larger FOSS community. Maybe it will open up further possibilities for collaboration and support. As always, we'll let things go naturally. Most importantly, I hope to learn from these people and that alone is going to be an invaluable experience.

As well as all of these "official" meetings, I would love to connect with the local Drupal community. We're leaving at least Thursday evening (the 29th) open for a Drupal meetup, but there should be other opportunities to hang out. Check the Bay Area Drupal group for details on time and place.

Trends interview

André Gilain interviewed me about Drupal for Trends, a Belgian business magazine. You can find the article in the June 15 issue of Trends Tendances (French version), or you can read it in full by clicking the link below.

Trends06 interview

<a href="/files/trends06-interview.pdf">PDF version</a>, &copy; Trends.

This is really good publicity for Drupal, and it is exciting to see our work getting so much recognition.

I'm not sure that I like being called the "anti-Bill Gates" -- it is not like I'm a modern hippy fighting windmills, am I? I wish that the article was more about Drupal and the Drupal community, and less about me. Credit should be given where credit is due: Drupal's successes should be attributed to the Drupal community, of which I am just one part.

Drupal webserver configurations compared

Opinions on the best webserver configuration for Drupal vary from one user to the next. Thus, I set out to compare and document the performance of different combinations of webserver options.

Experimental setup

I set up a Drupal 4.7 site with 2,000 users, 5,000 nodes, 5,000 path aliases, 10,000 comments and 250 vocabulary terms spread over 15 vocabularies. If you don't know what that means, you're probably not a Drupal user. That is OK. If you have basic technical knowledge of webservers, the results might still be interesting.

Next, I configured the main page to show 10 nodes, enabled some blocks in both the left and the right sidebar, set up some primary links, and added a search function at the top of the page. I also set up a contact page using Drupal 4.7's contact module. The image below depicts how my final main page was configured.

A Drupal 4.7 main page

Benchmarks were done on a 3-year-old Pentium IV 3 GHz with 2 GB of RAM running Gentoo Linux. I used the following software: Apache 2.0.55, Lighttpd 1.3.16, PHP 4.4.2, PHP 5.1.4, and MySQL 4.1.4, without special configuration or tweaking other than what was strictly necessary to get things up and running.

Apache's ab2 was used to compute how many requests per second the above setup is capable of serving.
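ApacheBench prints its throughput on a line that starts with "Requests per second:". As a rough sketch of how that figure could be collected for each configuration, the helper below extracts it from a saved report. The function name and the sample report are made up for illustration; the sample is not a measurement from these experiments, and the invocation mentioned in the comment is an assumed example, not the exact command used here.

```python
import re


def requests_per_second(ab_output):
    """Extract the mean requests-per-second figure from ab's text report."""
    match = re.search(r"^Requests per second:\s+([0-9.]+)", ab_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


# Made-up fragment of a report, as it might look after something like
# `ab2 -n 1000 -c 5 http://localhost/` (assumed invocation, hypothetical numbers).
sample = """\
Concurrency Level:      5
Complete requests:      1000
Requests per second:    123.45 [#/sec] (mean)
"""

print(requests_per_second(sample))
```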

Drupal page caching?

Drupal has a page cache mechanism which stores dynamically generated web pages in the database. By caching a web page, Drupal does not have to create the page each time it is requested. Only pages requested by anonymous visitors (users that have not logged on) are cached. Once users have logged on, caching is disabled for them since the pages are personalized in various ways. On some websites, like this weblog, everyone but me is an anonymous visitor, while on other websites there might be a good mix of both anonymous and authenticated visitors.

When presenting the benchmark results, I'll make a distinction between anonymous visitors and authenticated visitors. This allows you to interpret the results with the dynamics of your own Drupal websites in mind.
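One way to apply the two sets of numbers to your own site is to weight them by your traffic mix. The sketch below is a simplification that assumes requests are handled one after another, so the time per request is averaged rather than the rates. The function name and all figures in the example are hypothetical, not results from these benchmarks.

```python
def blended_rps(anon_rps, auth_rps, anon_fraction):
    """Estimate requests per second for a mix of anonymous and authenticated traffic.

    The average time per request is weighted by the traffic mix, which makes
    the result a weighted harmonic mean of the two rates.
    """
    if not 0.0 <= anon_fraction <= 1.0:
        raise ValueError("anon_fraction must be between 0 and 1")
    seconds_per_request = anon_fraction / anon_rps + (1.0 - anon_fraction) / auth_rps
    return 1.0 / seconds_per_request


# Hypothetical site: 90% anonymous visitors, with made-up rates per visitor type.
print(round(blended_rps(anon_rps=100.0, auth_rps=10.0, anon_fraction=0.9), 2))
```

Note that even a small share of authenticated visitors pulls the blended rate down sharply when their pages are much slower to generate, which is why the two cases are reported separately.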

APC?

APC, which stands for Alternative PHP Cache, is a free PHP extension that will optimize the performance of PHP applications by caching PHP code in a compiled state.

APC compared with no APC

(Larger bars are better. It means that we can serve more pages per second.)

With APC, my configuration could serve 4 times as many pages to anonymous visitors, and 2 times as many pages to authenticated visitors.

PHP4 or PHP5?

The figure below depicts a comparison between PHP4 and PHP5. It shows that on average, PHP5 can handle 13% fewer requests per second than PHP4 for anonymous visitors, and 4% fewer requests for authenticated visitors.

PHP4 compared with PHP5

Reverse proxy?

When a proxy, like Squid, is configured as a reverse proxy it can act as a caching mechanism for web pages. Because the reverse proxy sits between the internet and the webserver, it can intercept all requests and respond to them by serving cached content. This reduces the load on the webserver (and the database).

I haven't set up a reverse proxy for these experiments because, currently, Drupal doesn't generate the proper HTTP headers to control the cache mechanism. Also, because of the way Drupal works, it is quite a challenge to set up and configure a reverse proxy.

The one scenario where it would be easy to set up, and likely to be beneficial, is where all visitors are anonymous visitors.

mod_php or FastCGI?

There are many ways to run PHP. FastCGI and mod_php are two of the most commonly used approaches. Most people are using mod_php because that is the default on nearly all Linux distributions. With mod_php, PHP runs as an Apache module and as a result, all PHP applications are executed with the privileges of Apache. This means that your Drupal files need to be readable (and sometimes writable) by Apache.

When using FastCGI, web applications can run under different privileges than Apache's. Hence, FastCGI is often used on shared hosts to provide additional security. With FastCGI you can prevent other users on the system from reading or writing your files. The downside, however, is that FastCGI has an additional performance cost.

So when choosing between mod_php and FastCGI, you are making a trade-off between security and performance. The figure below shows the relative performance of the two approaches, and can help you understand the trade-off. When switching from mod_php to FastCGI we observe a 63% slowdown for anonymous visitors, and an 18% slowdown for authenticated visitors.

FastCGI compared with mod_php

Apache or Lighttpd?

Lighttpd (or Lighty) is an HTTP daemon designed for high-performance environments. Its goal is to be fast and have a small memory footprint. The figure below shows how Apache compares to Lighttpd.

Apache compared with Lighttpd

Conclusions

So what are the best and worst configurations of those I have tested? And what is the fastest configuration that is still secure? Turns out that the slowest configuration is Apache 2 running PHP5 as an Apache module without using APC. Unfortunately, this is one of the most common configurations. The fastest configuration using the more secure FastCGI method, on the other hand, is Lighttpd with PHP4 in FastCGI mode using APC. For anonymous visitors, the latter is almost 4 times faster than the former, while being more secure. This is illustrated by the figure below.

Webserver configurations compared

If you are not on a shared host, you might not care about the security options provided by FastCGI. In that case, the next and last figure might be of interest. Turns out that for anonymous visitors my fastest Apache configuration is 3% faster than the fastest Lighttpd configuration.

Webserver configurations compared

Drupal download statistics

It has been exactly one month since we released Drupal 4.7.0 and we have nearly broken 50,000 downloads. The Drupal 4.7 videocasts have been watched over 25,000 times, and have added 400GB of traffic to our monthly bandwidth usage. Not too shabby.

Absolute download statistics

Relative download statistics

© 1999-2014 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.