Drupal's database interaction

I used XDebug to profile the behavior of Drupal, and to study the interaction with the database server. I aggregated the profile information of 100 requests to the main page using the "Apache, mod_php, PHP4, APC" configuration used for previous benchmark experiments. More information about my experimental setup is available at that page. XDebug generates a trace file with all the profile information which I visualized using KCacheGrind.

Drupal has a page cache mechanism which stores dynamically generated web pages in the database. By caching a web page, Drupal does not have to create the page each time it is requested. Only pages requested by anonymous visitors (users that have not logged on) are cached. Once users have logged on, caching is disabled for them since the pages are personalized in various ways. Because this represents two different modes of operation, I investigated Drupal's behavior with and without page caching.

With page caching

Critical functions (with page caching)

This figure shows how much time is spent in each function when page caching is enabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

We observe that more than 45% (14.37% + 14.18% + 8.9% + 5.54% + 2.9% + ...) of the execution time is spent in the database related functions. Sending the SQL queries to the database server and waiting for the results takes 14% of the total execution time. Drupal preparing the queries (eg. database prefixing, sanitizing the input to prevent SQL query injection, etc) takes more than 31% of the total execution time. We should look into optimizing the functions that prepare the queries (db_query(), _db_query and _db_query_callback()).

The figure above depicts that PHP's mysql_query() function is called 1401 times. This means that we need 14 SQL queries to serve a cached page. This is where these SQL queries come from:

Queries (with page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 cached pages. The first column shows how much time is spent in the calls to <code>db_query()</code>. The second column shows how many times each function queried the database and the last column shows the functions' names and source files.

Without page caching

Critical functions (without page caching)

This figure shows how much time is spent in each function when page caching is disabled. The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children. The second column shows how often the function was called to serve 100 requests to the main page.

When the page cache is disabled, we see that on average we need 144 SQL queries to generate the main page. That is a lot. We also observe that 25% of the total execution time is spent in database related functions: 13% of the total execution time is spent executing queries, and 12% of the total execution time is spent preparing the queries. This is where the SQL queries come from:
Queries (without page caching)

This figure shows the Drupal functions responsible for querying the database when serving 100 pages. The first column shows how much time is spent in the calls to <code>db_query()</code>. The second column shows how many times each function queried the database and the last column shows the functions' names and source files.

Comments

Gerhard Killesreiter (not verified):

I think we can easily save some queries by caching the results of the queries done in drupal_get_filename().

As I see it we only need to invalidate the cache when installing/disabling new modules or themes.

June 23, 2006 - 19:52
Justin Cooper (not verified):

Nice writeup of your test results!

One thing that I don't see mentioned which has had an *enormous* effect on the speed of my site is the MySQL query cache.

This MySQL feature is not enabled by default, but for long queries it can make a big difference. I'd love to see how it affects the performance of your benchmark tests.

June 23, 2006 - 19:54
Dries:

The performance gain that you can obtain from enabling the MySQL query cache is bound by the time you spend executing SQL queries. In this particular case, we spent 13-14% of the time executing SQL queries (see figures). The rest of the time is spent in PHP code which doesn't benefit directly from the MySQL query cache. Thus, the upper bound of the potential performance improvement is 13-14%. In practice it will be less because even with the MySQL query cache enabled, you have to do a minimal amount of work. For pages with more expensive SQL queries, the performance gain is likely to be more significant.

June 24, 2006 - 07:21
merlinofchaos (not verified):

I've tried to address the drupal_lookup_path() queries: http://drupal.org/node/47832.

June 23, 2006 - 21:05
Chris Johnson (not verified):

Are the Self and Time values the total time for that function, versus the average time for that function? I am guessing yes.

If so, then there needs to be some optimization in how many times some of the functions are called, not just in how they perform. For example, why is drupal_lookup_path() being called about 6.5 times per node on this page? Is that number even node count related?

Note that several of the functions are called 10 times each, exactly the number of nodes being displayed, so clearly they are node-related.

There's a lot of stuff to be investigated here.

June 23, 2006 - 21:28
bertboerland (not verified):

Excellent write up, we need more emperic data like this and follow up code to speed up Drupal.

June 24, 2006 - 09:58
moshe weitzman (not verified):

drupal_lookup_path() is responsible for half of the SQL queries on the page; that is a lot. Yet we tried to cache the whole alias table with bad results. Not sure what to do here, but that looks like a smoking gun if we really care about reducing the number of queries.

Note that this goes to 0 queries if you have an empty alias table.

I wish it was easier to install cachegrind on Mac OS X.

June 28, 2006 - 23:40
Bernhard (not verified):

Regarding kcachegrind, it's just a matter of typing "fink install kcachegrind" if you are using Fink (unstable, http://fink.sourceforge.net). Use kcachegrind through Apple's X11.app.

July 28, 2006 - 12:22
moshe weitzman (not verified):

I think about this post from time to time and I still can't come to terms with it. Specifically, the statement "The functions are sorted by the 'Self' column, which shows the time spent in each function without the time spent in its children". How is it possible that db_query() is at the top? that function hardly does anything on its own. It hands off all processing to other functions.

For me, this does not pass the smell test.

September 5, 2006 - 21:07
Dries:

I verified the results at the time so I'm confident it is not my fault. I'll do some additional analysis in the future; maybe I can use alternative tools or newer versions of the same tools to get more accurate results or to better understand the problem. I'll investigate this further as time permits.

September 6, 2006 - 22:11
Reid Spencer (not verified):

This thread has been inactive for 1.75 years. I'm considering using Drupal 6.X for a new major site where performance matters a lot. How does 6.X stack up in these investigations? Has the number of queries been reduced since 2006? Any other news to report?

July 5, 2008 - 22:49

Add new comment

© 1999-2014 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.