Drupal Performance Mantra: Crawl, Boost, Expire
The Boost module is a surprisingly effective static file caching tool that can dramatically speed up almost any Drupal website. Its trick is to avoid running Drupal at all whenever it can.
It provides caching on the level of whole web pages. If a particular version of a page happens not to be cached, Drupal will of course be asked to run through its convoluted logic of permissions, menus, database calls, and all that. Heavy pages often take several seconds to complete. Getting under the magic limit of 1 second is a prized UX and SEO art — but that's another story.
Thanks to Boost, even if your page takes several seconds to generate, Drupal will do that work just once. The next time, the page will be served blazingly fast as a small compressed HTML file directly by the Apache web server, completely bypassing not just Drupal but any MySQL or PHP processing, and easily delivering the page in a matter of milliseconds.
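To make the idea concrete, here is a minimal Python sketch of whole-page file caching. It is only a toy model of the principle, not Boost's actual code (Boost itself is a Drupal module that lets Apache serve the files); the class name, the file naming scheme, and the `render` callback standing in for a slow Drupal page build are all assumptions made for illustration.

```python
import gzip
import tempfile
from pathlib import Path
from urllib.parse import quote


class StaticPageCache:
    """Toy model of whole-page file caching (hypothetical, not Boost's code)."""

    def __init__(self, cache_dir, render):
        self.cache_dir = Path(cache_dir)
        self.render = render      # stands in for a full, slow Drupal page build
        self.render_calls = 0     # counts how often the slow path was taken

    def _file_for(self, path):
        # Assumed naming scheme: one compressed file per URL path.
        return self.cache_dir / (quote(path, safe="") + ".html.gz")

    def get(self, path):
        cached = self._file_for(path)
        if cached.exists():
            # Cache hit: no rendering at all, just read a file from disk.
            return gzip.decompress(cached.read_bytes()).decode("utf-8")
        # Cache miss: build the page once and store it for later requests.
        self.render_calls += 1
        html = self.render(path)
        cached.write_bytes(gzip.compress(html.encode("utf-8")))
        return html

    def expire(self, path):
        # Remove the cached file so the next request re-renders the page.
        cached = self._file_for(path)
        if cached.exists():
            cached.unlink()


with tempfile.TemporaryDirectory() as tmp:
    cache = StaticPageCache(tmp, lambda path: "<html>" + path + "</html>")
    cache.get("/node/1")   # slow path: rendered and stored
    cache.get("/node/1")   # fast path: read from the cached file
    print(cache.render_calls)
```

However many times the same path is requested, the slow `render` callback runs only once until the cached file is expired.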
There are indeed even faster caching layers. On dedicated servers, we may well want to consider Varnish instead. In my experience, however, the various Varnish caching recipes out there are much less customizable than Boost, and they lack the sweet sense of completeness provided by Boost and its companion modules described later in this article.
Boost is good at making sure that Drupal is asked for a page's content no more than once. The problem is that, without talking to Drupal, Boost never learns when a page has been updated. It will tirelessly continue its rapid delivery of pages, but their content will become stale.
Not forever, of course: Boost has settings for cache expiration time. We can tell it to fetch a fresh version of a page after a given period, e.g. 6 or 12 hours after the initial caching.
However, that is rarely good enough for any larger or busier website. Imagine a site with 300,000 pages that gets 3,000 page views and 30 page updates every hour. Some of the pages are more popular than others, so let's say that after 6 hours there are some 10,000 pages cached. In those 6 hours, 6 x 30 = 180 pages have been updated. Say 100 of them had already been cached, meaning Boost is now serving stale content for those pages. Then the configured expiration time starts to elapse for the cached pages, and Drupal is again tasked with slowly computing each of them. It happens gradually, since each page was cached at a slightly different time, so they do not all expire simultaneously. But in the end, 10,000 pages will be expired and re-computed even though 9,900 of them remained unchanged. What a waste of resources and, worse, what a waste of the time our users have to spend while Drupal re-computes identical pages!
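The arithmetic of this example can be checked with a few lines of Python. The function and parameter names are made up for illustration; the figures are the ones used in the paragraph above.

```python
def blanket_expiry_waste(updates_per_hour, hours, cached_pages, stale_cached_pages):
    """Summarise how much work a time-based (blanket) expiration throws away."""
    updated = updates_per_hour * hours                # pages changed in the period
    needless = cached_pages - stale_cached_pages      # re-computed although unchanged
    return {
        "updated": updated,
        "stale_in_cache": stale_cached_pages,
        "regenerated": cached_pages,
        "needless": needless,
        "needless_share": needless / cached_pages,
    }


result = blanket_expiry_waste(
    updates_per_hour=30, hours=6, cached_pages=10_000, stale_cached_pages=100
)
print(result)
```

With these figures, 180 pages were updated, 100 of them were stale in the cache, and 9,900 of the 10,000 regenerated pages (99%) were re-computed for nothing.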
Enter the Cache Expiration module, a nifty little module that lets us configure which pages get expired when something happens. We can set whether node pages get expired on node insert, update or delete, whether the front page should also be expired at that moment (which is often needed), and possibly also the taxonomy term pages related to the nodes, or any number of other custom URLs. Even better, we can do this per content type, so that e.g. adding an image node, which has no page display of its own anyway, does not expire the cache of all nodes! And we can configure similar behaviour for comments, files, and user pages. After a page is expired, Drupal re-generates it the very next time it is requested.
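The kind of per-content-type rules described above can be pictured roughly as follows. This is a hypothetical Python model, not the Cache Expiration module's real configuration format or API; the rule keys, the `paths_to_expire` function and the sample node are assumptions, and only the path shapes (`node/42`, `taxonomy/term/7`, `<front>`) follow the usual Drupal 7 conventions.

```python
# Hypothetical rule table: which events expire which pages, per content type.
RULES = {
    "article": {
        "events": {"insert", "update", "delete"},
        "node_page": True,
        "front_page": True,
        "term_pages": True,
        "custom": [],
    },
    # Image nodes have no page display of their own, so they expire nothing.
    "image": {
        "events": set(),
        "node_page": False,
        "front_page": False,
        "term_pages": False,
        "custom": [],
    },
}


def paths_to_expire(node, event, rules):
    """Return the cached paths that one node event should expire."""
    rule = rules.get(node["type"])
    if rule is None or event not in rule["events"]:
        return []
    paths = []
    if rule["node_page"]:
        paths.append("node/%d" % node["nid"])
    if rule["front_page"]:
        paths.append("<front>")
    if rule["term_pages"]:
        paths.extend("taxonomy/term/%d" % tid for tid in node.get("tids", []))
    paths.extend(rule["custom"])
    return paths


print(paths_to_expire({"type": "article", "nid": 42, "tids": [7]}, "update", RULES))
print(paths_to_expire({"type": "image", "nid": 43}, "insert", RULES))
```

The point of the sketch is that expiration is driven by events and scoped by rules, so everything not named by a rule stays cached.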
Let's see how our example above benefits from the Cache Expiration module. For simplicity, say that all of the 100 stale cached pages happen to be nodes with content imported or updated from an RSS feed. In that case, only pages of the feed item content type will get expired, while all other content (e.g. articles, news, products, links, etc.) will remain cached. Only 100 instead of 10,000 pages will have to be re-generated.
With the Cache Expiration module we can set Boost's blanket expiration time much higher, e.g. to a few days or even weeks, knowing that pages that change will have their cache expired automatically.
Crawling in the Background
The world of caching scenarios is complex because of the number of variables we need to take into consideration. Our performance improvement above may look impressive but we can still do much better. Let's do another thought experiment on our example site from above.
On our site of 300,000 pages there are several content types. One of them is called "Article", with 50,000 nodes. Our Boost expiration is set to 1 week because we rely on the more selective Cache Expiration functionality. Suppose that after 3 days our website visitors (including search engine spiders) have visited all 50,000 articles, so all of them are cached. Then somebody notices a small typo in one of the articles. We fix it and save the page, but by doing that we expire all 50,000 articles. That means our users will collectively have to endure 50,000 slow page loads again in the coming days before all articles are cached again. They won't be happy. And anyway, why should users have to spend their time generating the Boost cache on our website?!
Enter Boost Crawler, which is in fact a new sub-module of the Boost module. First of all, its name is misleading: it does not crawl the way search engine spiders do, discovering and following links. What it does is pre-cache pages that have expired from the Boost cache. And it does that in the background, using another excellent helper module called HTTP Parallel Request & Threading Library (HTTPRL). If our 50,000 articles get expired, at each cron run the HTTPRL module spawns several background processes and invisibly caches a number of the pages without bothering the user. Obviously, shortly after the mass expiration, users will still sometimes request a page that HTTPRL has not yet re-cached, but over time the chances of that happening will continue to diminish.
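The background re-caching idea can be sketched in Python as a queue of expired paths that is drained a batch at a time, with each batch fetched in parallel. This is only an illustration of the principle, not how Boost Crawler or HTTPRL are implemented; the function name, the batch size, the worker count and the `fetch` callback (which would request the page so that it gets cached again) are all assumptions.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def cron_run(expired, fetch, batch_size=4, workers=2):
    """Re-cache one batch of expired paths in parallel; return the batch."""
    # Take at most batch_size paths off the front of the queue.
    batch = [expired.popleft() for _ in range(min(batch_size, len(expired)))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each fetch stands in for an HTTP request that re-generates the page.
        list(pool.map(fetch, batch))
    return batch


expired = deque("node/%d" % nid for nid in range(1, 11))   # 10 expired pages
warmed = []                                                # pages re-cached so far

cron_run(expired, warmed.append)
print(len(warmed), "re-cached,", len(expired), "still waiting")
```

Each simulated cron run shrinks the queue, which is why the chance of a visitor hitting an uncached page keeps falling after a mass expiration.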
Here's a complete set of steps to configure the magic bullet for your Drupal 7 performance.
Get Boost, Cache Expiration, and HTTP Parallel Request & Threading Library. On the Modules page (/admin/modules), enable "Boost", "Boost Crawler" and "Cache Expiration". You may also enable "HTTP Parallel Request & Threading Library" right away; otherwise it will be enabled automatically, since it is required by Boost Crawler.
On the Boost Settings tab (/admin/config/system/boost), adjust the Maximum Cache Lifetime to a long period, e.g. 1 week, or even longer. Make sure the Minimum Cache Lifetime remains zero. (Leave XML and Ajax/JSON file caching off for the time being — you can activate those later if needed.)
List of Ingredients
Not all of the following modules are necessary, but all of them contribute to the smoothest possible intelligent file caching experience.
Magic potion ingredients
See also the Related Modules tab at /admin/config/system/boost/listmodules