Prefetching Hints – Helping Firefox and Google speed up your site
The Prefetching Problem
Wouldn’t it be better to download the next page we’ll want to click while we’re reading the one before? That’s the thinking behind prefetching, whether it’s done by the Firefox browser or the Google Web Accelerator. There’s been a lot of controversy about whether browsers should do this kind of thing. If a site is on fast enough hardware and has a lot of bandwidth to spare, it makes sense to let users download pages they’re likely to want in advance. On the other hand, for a site with limited resources, a bunch of clients downloading pages they may not even look at will only slow things down for everybody.
Clearly, the problem is not prefetching itself but deciding which pages to prefetch. The browser has no idea how busy the server is or how much spare bandwidth it has, and it has no reliable way of telling a link the user is likely to click from a link that nobody cares about.
As web server administrators, on the other hand, we know about all these things. We have data about how much bandwidth our sites are allowed and how much they are using, which pages are cheap to deliver and which ones involve expensive database queries, how much memory we’re using, how much strain the CPU is under – everything we need to judge whether prefetching our pages will make things better or worse for our readers. Not only that, we also have our web server logs, giving us real data from real people about which pages our users like to click, and where they are likely to go next.
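As a rough illustration of mining those logs, here is a minimal Python sketch. It assumes Apache’s combined log format (which records the Referer), treats the Referer as the page a visitor came from and the requested path as the page they clicked next, and only counts HTML pages. The function names, the example.com host and the 20% cut-off are hypothetical choices for illustration, not recommendations.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Apache "combined" log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: only HTML pages (or directory indexes) are worth prefetching.
PAGE_SUFFIXES = (".htm", ".html", "/")


def next_page_counts(lines, site_host):
    """Map each page on our site to a Counter of pages visitors requested from it."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("method") != "GET" or m.group("status") not in ("200", "304"):
            continue
        path = m.group("path")
        if not path.endswith(PAGE_SUFFIXES):
            continue
        ref = urlparse(m.group("referer"))
        if ref.netloc != site_host:  # skips "-" and links from other sites
            continue
        counts[ref.path or "/"][path] += 1
    return counts


def likely_next_pages(counts, min_share=0.2):
    """Keep targets that account for at least min_share of a page's outgoing clicks."""
    result = {}
    for page, targets in counts.items():
        total = sum(targets.values())
        picks = [t for t, n in targets.most_common() if n / total >= min_share]
        if picks:
            result[page] = picks
    return result
```

The share threshold is the knob here: raising it means fewer, more confident hints, which matters if every wrong guess costs us bandwidth.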
I will suggest a couple of things we can do to take control of the prefetching process, discourage badly-behaved clients from prefetching too much, and give the browser the information it needs to make our users’ experience better.
Providing prefetching hints
Now that we’ve dealt with browsers trying to do things the wrong way, let’s provide some hints to help the ones that are trying to do it right.
The conventional way to tell the browser to prefetch something is to put a <link> tag in the head of our page. For example, if I think there’s a good chance someone reading this page will also want to look at my top page, I can put something like this in the <head> of my HTML document:
<link rel="prefetch" href="/index.htm">
That’s fine if we know what we’ll want people to prefetch when we make the page. But we probably don’t. People won’t necessarily click what we think they’re going to click, and we want to be able to adjust how much is prefetched according to how much spare capacity we’ve got on our server.
So instead, let’s keep our prefetching rules separate from our website content. Rather than putting <link> tags in every page on our site, we’ll inject prefetching hints into the headers of the responses that our server sends to the browser. That way we can easily regenerate the rules to keep up with changes in usage patterns, and scale back or turn off prefetching altogether if our server gets too busy. (Many thanks to Darin Fisher for his help with this.)
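A minimal sketch of that regeneration step, in Python: it turns a mapping of pages to likely next pages (for instance, the output of a log analysis) into the text of an Apache config file using mod_headers, and caps the number of hints per page according to the server’s load average. The function names, the load thresholds and the hints-per-page values are hypothetical assumptions, not tuned figures.

```python
def max_hints_for_load(load, cpus):
    """Hypothetical policy: offer fewer prefetch hints as the server gets busier."""
    per_cpu = load / max(cpus, 1)
    if per_cpu >= 1.0:
        return 0  # saturated: switch prefetching off entirely
    if per_cpu >= 0.5:
        return 1  # moderately busy: only the single most likely next page
    return 3      # plenty of headroom: allow a few hints per page


def prefetch_conf(hints, max_hints=1):
    """Render {page: [likely next pages]} as an Apache mod_headers config.

    Assumes paths contain no spaces or double quotes, so they need no escaping.
    """
    lines = ["<IfModule mod_headers.c>"]
    for page in sorted(hints):
        targets = hints[page][:max_hints]
        if not targets:
            continue  # no hints for this page at the current load level
        lines.append(f"    <Location {page}>")
        for target in targets:
            lines.append(f'        Header append Link "<{target}>; rel=prefetch"')
        lines.append("    </Location>")
    lines.append("</IfModule>")
    return "\n".join(lines) + "\n"
```

On a Unix system, something like prefetch_conf(hints, max_hints_for_load(os.getloadavg()[0], os.cpu_count())) could be run from a cron job and written out to the included config file, followed by an Apache reload. With max_hints at 0 the file still parses but sends no hints, which is how this sketch turns prefetching off.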
If we haven’t already done so, we’ll need to turn on Apache’s mod_headers.
On a Debian-style apache2 installation (the /etc/apache2 layout used below), the a2enmod helper can do that:
a2enmod headers
…then get Apache to reload its configuration, for example with:
apache2ctl graceful
Now let’s try making a prefetching hint for Firefox. When we’re done, the following configuration will tell Firefox to prefetch the top page of my website once it has finished downloading this page:
<IfModule mod_headers.c>
    <Location /programming/pf.htm>
        Header append Link "</index.htm>; rel=prefetch"
    </Location>
</IfModule>
I’ve called this file prefetch.conf and put it in my apache2 configuration directory (/etc/apache2). To tell Apache to read it, we need an Include statement in the main configuration file, like this:
Include /etc/apache2/prefetch.conf
Once we’ve reloaded Apache, we should be able to check our logs and find that requests for /programming/pf.htm are immediately followed by prefetch requests for /index.htm from the same client.
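Rather than eyeballing the log, a short script can do that check. This hypothetical Python sketch assumes Apache’s common or combined log format in chronological order, identifies clients by IP address only, and counts a request for the target from the same IP within ten seconds as the prefetch; clients sharing an IP (proxies, NAT) can confuse it.

```python
import re
from datetime import datetime, timedelta

# Matches the start of a common/combined log line: host ident user [time] "METHOD path ...
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+)')


def parse(line):
    """Return (ip, timestamp, path) for a log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), when, m.group("path")


def prefetch_followups(lines, page, target, window=timedelta(seconds=10)):
    """For each request for `page`, report whether the same IP fetched `target`
    within `window` afterwards."""
    entries = [e for e in map(parse, lines) if e]
    results = []
    for i, (ip, when, path) in enumerate(entries):
        if path != page:
            continue
        followed = any(
            ip2 == ip and path2 == target and timedelta(0) <= when2 - when <= window
            for ip2, when2, path2 in entries[i + 1:]
        )
        results.append((ip, when, followed))
    return results
```

Running it over the access log with page="/programming/pf.htm" and target="/index.htm" lists each visit and whether a follow-up fetch was seen. Remember that a browser which already has /index.htm cached will not fetch it again, so some misses are expected.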
If this doesn’t seem to be working, check whether the Link header is really being set. You can either use the Live HTTP Headers extension for Firefox or do it the old-fashioned way with wget -S, which prints the server’s response headers. When testing, bear in mind that the browser may already have cached the file we’re prefetching. It can be easier to test this by telling Firefox to prefetch a non-existent file, then checking for the resulting 404 in the server logs.
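If neither tool is to hand, Python’s standard library can show the same header. This sketch fetches a page with urllib.request and pulls out the rel=prefetch targets from any Link headers. The parser is deliberately simple (it assumes the URLs contain no commas or semicolons), and the function names and the example.com host are placeholders.

```python
import urllib.request


def prefetch_targets(link_header):
    """Extract URLs marked rel=prefetch from a Link header value such as
    '</index.htm>; rel=prefetch'. Multiple links are comma-separated."""
    targets = []
    for part in link_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        url = pieces[0]
        if not (url.startswith("<") and url.endswith(">")):
            continue
        params = {}
        for p in pieces[1:]:
            key, _, value = p.partition("=")
            params[key.strip().lower()] = value.strip().strip('"')
        # rel may hold several space-separated relation types
        if "prefetch" in params.get("rel", "").split():
            targets.append(url[1:-1])
    return targets


def link_headers(url):
    """Fetch url and return all of its Link response header values."""
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get_all("Link") or []
```

For example, for value in link_headers("http://example.com/programming/pf.htm"): print(prefetch_targets(value)) should list /index.htm if the rule in prefetch.conf is active for that page.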