As described here, I recently moved an existing Drupal 6 site (DogTagArt.com) onto a new Pantheon Mercury 1.0 image running on Amazon's EC2 "cloud". This image included a Pressflow Drupal 6 distribution with a Varnish 3.0 front end serving as a reverse proxy. However, the site's performance was nowhere near what we expected with this setup.
I checked with varnishstat and found that the site's hit rate was around 1:10 (roughly 1 page returned directly from the Varnish cache for every 10 requests). A well-tuned Varnish install should see something closer to 10:1, so we had lots of room for improvement!
My research on Varnish told me that it works by caching web content (images, pages, etc.) and sending it to the browser without asking the back end (Drupal in this case) to generate the content. I learned that Varnish is very conservative, however, and will not cache anything that could be session-specific. Browsers and web servers primarily handle session content by setting and passing cookies, so by default Varnish will not serve a request from the cache if it carries a cookie, and will not cache a response that sets one. So I started by finding out what cookies were being served here.
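For reference, the conservative behavior described above comes from Varnish's built-in VCL. The fragment below is a simplified paraphrase of that default logic as I understand it for Varnish 3.x. It is not this site's configuration, and it leaves out the built-in checks on request method, so compare it against the default.vcl shipped with your version before relying on the details.

```vcl
# Simplified sketch of Varnish 3.x built-in behavior (paraphrased, not site config).
sub vcl_recv {
    # Any request carrying a Cookie (or Authorization) header bypasses the cache.
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
    return (lookup);
}

sub vcl_fetch {
    # A backend response that sets a cookie is treated as uncacheable.
    if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "*") {
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    return (deliver);
}
```

This is why every cookie matters: a single one on the request or the response is enough to send the page to Drupal.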
There are several ways to view cookies, but I like to use Chrome's Developer Tools. To reach this, browse to your site, press F12, then Resources, then expand Cookies, and click on your site name. Your cookies will be listed. This list will update as you browse. Here are the steps I took to handle each:
Google Analytics
These were named __utma, __utmb, __utmc and __utmz. I followed the advice here [https://www.varnish-cache.org/docs/trunk/tutorial/cookies.html] to modify the varnish.vcl file to "strip" these cookies, plus one more named "has_js". By "strip", I mean that Varnish removes those cookies from consideration when deciding whether to cache a page or return a page from the cache. This works because the Google Analytics cookies are read by JavaScript in the browser, so the back end never needs to see them. This is NOT a good solution for all cookies.
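A minimal sketch of this kind of cookie stripping, adapted from the pattern in the Varnish documentation linked above. It assumes Varnish 3.x syntax and that the four __utm* cookies and has_js are the only cookies safe to ignore. The exact regular expression is an illustration, not necessarily the rule this site ran.

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Drop the Google Analytics cookies (__utma, __utmb, __utmc, __utmz) and has_js.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__utm[a-z]+|has_js)=[^;]*", "");
        # Remove a leading ";" left behind when the first cookie in the header was stripped.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
        # If nothing is left, remove the header so the request can be looked up in the cache.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Any other cookie, such as the Drupal session cookie, survives this rule and still causes the request to go to the back end.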
Images and other static files
I also noticed that even the image files had cookies attached, so I used this example from Lullabot [https://gist.github.com/985112] to remove all cookies from a number of static file types (png, jpg, htm, etc.). This change had a dramatic effect on the Varnish hit rate, but not much effect on site performance (static files are served directly by Apache, not by Drupal, so they put little load on the server). That Lullabot example is chock-full of good ideas, so I recommend you study it carefully and use the relevant parts for your Varnish config.
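The static-file rule can be sketched roughly as follows. This is written in the spirit of the Lullabot example rather than copied from it, assumes Varnish 3.x syntax, and uses an illustrative list of extensions that you should adjust to your own site.

```vcl
sub vcl_recv {
    # Static files do not depend on the session, so ignore any request cookies.
    if (req.url ~ "(?i)\.(png|gif|jpe?g|ico|swf|css|js|html?)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_fetch {
    # Also drop any Set-Cookie header the back end attaches to these responses,
    # otherwise Varnish would still refuse to cache them.
    if (req.url ~ "(?i)\.(png|gif|jpe?g|ico|swf|css|js|html?)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}
```

Both halves matter: the first lets the request reach the cache lookup, the second lets the response be stored.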
Drupal Session Cookie
This left me with one unidentified cookie, which was showing up on every Drupal page. Its name started with "SESS", followed by a random-looking string of letters and numbers. Its value was likewise unhelpful. I learned that Drupal writes this cookie whenever the $_SESSION[] variable is set, and that it is the "key" to the sessions table. My research indicated that using Pressflow _should_ eliminate this cookie for anonymous users. To find out what was creating it, I had to look in the sessions table, using the SQL "select hostname, uid, from_unixtime(timestamp), session from sessions order by timestamp desc" to see which session variables were in use, particularly for uid 0 (the anonymous user). The session column contains zero or more session variable names with serialized values, separated by ";".
From the SQL, the obvious biggie was "uc_cart_id", which was showing up numerous times for the same anonymous host, a few seconds apart. It turns out that crawlers generally don't send cookies back, so crawlers were getting a new session record for every access! This not only hurt Varnish, it was also bloating the sessions table with unneeded records. I used the suggestion here [http://drupal.org/node/377798#comment-3083568] to address this. A better solution would have been to use the Cocomore version of Ubercart but, unfortunately, this site's copy of Ubercart had already been "customized", and undoing that was not feasible. A similar session variable, "uc_referer_uri", was removed following the suggestion here [http://drupal.org/node/273574#comment-5015198].
There were several additional session variables being set by custom modules written for this site. I had to rewrite those modules to either not use the session variable at all, or only use it when actually needed. In one example, the author was writing a session variable to "save" an empty array!
Once the unneeded creation of session variables was addressed, I made another change to reduce the duration of sessions, following the advice here [http://success.grownupgeek.com/index.php/2008/08/14/drupal-sessions-table/]. These changes reduced the size of the sessions table from millions of records down to hundreds.
VarnishTop
To verify that only the expected pages are going to the backend (Drupal), I run the command "varnishtop -b -i txurl" for a while. This shows one line for each URL requested from the backend, with the most frequently requested URLs near the top of the list. Basically, if the "score" for a page that should be cacheable climbs above 1, that page is being fetched from the backend repeatedly, which means it is _not_ being served from the cache.
Hopefully, this information will help someone else to solve Drupal + Varnish + Ubercart issues more quickly than I did (the first time). If you have questions or need help, contact me (SteveT) using the contact link below.
