Agenda
HTTP in 3 minutes
Caching concepts
Hit, Miss, Revalidation
5 techniques for caching and cache-busting
Not covered in this talk
Proxy deployment
HTTP acceleration (a.k.a. reverse proxies)
Database query results caching

HTTP and Proxy Review
HTTP: Simple and elegant
HTTP example
mradwin@machshav:~$ telnet www.example.com 80
Trying 192.168.37.203...
Connected to w6.example.com.
Escape character is '^]'.
GET /foo/index.html HTTP/1.1
Host: www.example.com
HTTP/1.1 200 OK
Date: Wed, 28 Jul 2004 23:36:12 GMT
Last-Modified: Thu, 12 May 2005 21:08:50 GMT
Content-Length: 3688
Connection: close
Content-Type: text/html
<html><head>
<title>Hello World</title>
...
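The request typed into telnet above is just text. A minimal sketch of building it, assuming a hypothetical helper named buildGetRequest (not part of the talk):

```javascript
// Build the raw text of a minimal HTTP/1.1 GET request.
// HTTP separates lines with CRLF and ends the header block with an empty line.
// buildGetRequest is a hypothetical helper name used only for illustration.
function buildGetRequest(host, path) {
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    "", // empty line marks the end of the request headers
    "",
  ].join("\r\n");
}

const rawRequest = buildGetRequest("www.example.com", "/foo/index.html");
console.log(JSON.stringify(rawRequest));
```

Writing this string to a TCP connection on port 80 is what the telnet session does by hand.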

Browsers use private caches
Revalidation (Conditional GET)
Non-Caching Proxy
Caching Proxy: Miss
Caching Proxy: Hit
Caching Proxy: Revalidation
Top 5 Caching Techniques
Assumptions about content types
Top 5 techniques for publishers
Use Cache-Control: private for personalized content
Implement “Images Never Expire” policy
Use a cookie-free TLD for static content
Use Apache defaults for occasionally-changing static content
Use random tags in URL for accurate hit metering or very sensitive content

1. Cache-Control: private for personalized content
Bad Caching: Jane’s 1st visit
Bad Caching: Jane’s 2nd visit
Bad Caching: Mary’s visit
What’s cacheable?
HTTP/1.1 allows caching anything by default
Unless overridden with Cache-Control header
In practice, most caches avoid anything with
Cache-Control/Pragma header
Cookie/Set-Cookie header
WWW-Authenticate/Authorization header
POST/PUT method
302/307 status code (redirects)
SSL content
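The list above can be read as a rough decision rule. A sketch only: the exchange object and the function names are hypothetical, and treating "Cache-Control/Pragma" as "restrictive directives such as no-cache, no-store or private" is an assumption, since real caches differ.

```javascript
// Normalize header names so lookups are case-insensitive.
function lowerCaseKeys(headers = {}) {
  return Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), String(v)])
  );
}

// Heuristic mirroring the slide's list of things most shared caches skip.
// `exchange` is a hypothetical plain object, not a real library type.
function conservativeCacheWouldStore(exchange) {
  const req = lowerCaseKeys(exchange.requestHeaders);
  const res = lowerCaseKeys(exchange.responseHeaders);
  // Assumption: only restrictive directives scare a cache away.
  const restrictive = /no-cache|no-store|private/i;

  if (exchange.secure) return false;                                  // SSL content
  if (exchange.method === "POST" || exchange.method === "PUT") return false;
  if (exchange.status === 302 || exchange.status === 307) return false; // redirects
  if ("cookie" in req || "set-cookie" in res) return false;
  if ("authorization" in req || "www-authenticate" in res) return false;
  if (restrictive.test(res["cache-control"] || "")) return false;
  if (restrictive.test(req["pragma"] || "") || restrictive.test(res["pragma"] || "")) {
    return false;
  }
  return true;
}

const plainImage = { method: "GET", status: 200, secure: false };
console.log(conservativeCacheWouldStore(plainImage));
```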

Cache-Control: private
Shared caches are bad for personalized content
Mary shouldn’t be able to read Jane’s mail
Private caches perfectly OK
Speed up web browsing experience
Avoid personalization leakage with single line in httpd.conf or .htaccess
Header set Cache-Control private
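To see why the header matters, here is a toy simulation of the Jane and Mary scenario. It is a sketch under heavy assumptions: ToyCache and inboxOrigin are hypothetical names, and the cache ignores expiry and revalidation entirely.

```javascript
// Toy cache, for illustration only: keyed by URL, no expiry, no revalidation.
class ToyCache {
  constructor({ shared }) {
    this.shared = shared; // true = proxy serving many users, false = one browser
    this.entries = new Map();
  }

  fetch(url, user, origin) {
    if (this.entries.has(url)) return this.entries.get(url); // hit
    const response = origin(url, user);                       // miss: ask the origin
    const directives = response.cacheControl
      .split(",")
      .map((d) => d.trim().toLowerCase());
    // A shared cache must not store a response marked private.
    if (!(this.shared && directives.includes("private"))) {
      this.entries.set(url, response);
    }
    return response;
  }
}

// Hypothetical origin server that personalizes the page per user.
const inboxOrigin = (cacheControl) => (url, user) => ({
  body: `Inbox for ${user}`,
  cacheControl,
});

const badProxy = new ToyCache({ shared: true });
badProxy.fetch("/mail", "Jane", inboxOrigin(""));
const leaked = badProxy.fetch("/mail", "Mary", inboxOrigin("")).body;

const goodProxy = new ToyCache({ shared: true });
goodProxy.fetch("/mail", "Jane", inboxOrigin("private"));
const safe = goodProxy.fetch("/mail", "Mary", inboxOrigin("private")).body;

console.log(leaked); // Mary is served Jane's page
console.log(safe);   // Mary gets her own page
```

A cache built with shared set to false still stores the private response, which is the "private caches perfectly OK" case.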

2. “Images Never Expire” policy
“Images Never Expire” Policy
Dictate that images (icons, logos) once published never change
Set Expires header 10 years in the future
Use new names for new versions
http://us.yimg.com/i/new.gif
http://us.yimg.com/i/new2.gif
Tradeoffs
More difficult for designers
Faster user experience, bandwidth savings
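One way to ease the renaming burden on designers is to derive the new name from the file contents. This is a swapped-in technique, not the numbering scheme shown above (new.gif, new2.gif), and versionedName is a hypothetical helper:

```javascript
const crypto = require("crypto");
const path = require("path");

// Derive a file name that changes whenever the image bytes change, so a
// published name never has to be reused for different content.
function versionedName(fileName, contents) {
  const { name, ext } = path.parse(fileName);
  const digest = crypto
    .createHash("sha1")
    .update(contents)
    .digest("hex")
    .slice(0, 8); // short prefix keeps URLs readable
  return `${name}.${digest}${ext}`;
}

const first = versionedName("new.gif", Buffer.from("GIF89a version one"));
const second = versionedName("new.gif", Buffer.from("GIF89a version two"));
console.log(first, second);
```

Same bytes always yield the same name, so an unchanged image keeps its long-lived cache entry.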

Imgs Never Expire: mod_expires
# Works with both HTTP/1.0 and HTTP/1.1
# (10*365*24*60*60) = 315360000 seconds
ExpiresActive On
ExpiresByType image/gif A315360000
ExpiresByType image/jpeg A315360000
ExpiresByType image/png A315360000

Imgs Never Expire: mod_headers
# Works with HTTP/1.1 only
<FilesMatch "\.(gif|jpe?g|png)$">
  Header set Cache-Control \
  "max-age=315360000"
</FilesMatch>
# Works with both HTTP/1.0 and HTTP/1.1
<FilesMatch "\.(gif|jpe?g|png)$">
  Header set Expires \
  "Mon, 28 Jul 2014 23:30:00 GMT"
</FilesMatch>

mod_images_never_expire
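/* Enforce policy with a module that runs at the URI translation hook.
   Written against the Apache 1.3 module API (ap_table_get). */
#include <strings.h>      /* strcasecmp */
#include "httpd.h"        /* request_rec, HTTP_NOT_MODIFIED, DECLINED */
#include "http_config.h"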
static int translate_imgexpire(request_rec *r) {
  const char *ext;
  if ((ext = strrchr(r->uri, '.')) != NULL) {
    if (strcasecmp(ext,".gif") == 0 || strcasecmp(ext,".jpg") == 0 ||
        strcasecmp(ext,".png") == 0 || strcasecmp(ext,".jpeg") == 0) {
      if (ap_table_get(r->headers_in,"If-Modified-Since") != NULL ||
          ap_table_get(r->headers_in,"If-None-Match") != NULL) {
        /* Don't bother checking filesystem, just hand back a 304 */
        return HTTP_NOT_MODIFIED;
      }
    }
  }
  return DECLINED;
}

3. Cookie-free static content
Use a cookie-free Top Level Domain for static content
For maximum efficiency use 2 domains
www.example.com for dynamic HTML
static.example.net for images
Many proxies won’t cache requests that carry a Cookie header
But: multimedia is never personalized
Cookies irrelevant for images

Typical GET request w/Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v; brandflash=1; B=amfco1503sgp8&b=2; F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7&b=K1It; LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo &l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0; GTSessionID835990899023=83599089902340645635; Y=v=1&n=6eecgejj7012f &l=h03m8d50c8bo/o&p=m012o33013000007&jb=16|47|&r=a8&lg=us&intl=us&np=1; PROMO=SOURCE=fp5; YGCV=d=; T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-&a=YAE&sk=DAAwRz5HlDUN2T&d=c2wBT0RBekFURXdPRFV3TWpFek5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-; LYS=l_fh=0&l_vo=myla; PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB; YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA%0190035-4231%014480%0134.051590%01-118.384342%019%01a%0190035
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Same request, no Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Bonus: much smaller GET request
Dial-up MTU size 576 bytes, PPPoE 1492
1450 bytes reduced to 550
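A back-of-the-envelope check of what those sizes mean in packets. The 40-byte overhead (IPv4 plus TCP headers, no options) is an assumption, and packetsNeeded is a hypothetical helper:

```javascript
// Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
const IP_TCP_HEADER_BYTES = 40;

// Number of packets needed to carry a request of the given size.
function packetsNeeded(requestBytes, mtu) {
  const payloadPerPacket = mtu - IP_TCP_HEADER_BYTES;
  return Math.ceil(requestBytes / payloadPerPacket);
}

for (const mtu of [576, 1492]) {
  console.log(
    `MTU ${mtu}: with cookies ${packetsNeeded(1450, mtu)} packet(s), ` +
      `without ${packetsNeeded(550, mtu)} packet(s)`
  );
}
```

Under these assumptions the saving shows up mainly on the small dial-up MTU; on PPPoE both requests fit in a single packet.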

4. Apache defaults for static, occasionally-changing content
Revalidation works well
Apache handles revalidation for static content
Browser sends If-Modified-Since request
Server replies with short 304 Not Modified
No special configuration needed
Use if you can’t predict when content will change
Page designers can change immediately
No renaming necessary
Cost: extra HTTP transaction for 304
Cost is smaller with Keep-Alive, but many large sites disable it
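Apache does this for you, but the logic is small enough to sketch. The function name and the shape of the file object are hypothetical, and only If-Modified-Since is handled (no ETag):

```javascript
// Decide between 304 Not Modified and a full 200 response.
// `file` is a hypothetical object: { mtimeMs: number, body: string | Buffer }.
function respondToConditionalGet(requestHeaders, file) {
  // HTTP dates have one-second resolution, so drop milliseconds before comparing.
  const lastModifiedMs = Math.floor(file.mtimeMs / 1000) * 1000;
  const lastModified = new Date(lastModifiedMs).toUTCString();
  // Date.parse returns NaN for a missing or unparseable header.
  const since = Date.parse(requestHeaders["if-modified-since"] || "");

  if (!Number.isNaN(since) && lastModifiedMs <= since) {
    // Unchanged since the browser's copy: short reply, no body.
    return { status: 304, headers: { "Last-Modified": lastModified }, body: null };
  }
  return {
    status: 200,
    headers: {
      "Last-Modified": lastModified,
      "Content-Length": String(Buffer.byteLength(file.body)),
    },
    body: file.body,
  };
}

const page = { mtimeMs: Date.UTC(2005, 4, 12, 21, 8, 50) + 123, body: "<html>...</html>" };
const revalidated = respondToConditionalGet(
  { "if-modified-since": "Thu, 12 May 2005 21:08:50 GMT" },
  page
);
console.log(revalidated.status);
```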

Successful revalidation
Updated content
5. URL Tags for sensitive content, hit metering
URL Tag technique
Idea
Convert public shared proxy caches into private caches
Without breaking real private caches
Implementation: pretty simple
Assign a per-user URL tag
No two users use same tag
Users never see each other’s content
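A minimal sketch of assigning a per-user tag. The HMAC scheme, the query parameter name tag, the secret and the function name are all assumptions for illustration; the talk does not say how tags are generated. A truncated digest makes collisions extremely unlikely rather than impossible.

```javascript
const crypto = require("crypto");

// Append a stable, per-user tag to a URL so a shared proxy sees a
// different cache key for every user.
function taggedUrl(baseUrl, userId, secret) {
  const tag = crypto
    .createHmac("sha256", secret) // keyed hash so tags can't be guessed from user IDs
    .update(String(userId))
    .digest("hex")
    .slice(0, 16);
  const url = new URL(baseUrl);
  url.searchParams.set("tag", tag);
  return url.toString();
}

const secret = "demo-secret"; // hypothetical; keep a real one out of source code
const janeUrl = taggedUrl("http://www.example.com/mail/inbox", "jane", secret);
const maryUrl = taggedUrl("http://www.example.com/mail/inbox", "mary", secret);
console.log(janeUrl);
console.log(maryUrl);
```

Because the tag is stable per user, each user's own browser cache keeps working across visits.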

URL Tag example
Goal: accurate advertising statistics
Do you trust proxies?
Send Cache-Control: must-revalidate
Count 304 Not Modified log entries as hits
If you don’t trust ’em
Ask client to fetch tagged image URL
Return 302 to highly cacheable image file
Count 302s as hits
Don’t bother to look at cacheable server log

Hit-metering for ads (1)
<script type="text/javascript">
var r = Math.random();
var t = new Date();
document.write("<img width='109' height='52' src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + ";r=" + r + "'>");
</script>
<noscript>
<img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
</noscript>

Hit-metering for ads (2)
GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1
Host: ads.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B
HTTP/1.1 302 Moved Temporarily
Date: Wed, 28 Jul 2004 23:45:06 GMT
Location: http://static.example.net/i/foo/bar.gif
Content-Type: text/html
<a href="http://static.example.net/i/foo/bar.gif">Moved</a>
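The redirect exchange above could come from a handler along these lines. A sketch only: handleAdRequest and the in-memory hitCounts map are hypothetical, and the /ad/ to /i/ path mapping is inferred from the example URLs.

```javascript
// Count the hit, then redirect to the highly cacheable static copy.
// The random query string is ignored; it exists only to defeat caches.
function handleAdRequest(requestUrl, hitCounts) {
  const url = new URL(requestUrl);
  const match = url.pathname.match(/^\/ad\/(.+)$/);
  if (!match) return { status: 404, headers: {} };

  const adPath = match[1];
  hitCounts.set(adPath, (hitCounts.get(adPath) || 0) + 1); // each 302 is one hit
  return {
    status: 302,
    headers: { Location: `http://static.example.net/i/${adPath}` },
  };
}

const hitCounts = new Map();
const reply = handleAdRequest(
  "http://ads.example.com/ad/foo/bar.gif?t=1090538707;r=0.510772917234983",
  hitCounts
);
console.log(reply.status, reply.headers.Location, hitCounts.get("foo/bar.gif"));
```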

Hit-metering for ads (3)
GET /i/foo/bar.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
HTTP/1.1 200 OK
Date: Wed, 28 Jul 2004 23:45:07 GMT
Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
ETag: "69079e-ad91-40212cc8"
Cache-Control: public,max-age=315360000
Expires: Mon, 28 Jul 2014 23:45:07 GMT
Content-Length: 6096
Content-Type: image/gif
GIF89a...

URL Tags & user experience
Does not require modifying HTTP headers
No need for Pragma: no-cache or Expires in past
Doesn’t break the Back button
Effect on browser history & visited-link highlighting
JavaScript timestamps/random numbers: easy to implement, but break visited-link highlighting
Session or persistent ID: preserves history, but a little harder to implement

Breaking the Back button
User expectation: Back button works instantly
Private caches normally enable this behavior
Aggressive cache-busting breaks Back button
Server sends Pragma: no-cache or Expires in past
Browser must re-visit server to re-fetch page
Hitting network much slower than hitting disk
User perceives lag
Use aggressive approach very sparingly
Compromising user experience is A Bad Thing

Summary
Review: Top 5 techniques
Use Cache-Control: private for personalized content
Implement “Images Never Expire” policy
Use a cookie-free TLD for static content
Use Apache defaults for occasionally-changing static content
Use random tags in URL for accurate hit metering or very sensitive content

Pro-caching techniques
Cache-Control: max-age=<bignum>
Expires: <10 years into future>
Generate “static content” headers
Last-Modified, ETag
Content-Length
Avoid “cgi-bin”, “.cgi” or “?” in URLs
Some proxies (e.g. Squid) won’t cache
Workaround: use PATH_INFO instead
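The checklist above, sketched as two small helpers. Both function names are hypothetical, the URL check is a plain substring test, and ETag is left out to keep the example short.

```javascript
const TEN_YEARS_SECONDS = 10 * 365 * 24 * 60 * 60; // 315360000

// Headers that make a response look like long-lived static content.
function proCachingHeaders(body, lastModified, now = new Date()) {
  return {
    "Cache-Control": `max-age=${TEN_YEARS_SECONDS}`,
    Expires: new Date(now.getTime() + TEN_YEARS_SECONDS * 1000).toUTCString(),
    "Last-Modified": lastModified.toUTCString(),
    "Content-Length": String(Buffer.byteLength(body)),
  };
}

// URLs that some proxies treat as dynamic and refuse to cache.
function looksDynamicToProxy(url) {
  return url.includes("cgi-bin") || url.includes(".cgi") || url.includes("?");
}

const headers = proCachingHeaders(
  "hello",
  new Date(Date.UTC(1998, 9, 5, 18, 32, 51)),
  new Date(Date.UTC(2004, 6, 28, 23, 30, 0))
);
console.log(headers);
console.log(looksDynamicToProxy("/cgi-bin/show.cgi?id=1"), looksDynamicToProxy("/show/id/1"));
```

Note that 315360000 seconds is 3650 days, so the computed Expires lands two days short of ten calendar years because of leap days.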

Cache-busting techniques
Use POST instead of GET
Use random strings and “?” char in URL
Omit Content-Length & Last-Modified
Send explicit headers on response
Breaks the back button
Only as a last resort
Cache-Control: max-age=0,no-cache,no-store
Expires: Tue, 11 Oct 1977 12:34:56 GMT
Pragma: no-cache

Recommended Reading
Web Caching and Replication, Michael Rabinovich & Oliver Spatscheck, Addison-Wesley, 2001
Web Caching, Duane Wessels, O'Reilly, 2001
