Usually, a CDN is just a cache you put in front of a dynamic website. You set up the CDN to be the first server your clients get data from; the CDN quickly decides if it has a cached copy, and otherwise it asks the origin server for a fresh one. So far so good, but if you really care about squeezing out that extra performance you need a decent TTL, and as soon as you make the TTL more than a couple of minutes you need to think about cache invalidation. You also need to make sure certain endpoints never get cached in the CDN, which could be very bad.
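The freshness decision at the heart of all this is simple enough to sketch in a few lines of Python. This is a toy illustration of the TTL check, not any CDN's actual code:

```python
import time

def is_fresh(cached_at, max_age, now=None):
    """Return True if a cached copy is still within its TTL (max-age)."""
    now = time.time() if now is None else now
    return (now - cached_at) < max_age

# A copy cached 60 seconds ago with a 5-minute TTL is still fresh:
assert is_fresh(cached_at=1000, max_age=300, now=1060)
# Once the TTL elapses, the cache has to ask the origin again:
assert not is_fresh(cached_at=1000, max_age=300, now=1301)
```

The longer that `max_age` gets, the more stale content you can serve, which is exactly why invalidation becomes the hard part.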
For this site, www.peterbe.com, I'm using KeyCDN, which I've blogged about here: "I think I might put my whole site behind a CDN" and here: "KeyCDN vs. DigitalOcean Nginx". KeyCDN has an API and a Python client, which I've contributed to.
The next problem is: how do you test all this stuff on your laptop? Unfortunately, you can't deploy a KeyCDN Docker image or something like that that mimics how it works for real. So, to simulate a CDN locally on my laptop, I'm using Nginx. It's definitely pretty different, but that's not the point. The point is that you want something that acts as a reverse proxy. You want to make sure that stuff that's supposed to be cached gets cached, stuff that's supposed to be purged gets purged, and things that are supposed to be dynamic are always dynamic.
The Configuration
First I add peterbecom.local into /etc/hosts like this:

▶ cat /etc/hosts | grep peterbecom.local
127.0.0.1       peterbecom.local origin.peterbecom.local
::1             peterbecom.local origin.peterbecom.local
Next, I set up the Nginx config (running on port 80) and the configuration looks like this:
proxy_cache_path /tmp/nginxcache levels=1:2 keys_zone=STATIC:10m inactive=24h max_size=1g;

server {
    server_name peterbecom.local;

    location / {
        proxy_cache_bypass $http_secret_header;
        add_header X-Cache $upstream_cache_status;
        proxy_set_header x-forwarded-host $host;
        proxy_cache STATIC;
        # proxy_cache_key $uri;
        proxy_cache_valid 200 1h;
        proxy_pass http://origin.peterbecom.local;
    }

    access_log /tmp/peterbecom.access.log combined;
    error_log /tmp/peterbecom.error.log info;
}
By the way, I've also set up origin.peterbecom.local to be run in Nginx too, but it could just be proxy_pass http://localhost:8000; to go straight to Django. Not relevant for this context.
The Purge
Without the commercial version of Nginx (Plus) you can't do easy purging just for purging's sake. But with proxy_cache_bypass $http_secret_header; it's very similar to purging, except that it immediately makes a request to the origin.
First, to test that it works, I start up Nginx and Django and now I can run:
▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...
(Note the X-Cache: MISS, which comes from add_header X-Cache $upstream_cache_status;)
This should trigger a log line in /tmp/peterbecom.access.log and in the Django runserver foreground logs.
At this point, I can kill the Django server and run it again:
▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: max-age=86400
< Cache-Control: public
< X-Cache: HIT
...
Cool! It's working without Django running. As expected. This is how to send a "purge request":
▶ curl -v -H "secret-header:true" http://peterbecom.local/about > /dev/null
> GET /about HTTP/1.1
> secret-header:true
>
< HTTP/1.1 502 Bad Gateway
...
Clearly, it's trying to go to the origin, which was killed, so you start that up again and you get back to:
▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...
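That whole MISS → HIT → bypass cycle can be modeled in a few lines of Python. This is a toy stand-in for what proxy_cache plus proxy_cache_bypass does, purely to make the semantics concrete; the class and names are mine, not Nginx's:

```python
class ToyCache:
    """Minimal model of a reverse-proxy cache with a bypass header."""

    def __init__(self, origin):
        self.origin = origin  # callable: path -> body
        self.store = {}

    def get(self, path, bypass=False):
        if bypass:
            # Skip the cache lookup, but still store the fresh response,
            # which is what makes the bypass act like a purge-and-refetch.
            body = self.origin(path)
            self.store[path] = body
            return body, "BYPASS"
        if path in self.store:
            return self.store[path], "HIT"
        body = self.origin(path)
        self.store[path] = body
        return body, "MISS"

cache = ToyCache(origin=lambda path: f"rendered {path}")
assert cache.get("/about") == ("rendered /about", "MISS")   # cold cache
assert cache.get("/about") == ("rendered /about", "HIT")    # served from cache
assert cache.get("/about", bypass=True)[1] == "BYPASS"      # secret-header "purge"
assert cache.get("/about") == ("rendered /about", "HIT")    # fresh copy is cached
```

The important property is in the bypass branch: the origin is hit immediately and the result replaces what was cached, so the next plain request is a HIT on the new content.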
In Python
In my site, there are Django signals that are triggered when a piece of content changes, and I'm using python-keycdn-api in production, but obviously that won't work with Nginx. So I have a local setting and my Python code looks like this:
# This function gets called by a Django `post_save` signal
# among other things such as cron jobs and management commands.
def purge_cdn_urls(urls):
    if settings.USE_NGINX_BYPASS:
        # Note! This Nginx trick will not just purge the proxy_cache, it will
        # immediately trigger a refetch.
        x_cache_headers = []
        for url in urls:
            if "://" not in url:
                url = settings.NGINX_BYPASS_BASEURL + url
            r = requests.get(url, headers={"secret-header": "true"})
            r.raise_for_status()
            x_cache_headers.append({"url": url, "x-cache": r.headers.get("x-cache")})
        print("PURGED:", x_cache_headers)
        return

    ...the stuff that uses keycdn...
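If you want to exercise that bypass logic without a running Nginx, one option is to pull it into a helper where the HTTP call is injected, so a fake can stand in for requests.get. This is a sketch of that idea; the function and fake below are illustrative names of mine, not from the site's codebase:

```python
def purge_urls_via_bypass(urls, base_url, get):
    """Send a bypass request for each URL and collect the X-Cache header.

    `get` is any callable with the same shape as requests.get, so tests
    can inject a fake instead of hitting the network.
    """
    results = []
    for url in urls:
        if "://" not in url:
            url = base_url + url
        response = get(url, headers={"secret-header": "true"})
        response.raise_for_status()
        results.append((url, response.headers.get("x-cache")))
    return results

# A fake standing in for requests.get:
class FakeResponse:
    headers = {"x-cache": "MISS"}

    def raise_for_status(self):
        pass

purged = purge_urls_via_bypass(
    ["/about"], "http://peterbecom.local", lambda url, headers: FakeResponse()
)
assert purged == [("http://peterbecom.local/about", "MISS")]
```

In production you'd pass requests.get as `get`; in tests, the fake keeps the purge logic verifiable on its own.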
Notes and Conclusion
One important feature is that my CDN is a CNAME for www.peterbe.com, but it reaches the origin server on a different URL. When my Django code needs to know the outside-facing domain, I need to respect that. The communication between the CDN and my origin happens on a domain I don't want to expose. What KeyCDN does is send an x-forwarded-host header, which I need to take into account to understand what outward-facing absolute URL was used. Here's how I do that:
def get_base_url(request):
    base_url = ["http"]
    if request.is_secure():
        base_url.append("s")
    base_url.append("://")
    x_forwarded_host = request.headers.get("X-Forwarded-Host")
    if x_forwarded_host and x_forwarded_host in settings.ALLOWED_HOSTS:
        base_url.append(x_forwarded_host)
    else:
        base_url.append(request.get_host())
    return "".join(base_url)
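The host-selection rule in there can be expressed as a pure function, which makes it easy to check without constructing a Django request. This is a sketch of the same logic under my own illustrative names:

```python
def build_base_url(is_secure, forwarded_host, request_host, allowed_hosts):
    """Prefer the X-Forwarded-Host the CDN sends, but only when it is an
    allowed host; otherwise fall back to the Host header the origin saw."""
    scheme = "https" if is_secure else "http"
    if forwarded_host and forwarded_host in allowed_hosts:
        host = forwarded_host
    else:
        host = request_host
    return f"{scheme}://{host}"

# The CDN forwards the public domain while hitting the origin's own host:
assert build_base_url(
    True, "www.peterbe.com", "origin.example", ["www.peterbe.com"]
) == "https://www.peterbe.com"
# A forwarded host that isn't in ALLOWED_HOSTS is ignored:
assert build_base_url(
    False, "evil.example", "origin.example", ["www.peterbe.com"]
) == "http://origin.example"
```

The ALLOWED_HOSTS check matters: anyone can send an X-Forwarded-Host header, so you only trust it when it names a host you actually serve.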
That's about it. There are lots of other details I glossed over, but the point is that this works well enough to test that the cache invalidation works as expected.
Comments
This site is a pull zone of your origin on KeyCDN?
Wow, my site is now load balanced across my two servers using FlexBalancer and proxy_pass, with static files uploaded via git, while most of the content is static.
Makes me wanna try KeyCDN. My only concern is the ability to proxy_pass to a dynamic backend of a push zone. A pull zone might be a nightmare for the first visitors from each region in KeyCDN due to the slow fetching.