Secs sell! How I cache my entire pages (server-side)

May 10, 2012
1 comment Python, Django

I've blogged before about how this site can easily push out over 2,000 requests/second using only 6 WSGI workers (network latency excluded). The reason that's possible is that whole pages can be cached server-side: the entire rendered HTML blob is stored in the cache server (Redis in my case) so that no database queries are needed at all.

I wanted my site to still "feel" dynamic in the sense that once you post a comment (and it's published), the page automatically invalidates the cache, so the user doesn't have to force-refresh his browser to see the change. To accomplish this I use a hacked cache_page decorator whose cache key depends on the content the page is built from. Here's the code I actually use today for the home page:


import logging
import urllib

from django.core.cache import cache

# BlogItem is this site's blog post model and redis_increment is a
# site-specific helper (used below for the hit/miss counting)
def _home_key_prefixer(request):
    if request.method != 'GET':
        # a None prefix tells the decorator not to cache at all
        return None
    prefix = urllib.urlencode(request.GET)
    cache_key = 'latest_comment_add_date'
    latest_date = cache.get(cache_key)
    if latest_date is None:
        # when a blog comment is posted, the blog item's modify_date is updated
        latest, = (BlogItem.objects
                   .order_by('-modify_date')
                   .values('modify_date')[:1])
        latest_date = latest['modify_date'].strftime('%f')
        cache.set(cache_key, latest_date, 60 * 60)
    prefix += str(latest_date)

    try:
        redis_increment('homepage:hits', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)

    return prefix


@cache_page_with_prefix(60 * 60, _home_key_prefixer)
def home(request, oc=None):
    ...
    try:
        redis_increment('homepage:misses', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)
    ...
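
For context, cache_page_with_prefix is not a standard Django decorator; it's my hacked wrapper around Django's own cache_page. A minimal sketch of roughly how such a wrapper can work (an approximation for illustration, not the actual code):

from functools import wraps
from django.views.decorators.cache import cache_page

def cache_page_with_prefix(timeout, key_prefixer):
    def decorator(view):
        @wraps(view)
        def inner(request, *args, **kwargs):
            prefix = key_prefixer(request)
            if prefix is None:
                # e.g. non-GET requests: don't cache at all
                return view(request, *args, **kwargs)
            # delegate to Django's stock cache_page with a key prefix
            # that changes whenever the underlying content changes
            decorated = cache_page(timeout, key_prefix=prefix)(view)
            return decorated(request, *args, **kwargs)
        return inner
    return decorator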

And in the models I then have this:


from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=BlogComment)
@receiver(post_save, sender=BlogItem)
def invalidate_latest_comment_add_dates(sender, instance, **kwargs):
    # saving a comment or a blog post drops the cached date, which
    # changes the home page's cache key prefix on the next request
    cache_key = 'latest_comment_add_date'
    cache.delete(cache_key)

So this means:

  • whole pages are cached for a long time for fast access
  • updates immediately invalidate the cache for the best user experience
  • no need to mess with ANY SQL caching

So, the next question is: if posting a comment means the cache is invalidated and needs to be re-populated, what's the ratio of plain cache hits to hits where the cache had been cleared? Glad you asked. That's why I made this page:

www.peterbe.com/stats/

It allows me to monitor how often a new blog comment or a general cache time-out means poor Django needs to re-create the HTML using SQL.

At the time of writing, one in every 25 hits to the homepage requires the server to re-generate the page. And still the content is always fresh and relevant.

The next level of optimization would be to figure out whether a particular page update (e.g. a blog comment posted on a page that isn't featured on the home page) should or should not invalidate the home page.

On the command line no one can hear you scream. Or can they?

May 3, 2012
2 comments Linux

This is how you check if a command (with or without any output) exited successfully or if it exited with something other than 0, in bash:

#!/bin/bash
./someprogram
WORKED=$?   # $? holds the exit status of the last command
if [ "$WORKED" != 0 ]; then
  echo "FAILED"
else
  echo "WORKED"
fi

But how do you inspect this on the command line? I actually didn't know, until it hit me. The simplest possible solution:

$ ./someprogram && echo worked || echo failed

What a great low-tech solution. It just works. If you're on OSX, you can nerd it up a bit more:

$ ./someprogram && say worked || say failed

Are WebSockets faster than AJAX? ...with latency in mind?

April 22, 2012
25 comments Web development, JavaScript

The advantage of WebSockets (over AJAX) is basically that there's less HTTP overhead. Once the connection has been established, all future message passing is over a socket rather than new HTTP request/response calls. So you'd assume that WebSockets can send and receive many more messages per unit time. Turns out that's true. But there's a very bitter reality once you add latency into the mix.

So, I created a simple app that uses SockJS and an app that uses jQuery AJAX to see how they would perform under stress. Code is here. All it does is basically send a simple data structure to the server, which echoes it back. As soon as the response comes back, it starts over. Over and over until it's done X number of iterations.
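
The actual code is in the repo linked above. Purely as an illustration, here's a minimal sketch of the server half of such a test, written with sockjs-tornado (an assumption; the post doesn't say which backend the real app uses):

import tornado.ioloop
import tornado.web
from sockjs.tornado import SockJSConnection, SockJSRouter

class EchoConnection(SockJSConnection):
    def on_message(self, message):
        # send the exact same payload straight back to the client
        self.send(message)

EchoRouter = SockJSRouter(EchoConnection, '/socktest')

if __name__ == '__main__':
    app = tornado.web.Application(EchoRouter.urls)
    app.listen(8888)
    tornado.ioloop.IOLoop.instance().start()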

Here's the output when I ran this on localhost here on my laptop:

# /ajaxtest (localhost)
start!
Finished
10 iterations in 0.128 seconds meaning 78.125 messages/second
start!
Finished
100 iterations in 0.335 seconds meaning 298.507 messages/second
start!
Finished
1000 iterations in 2.934 seconds meaning 340.832 messages/second

# /socktest (localhost)
Finished
10 iterations in 0.071 seconds meaning 140.845 messages/second
start!
Finished
100 iterations in 0.071 seconds meaning 1408.451 messages/second
start!
Finished
1000 iterations in 0.466 seconds meaning 2145.923 messages/second

Wow! It's so fast that the rate doesn't even settle down. A back-of-the-envelope calculation tells me the WebSocket version is roughly 5 times faster. Again: wow!

Now reality kicks in! It's obviously unrealistic to test against localhost because it doesn't take latency into account, i.e. the long distance the data has to travel between the client and the server.

So, I deployed this test application on my server in London, England and hit it from my Firefox here in California, USA. Same number of iterations, and I ran it a number of times to make sure I wasn't hit by sporadic hiccups on the line. Here are the results:

# /ajaxtest (sockshootout.peterbe.com)
start!
Finished
10 iterations in 2.241 seconds meaning 4.462 messages/second
start!
Finished
100 iterations in 28.006 seconds meaning 3.571 messages/second
start!
Finished
1000 iterations in 263.785 seconds meaning 3.791 messages/second

# /socktest (sockshootout.peterbe.com) 
start!
Finished
10 iterations in 5.705 seconds meaning 1.752 messages/second
start!
Finished
100 iterations in 23.283 seconds meaning 4.295 messages/second
start!
Finished
1000 iterations in 227.728 seconds meaning 4.391 messages/second

Hmm... Not so cool. WebSockets are still slightly faster but the difference is negligible: roughly 10-20% faster than AJAX. With that small a difference, I'm sure the benchmark is vastly affected by other factors that make it unfair to one or the other, such as quirks in my particular browser or the slightest hiccup on the line.

What can we learn from this? Well, latency kills all the fun. It also means you don't necessarily need to re-write your already working AJAX-heavy app just to gain speed, because even though WebSockets are ever so slightly faster, the switch from AJAX comes with other risks and challenges such as authentication cookies, having to deal with channel concurrency, load balancing on the server, etc.

Before you say it: yes, I'm aware that WebSocket apps come with other advantages, such as being able to hold on to sockets and push data at will from the server. Those are juicy benefits, but massive performance boosts ain't one.

Also, I bet that writing this means that peeps will come along and punch holes in my code and my argument. Something I welcome with open arms!

I'm running pgFouine right now on my server

April 19, 2012
0 comments Linux

pgFouine in action on my server
pgFouine is a PostgreSQL log analyzer. You basically configure your Postgres server to be very verbose about all statements. Then you simply run the pgfouine.php command against the log file and it spits out a page like this:

www.peterbe.com/pgfouine.html
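
I won't paste my exact configuration, but the settings pgFouine's documentation asks for in postgresql.conf look roughly like this (treat the details, particularly the prefix format, as assumptions and check the pgFouine docs for your setup):

log_min_duration_statement = 0          # log every statement and its duration
log_line_prefix = '%t [%p]: [%l-1] '    # a line prefix pgFouine can parse

Generating the report is then a one-liner (the log path here is made up):

$ pgfouine.php -file /var/log/postgresql/postgresql.log > pgfouine.html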

Running all this verbose logging will obviously slow down the database server a bit, so I'm only going to run it temporarily. The overhead is actually pretty small, but it does pile up quite a few bytes in terms of the size of the log file.

So, at the time of writing, it's been running for about 1 day and has captured about 70,000 queries (by the time you look at the file, that number might have gone up significantly). I haven't started looking at it in detail yet, but it's already clear that Postgres spends most of its time on queries that use the LIKE operator.

You can configure pgFouine to filter on specific databases. I have not done so because at the moment I'm just interested in what the whole database server is getting up to. Most of the guilty queries come from the Crosstips site. Maybe it's time to optimize the worst performing queries there a bit.

UPDATE

After running for 24 hours, I did some low-hanging fruit optimization to the biggest culprits and reset the logs. The first 24 hours report is still here: www.peterbe.com/pgfouine.1.html

UPDATE 2

I've stopped logging all queries now. The results are still there. I'm quite pleased with the results so far.

Secs sell! How frickin' fast this site is! (server side)

April 5, 2012
2 comments Linux, Web development, Django

This is part 2. Part 1 is here about how I managed to make this site fast.

The web framework powering this site is Django, and in front of that sits Nginx, which serves all the static content (once, before the Amazon CloudFront CDN takes over); all non-static traffic is passed on to a uWSGI daemon running 6 worker processes. The content is stored in PostgreSQL and all caching is done in Redis. Actually, another Redis database is used for other things too, such as maintaining a quick look-up index of keywords to primary keys so that I can quickly mesh together blog posts by keyword.

However, as we all know, the deciding factor in a web site's server-side speed is effectively the speed of the database or any other disk-bound I/O device. To remedy this I've set up some practical caching strategies which I'm quite happy with.

So, how fast is it? Here's an ab stress test against the home page with 10,000 requests spread across 10 concurrent users:
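
That is, an invocation along these lines (the exact flags are an assumption):

$ ab -n 10000 -c 10 http://www.peterbe.com/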

Document Path:          /
Document Length:        73272 bytes

Concurrency Level:      10
Time taken for tests:   4.426 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      734250000 bytes
HTML transferred:       732720000 bytes
Requests per second:    2259.59 [#/sec] (mean)
Time per request:       4.426 [ms] (mean)
Time per request:       0.443 [ms] (mean, across all concurrent requests)
Transfer rate:          162022.11 [Kbytes/sec] received

I could probably raise that 2,300 requests/second to 3,000 or 4,000 just by increasing the number of workers. However, that costs memory, and since I'm currently running 19 other uWSGI workers on this server that in total (all 25) take up a steady 1.4 GB, I don't feel like increasing that number much more. Besides, since this site doesn't really get much traffic, I'm not so concerned about massive throughput on concurrent benchmarks; I care more about serving each and every page as fast as possible the few times it is requested.

Every single page on this site is behind some sort of internal cache. The only time PostgreSQL is involved in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. No, instead lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data are changed. You can see the code for that here.

So, for the home page, for example: for each request, a small piece of Python code checks Redis for the latest comment add-date and, based on that, tells the Django cache_page decorator either to render the page as normal or to serve the whole HTML payload from Redis. In other words, a successful cache "hit" actually needs two Redis look-ups. Even that could be improved by blindly sparing those look-ups and serving from the worker's allocated Python memory instead, but that would make things fragile and hard to unit test, and it would only make the benchmarks faster, which isn't necessary.

The most important thing to optimize on a web site is the static content. Well, there's little point in serving the static content fast if it takes 3 seconds to decide which static content to serve. Also, a fast website is likely to be looked upon favorably by the Google bot, which effectively makes the site rank higher in Google searches.

In the next part I'll try to share more in-depth technical bits and pieces of what I actually did. Although they're no secrets, I think some of them are best practices that even senior web developers sometimes get wrong.

Secs sell! How frickin' fast this site is! (client side)

March 30, 2012
7 comments Web development

After a lot of optimization work on this website I finally now get a score of 98 on YSlow! Phew! Finally!

I've managed to get near perfect scores in the past but never on something as "big" and mixed and "multimedia" as this, i.e. the home page. The home page on this site contains a lot of content: lots of thumbnails and lots of code.

As always, it really helps if you can control the requirements. Meaning you can say "No, we don't want an embedded Flash widget with 30kb of Javascript". In my case I didn't want content to be dynamic per user request, so the underlying HTML can be properly cached. Also, I don't need any Javascript on the home page because it's all static content.

My individual blog pages are the only pages that require Javascript. What I did there was let Google host a copy of the latest jQuery, and I just add some minified code of my own to handle the AJAX of the comment posting. It's pretty cool that the individual blog post pages get a score of 99 on YSlow even though they contain a decent amount of Javascript.
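
Concretely, that's a script tag along these lines (the exact version number here is an assumption):

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>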

What I've also done is move every single image, CSS and Javascript asset to the Amazon CloudFront CDN. Yes, this costs money, but certainly not much. My web server is located in London, England, which is a good location, but considering that 70% of my visitors are based in North America, it's only fair that 90% of the web page content is served near them instead. This is clearly illustrated in this screenshot from Pingdom.

I'm quite aware that it's 100 times easier to build a fast website when you can simply disregard certain features, such as fat picture galleries and massive blocks of Javascript. But mind you, choosing not to add those features is a large part of making fast websites too. The number one rule of making a request fast is to not make it at all.

I'll soon blog more about how I made these things happen from a technical point of view.

IssueTrackerProduct now officially abandoned

March 30, 2012
6 comments Zope, IssueTrackerProduct

In 2001 I started my first, and perhaps the most successful, Open Source project I've ever made: IssueTrackerProduct. After nearly a decade of maintaining it, I have now officially abandoned it.

It all started when I needed a way to track feedback on my personal website. That's why it was originally called "SiteTrackerProduct". I needed something where I could collect bug reports and any other pieces of feedback and then process them in some structured fashion. It was therefore very important that it would be possible to run the application open for anonymous access. People should be able to submit bugs and issues without having to create an account. You see, kids, back in the day it was actually very common that sites would force users to register and create accounts just because the content owner wanted it. These days, it's common knowledge that to get people to open up and share anything for others to benefit from, you make it absolutely trivial to jump straight in without having to see a registration page that looks like a tax return form.

Now, since I long ago abandoned the Zope2 application server technology stack and I no longer use IssueTrackerProduct for anything real, it's no longer feasible to maintain this project. In the last five years or so we were actually using it actively to track all projects at Fry-IT, where I used to work. I have to say, even though we did grow out of it, it was actually successful. It handled the load (after some much-needed optimization patches) and it was easy for people to actually use since, unlike many other bug trackers, it focused on the non-technical end user first and foremost. As much as possible was done to make it trivial to type in your bug or issue, and it automatically took care of all notifications and access rights.

Being a personal Open Source project, it became, over the years, a melting pot for experimenting with and perfecting various new ideas. Many of them we take for granted today, but back then they were quite novel, if I may say so. This includes:

  • ability to auto-save unfinished form inputs (added before Gmail had it)
  • automatic updates of the content without reload made it possible to see other people participating as you're typing
  • ability to reply directly to email notifications without having to open a web browser
  • an advanced via-the-web programmable interface for adding and modifying custom fields (e.g. "Customer reference code")
  • full-text search combined with ability to search on specific fields by key
  • file attachments that are images automatically appear as little thumbnails
  • file attachments that have text become searchable (e.g. Word documents)
  • advanced filtering where you can easily decide to search inclusive or exclusive on certain fields
  • persistent filtering, automatically saved and shareable between different users
  • programmable search filters that are coded in Python which made it possible to create very specific reports
  • ability to export and import bugs from and to Excel for offline processing

Writing all of this, I can't resist getting a bit nostalgic. I did sink A LOT of time into this project. Today when I look back at the code I almost feel sick seeing all the mistakes I made. Much of the ugliness of the code can be attributed to the fact that I often used and abused the code to add new features, and to the fact that we often needed features "yesterday" (since it was used to manage all of our projects), which made it hard to justify doing things "properly". For example, the main .py file is over 14,000 lines of code!

I did call it "perhaps the most successful Open Source project I've ever made" in the first sentence. The reason for that is that over the years many, many people have downloaded it, installed it and let it be used by thousands of users. That's something to be proud of.

Anyway! It's time to move on. So long and thank you for all the fish!

The code is still available at github.com/peterbe/IssueTrackerProduct

How much faster is Nginx+gunicorn than Apache+mod_wsgi?

March 22, 2012
9 comments Linux, Django

Short answer: about 5%

I had a few minutes and wanted to see if changing from Apache + mod_wsgi to Nginx + gunicorn would make the otherwise slow site any faster. It's not this site but another Django site for work (which, by the way, doesn't have to be fast). It's slow because it doesn't cache any of the SQL queries.

# with Apache + mod_wsgi
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    39 [#/sec] (mean)
...
# Uses about 110 Mb

That's after running it multiple times and roughly averaging the requests per second.

# with Nginx + gunicorn --workers=4
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    41 [#/sec] (mean)
...
# uses about 70 Mb
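
For reference, a gunicorn setup like that is typically started with something along these lines (myproject.wsgi is a placeholder, not the actual project):

$ gunicorn --workers=4 --bind=127.0.0.1:8000 myproject.wsgi:application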

So, if you want to make a site fast, forget about how the code is being served until all the slow DB I/O is taken care of properly.

String length truncation optimization difference in Python

March 19, 2012
8 comments Python

We have a piece of code that is going to be run A LOT on a server infrastructure that needs to be fast. I know that I/O matters much more, but because I had the time, I wanted to figure out which is fastest:


def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

...or...


def b(s, m):
    return s[:m]

Truncated! Read the rest by clicking the link below.
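
In the meantime, if you want to run a comparison like this yourself, Python's timeit module is the natural tool. A minimal sketch (the test string and the sizes here are made up):

import timeit

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]

s = 'peter' * 100  # a 500 character test string
m = 100
# time one million truncations with each variant
print(timeit.timeit('a(s, m)', 'from __main__ import a, s, m', number=1000000))
print(timeit.timeit('b(s, m)', 'from __main__ import b, s, m', number=1000000))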

When to __deepcopy__ classes in Python

March 14, 2012
9 comments Python

When using mutables in Python you have to be careful:


>>> a = {'value': 1}
>>> b = a
>>> a['value'] = 2
>>> b
{'value': 2}

So, you use the copy module from the standard library:


>>> import copy
>>> a = {'value': 1}
>>> b = copy.copy(a)
>>> a['value'] = 2
>>> b
{'value': 1}

That's nice but it's limited. It doesn't deal with nested mutables, as you can see here:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.copy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'else'}}

That's when you need the copy.deepcopy function:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.deepcopy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'Something'}}

Now, suppose we have a custom class that subclasses the dict type. That's a very common thing to do. Let's demonstrate:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(name='Value')
>>> b = copy.copy(a)
>>> a['name'] = 'Other'
>>> b
{'name': 'Value'}

And again, if you have a nested mutable object you need copy.deepcopy:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

But oftentimes you'll want to make your dict subclass behave like a regular class so you can access data with dot notation. Like this:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'

Now here's a problem. If you do that, you lose the ability to use copy.deepcopy since the class has now been slightly "abused".


>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/copy.py", line 172, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "<stdin>", line 3, in __getattr__
KeyError: '__deepcopy__'

Hmm... now you're in trouble and to get yourself out of it you have to define a __deepcopy__ method as well. Let's just do it:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
...     def __deepcopy__(self, memo):
...         return ORM(copy.deepcopy(dict(self)))
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
>>> a.data['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Yeah!!! Now we get what we want. Messing around with the __getattr__ like this is, as far as I know, the only time you have to go in and write your own __deepcopy__ method.
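
A side note, going by the traceback above: copy.deepcopy probes for a __deepcopy__ attribute with getattr(x, "__deepcopy__", None), and getattr only suppresses AttributeError, whereas the __getattr__ above raises KeyError for missing keys. So an alternative fix is to raise the exception getattr expects:

>>> class ORM(dict):
...     def __getattr__(self, key):
...         try:
...             return self[key]
...         except KeyError:
...             # raise what getattr() expects so that copy.deepcopy's
...             # probe for __deepcopy__ fails gracefully
...             raise AttributeError(key)
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}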

I'm sure hardcore Python language experts can point out lots of intricacies about __deepcopy__ but since I only learned about this today, having it here might help someone else too.