Peterbe.com

Peter Bengtsson's blog

Filtered by Django

Page 6

A Django base class for all your Forms

November 16, 2013
2 comments Django

This has served me well of the last couple of years of using Django:


from django import forms

class _BaseForm(object):
    def clean(self):
        cleaned_data = super(_BaseForm, self).clean()
        for field in cleaned_data:
            if isinstance(cleaned_data[field], basestring):
                cleaned_data[field] = (
                    cleaned_data[field].replace('\r\n', '\n')
                    .replace(u'\u2018', "'").replace(u'\u2019', "'").strip())

        return cleaned_data


class BaseModelForm(_BaseForm, forms.ModelForm):
    pass


class BaseForm(_BaseForm, forms.Form):
    pass

So instead of doing...


class SigupForm(forms.Form):
    name = forms.CharField(max_length=100)
    nick_name = forms.CharField(max_length=100, required=False)

...you do:


class SigupForm(BaseForm):
    name = forms.CharField(max_length=100)
    nick_name = forms.CharField(max_length=100, required=False)

What it does is that it makes sure that any form field that takes a string strips all preceeding and trailing whitespace. It also replaces the strange "curved" apostrophe ticks that Microsoft Windows sometimes uses.

Yes, this might all seem trivial and I'm sure there's something as good or better out there but isn't it a nice thing to never have to worry about doing things like this again:



class SignupForm(forms.Form):
    ...

    def clean_name(self):
        return self.cleaned_data['name'].strip()

#...or...

form = SignupForm(request.POST)
if form.is_valid():
    name = form.cleaned_data['name'].strip()

UPDATE

This breaks some fields, like DateField.


>>> class F(BaseForm):
...     start_date = forms.DateField()
...     def clean_start_date(self):
...         return self.cleaned_data['start_date']
...
>>> f=F({'start_date': '2013-01-01'})
>>> f.is_valid()
True
>>> f.cleaned_data['start_date']
datetime.datetime(2013, 1, 1, 0, 0)

As you can see, it cleans up '2013-01-01' into datetime.datetime(2013, 1, 1, 0, 0) when it should become datetime.date(2013, 1, 1).

Not sure why yet.

django-fancy-cache with or without stats

March 11, 2013
1 comment Python, Django

If you use django-fancy-cache you can either run with stats or without. With stats, you can get a number of how many times a cache key "hits" and how many times it "misses". Keeping stats incurs a small performance slowdown. But how much?

I created a simple page that either keeps stats or ignores it. I ran the benchmark over Nginx and Gunicorn with 4 workers. The cache server is a memcached running on the same host (my OSX 10.7 laptop).

With stats:

Average: 768.6 requests/second
Median: 773.5 requests/second
Standard deviation: 14.0

Without stats:

Average: 808.4 requests/second
Median: 816.4 requests/second
Standard deviation: 30.0

That means, roughly that running with stats incurs a 6% slower performance.

The stats is completely useless to your users. The stats tool is purely for your own curiousity and something you can switch on and off easily.

Note: This benchmark assumes that the memcached server is running on the same host as the Nginx and the Gunicorn server. If there was more network in between, obviously all the .incr() commands would cause more performance slowdown.

This site is now 100% inline CSS and no bytes are wasted

March 5, 2013
8 comments Python, Web development, Django

This personal blog site of mine uses django-fancy-cache and mincss.

What that means is that I can cache the whole output of every blog post for weeks and when I do that I can first preprocess the HTML and convert every external CSS into one inline STYLE block which will only reference selectors that are actually used.

To see it in action, right-click and select "View Page Source". You'll see something like this:

/*
Stats about using github.com/peterbe/mincss
-------------------------------------------
Requests:         1 (now: 0)
Before:           81Kb
After:            11Kb
After (minified): 11Kb
Saving:           70Kb
*/
section{display:block}html{font-size:100%;-webkit-text-size-adjust:100%;-ms-tex...

The reason the saving is so huge, in my case, is because I'm using Twitter Bootstrap CSS framework which is awesome but as any framework, it will inevitably contain a bunch of stuff that I don't use. Some stuff I don't use on any page at all. Some stuff is used only on some pages and some other stuff is used only on some other pages.

What I gain by this, is faster page loads. What the browser does is that it, gets a URL, downloads all HTML, opens the HTML to look for referenced CSS (using the link tag) and downloads that too. Once all of that is downloaded, it starts to render the page. Approximately after that it starts to download all referenced Javascript and starts evaluating and executing that.

By not having to download the CSS the browser has one less thing to do. Only one request? Well, that request might be on a CDN (not a great idea actually) so even though it's just 1 request it will involve another DNS look-up.

Here's what the loading of the homepage looks like in Firefox from a US east coast IP.

Granted, a downloaded CSS file can be cached by the browser and used for other pages under the same domain. But, on my blog the bounce rate is about 90%. That doesn't necessarily mean that visitors leave as soon as they arrived, but it does mean that they generally just read one page and then leave. For those 10% of visitors who visit more than one page will have to download the same chunk of CSS more than once. But mind you, it's not always the same chunk of CSS because it's different for different pages. And the amount of CSS that is now in-line only adds about 2-3Kb on the HTML load when sent gzipped.

Getting to this point wasn't easy because I first had to develop mincss and django-fancy-cache and integrate it all. However, what this means is that you can have it done on your site too! All the code is Open Source and it's all Python and Django which are very popular tools.

Welcome to the world django-fancy-cache!

March 1, 2013
3 comments Python, Django

** A Django cache_page on steroids**

Django ships with an awesome view decorator called cache_page which is awesome. But a bit basic too.

What it does is that it stores the whole view response in memcache and the key to it is the URL it was called with including any query string. All you have to do is specify the length of the cache timeout and it just works.
Now, it's got some shortcomings which django-fancy-cache upgrades. These "steroids" are:

Ability to override the key prefix with a callable.
Ability to remember every URL that was cached so you can do invalidation by a URL pattern.
Ability to modify the response before it's stored in the cache.
Ability to ignore certain query string parameters that don't actually affect the view but does yield a different cache key.
Ability to serve from cache but always do one last modification to the response.
Incrementing counter of every hit and miss to satisfy your statistical curiosity needs.

The documentation is here:
https://django-fancy-cache.readthedocs.org/

You can see it in a real world implementation by seeing how it's used on my blog here. You basically use it like this::


from fancy_cache import cache_page

@cache_page(60 * 60)
def myview(request):
    ...
    return render(request, 'template.html', stuff)

What I'm doing with it here on my blog is that I make the full use of caching on each blog post but as soon as a new comment is posted, I wipe the cache by basically creating a new key prefix. That means that pages are never cache stale but the views never have to generate the same content more than once.

I'm also using django-fancy-cache to do some optimizations on the output before it's stored in cache.

Real-timify Django with SockJS

September 6, 2012
4 comments Django, JavaScript, Tornado

In yesterdays DjangoCon BDFL Keynote Adrian Holovaty called out that Django needs a Real-Time story. Well, here's a response to that: django-sockjs-tornado

Immediately after the keynote I went and found a comfortable chair and wrote this app. It's basically a django app that allows you to run a socketserver with manage.py like this:

python manage.py socketserver

Now, you can use all of SockJS to write some really flashy socket apps. In Django! Using Django models and stuff. The example included shows how to write a really simple chat application using Django models. check out the whole demo here

If you're curious about SockJS read the README and here's one of many good threads about the difference between SockJS and socket.io.

The reason I could write this app so quickly was because I have already written a production app using sockjs-tornado so the concepts were familiar. However, this app has (at the time of writing) not been used in any production. So mind you it might still need some more love before you show your mom your django app with WebSockets.

Secs sell! How I cache my entire pages (server-side)

May 10, 2012
1 comment Python, Django

I've blogged before about how this site can easily push out over 2,000 requests/second using only 6 WSGI workers excluding latency. The reason that's possible is because the whole page(s) can be cached server-side. What actually happens is that the whole rendered HTML blob is stored in the cache server (Redis in my case) so that no database queries are needed at all.

I wanted my site to still "feel" dynamic in the sense that once you post a comment (and it's published), the page automatically invalidates the cache and thus, the user doesn't have to refresh his browser when he knows it should have changed. To accomplish this I used a hacked cache_page decorator that makes the cache key depend on the content it depends on. Here's the code I actually use today for the home page:


def _home_key_prefixer(request):
    if request.method != 'GET':
        return None
    prefix = urllib.urlencode(request.GET)
    cache_key = 'latest_comment_add_date'
    latest_date = cache.get(cache_key)
    if latest_date is None:
        # when a blog comment is posted, the blog modify_date is incremented
        latest, = (BlogItem.objects
                   .order_by('-modify_date')
                   .values('modify_date')[:1])
        latest_date = latest['modify_date'].strftime('%f')
        cache.set(cache_key, latest_date, 60 * 60)
    prefix += str(latest_date)

    try:
        redis_increment('homepage:hits', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)

    return prefix


@cache_page_with_prefix(60 * 60, _home_key_prefixer)
def home(request, oc=None):
    ...
    try:
        redis_increment('homepage:misses', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)
    ...

And in the models I then have this:


@receiver(post_save, sender=BlogComment)
@receiver(post_save, sender=BlogItem)
def invalidate_latest_comment_add_dates(sender, instance, **kwargs):
    cache_key = 'latest_comment_add_date'
    cache.delete(cache_key)

So this means:

whole pages are cached for long time for fast access
updates immediately invalidates the cache for best user experience
no need to mess with ANY SQL caching

So, the next question is, if posting a comment means that the cache is invalidated and needs to be populated, what's the ratio of hits versus hits where the cache is cleared? Glad you asked. That's why I made this page:

www.peterbe.com/stats/

It allows me to monitor how often a new blog comment or general time-out means poor django needs to re-create the HTML using SQL.

At the time of writing, one in every 25 hits to the homepage requires the server to re-generate the page. And still the content is always fresh and relevant.

The next level of optimization would be to figure out whether a particular page update (e.g. a blog comment posting on a page that isn't featured on the home page) should or should not invalidate the home page. esp

Secs sell! How frickin' fast this site is! (server side)

April 5, 2012
2 comments Linux, Web development, Django

This is part 2. Part 1 is here about how I managed to make this site fast.

The web framework powering this site is Django and in front of that is Nginx which serves all the static content (once before Amazon CloudFront CDN takes over) and all non-static traffic is passed on to a uWSGI daemon which is running 6 worker processes. The database that stores the content is PostgreSQL and all caching is done in Redis. Actually another Redis database is used for other things such as maintaining a quick look-up index of keywords to primary keys so that I can quickly mesh together blog posts by keywords.

However, as we all know the deciding factor of a web sites server-side speed is effectively the speed of the database or any other disk-bound I/O device. To remedy this I've set up some practical caching strategies which I'm quite happy with.

So, how fast is it? Here's an ab stress test against home page with 10,000 requests spread across 10 concurrent users:

Document Path:          /
Document Length:        73272 bytes

Concurrency Level:      10
Time taken for tests:   4.426 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      734250000 bytes
HTML transferred:       732720000 bytes
Requests per second:    2259.59 [#/sec] (mean)
Time per request:       4.426 [ms] (mean)
Time per request:       0.443 [ms] (mean, across all concurrent requests)
Transfer rate:          162022.11 [Kbytes/sec] received

I could probably make that 2,300 requests/second to 3,000 or 4,000 if I just increase the number of workers. However, that costs memory and since I'm currently running 19 other uWSGI workers on this server that all (all 25) in total take up a steady 1.4 Gb I don't feel like increasing that number much more. Besides since this site doesn't really get any traffic, I'm not so concerned about massive throughput on concurrent benchmarks but more about serving each and every page as fast as possible the few times it's called.

Every single page on this site is behind some sort of internal cache. The only time the PostgreSQL is involved is in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a new post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. No, instead lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data is changed. You can see the code for that here.

So, for the home page for example: For each request, a small piece of Python code checks the Redis for what the latest comment add-date is and based on that tells the Django page_cache decorator to either render the page as normal or to serve the whole HTML payload from Redis. In other words, on a successful cache "hit" it actually needs two Redis look-ups. Even that could be improved and blindly just spare these look-ups by serving from the workers allocated Python memory instead but that would make things fragile, hard to unit test and it would only make the benchmarks faster which is not necessary.

The most important thing to optimize on a web site is the static content. Well, there's little point in serving the static content fast if it takes 3 seconds to say what static content to serve. Also, a fast website is likely to appear more favorable on the Google bot which effectively makes the site appear higher on Google searches.

In the next part, I'll try to share more in-depth technical bits and pieces of what I actually did although they're no secrets I think some of them are best practice and even senior web developers sometimes get them wrong.

How much faster is Nginx+gunicorn than Apache+mod_wsgi?

March 22, 2012
9 comments Linux, Django

Short answer: about 5%

I had a few minutes and wanted to see if changing from Apache + mod_wsgi to Nginx + gunicorn would make the otherwise slow site any faster. It's not this site but another Django site for work (which, by the way, doesn't have to be fast). It's slow because it doesn't cache any of the SQL queries.

# with Apache + mod_wsgi
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    39 [#/sec] (mean)
...
# Uses about 110 Mb

That's after running multiple times and roughly averaging the requests per seconds.

# with Nginx + guncorn --workers=4
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    41 [#/sec] (mean)
...
# uses about 70 Mb

So, if you want to make a site fast forget about how the code is being served until all the slow db I/O is taken care of properly.

Cryptic errors when using django-nose

December 7, 2011
2 comments Django

After about 3 days of debugging using pdb, print and writing to a log file I've almost finally solve my bizarre errors I was getting when running a whole test suite. The error that it lead to was that Django refused to re-register models to the admin and the errors looked something like this:


 ...
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/urls.py", line 6, in <module>
   admin.autodiscover()
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/contrib/admin/__init__.py", line 26, in autodiscover
   import_module('%s.admin' % app)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/utils/importlib.py", line 35, in import_module
   __import__(name)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/apps/users/admin.py", line 30, in <module>
   admin.site.register(UserProfile, UserProfileAdmin)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/contrib/admin/sites.py", line 85, in register
   raise AlreadyRegistered('The model %s is already registered' % model.__name__)
AlreadyRegistered: The model UserProfile is already registered

Turns out to be independent of which Django project I ran and it was something no one else was able to reproduce on any machine with the exact same code.

After 2 days I found that there's a difference between a successful run and a failing run was how I specified (to nose) which module to load:


./manage.py test users  # fails!
./manage.py test users.test  # works!

In both cases it finds the same tests. So it would either fail 10 times or work 10 times. Hmmm...

The bridging between nose and Django is done by awesome django-nose developed here at Mozilla by Django extraordinaire Jeff Balogh and it's a non-trivial piece of code as it depends on some really smart importing tricks and stuff which I haven't even begun to understand.

However, after so many trial and errors I finally discovered that the solution (for me) was to delete the ~/.noserc file. What's strange is that all it contained was:


[nosetests]
with-doctest=1

I might never actually find out what went wrong. Ultimately I think a reason things went wrong was because it incorrectly populated sys.modules with excessive keys that would cause double imports of urls.py which in turn runs admin.autodiscover() but incorrectly does so twice.

Sorry for the rambling. And sorry for not actually finding the real bug. I did spent 2-3 days debugging this non-stop and hopefully some other poor frustrated person is going to see this and also look into the ~/.noserc for ways to fix it maybe.

EmailInput HTML5 friendly for Django

August 2, 2011
6 comments Django

Suppose you have a Django app with a login where people can only log in with their email address. Then use this widget on your login form:


## The input widget class
class EmailInput(forms.widgets.Input):
   input_type = 'email'

   def render(self, name, value, attrs=None):
       if attrs is None:
           attrs = {}
       attrs.update(dict(autocorrect='off',
                         autocapitalize='off',
                         spellcheck='false'))
       return super(EmailInput, self).render(name, value, attrs=attrs)

## Example usage
class AuthenticationForm(django.contrib.auth.forms.AuthenticationForm):
   """override the authentication form because we use the email address as the
   key to authentication."""
   # allows for using email to log in
   username = forms.CharField(label="Username", max_length=75,
                              widget=EmailInput())
   rememberme = forms.BooleanField(label="Remember me", required=False)

This input field does some cool stuff in the browser such as automatic validation in the browser as seen in this screenshot here.

More importantly it fixes a very annoying problem when surfing on a smartphone or a tablet like the iPad. As I'm about to type "someusername@mozilla.com" it first wants to start capitalized and which might fail the login. Also if the email address contains a word that it wants to correct like ("mozilla" -> "Mozilla") you have to click the little correct tooltip to tell the input is correct in verbatim.

Note to Djangonauts who want to use this and have a dual authentication backend that takes both usernames and email addresses, this form will make it impossible to log in as something called "admin" for example.