Filtered by Django

A taste of the Django on inside Mozilla, Sheriffs Duty

July 22, 2011
0 comments Django

One of the many great things about working for Mozilla is that everything we do is Open Source. Even our wiki is open (however, we have an internal wiki for boring corporate stuff such as meeting rooms, HR etc.).

Last week I wrote an internal application for Mozilla's build engineers. Essentially it's a roster that lists one user per day, and it helps that it's visualized as a calendar and available as a vCal export. It's very unlikely that anybody outside Mozilla will find this particularly useful. But who knows, perhaps other companies also need to take turns sheriffing build machines.

Anyway, the project was easy to write because we have something called Playdoh. It's a set of nifty and useful settings and a folder structure and it comes with a submodule called "playdoh-lib" which is stuffed with lots of useful packages that you'll most likely want to use. If you browse Playdoh on Github it might look like a lot of stuff but after a second look you'll see that there's actually almost no code. So don't you dare to play the "bloat card"! :)

What this app uses is TastyPie for the REST API which was awesome by the way.

For the authentication I used django-auth-ldap and some custom classes because at Mozilla we use email addresses instead of usernames.

To make the vCal export I use VObject which was easy to work with but has some unusual syntax in places.

Jinja was used for the template rendering and it meant I had to do some tricks to use the django.contrib.auth.views.login view but with my templates. Might be worth looking into if people are interested.

The code has 98% test coverage but I had to upgrade to the latest nose to be able to run test coverage on app modules that have similar names to modules in the standard lib.

Test static resources in Django tests

June 2, 2011
3 comments Django

At Mozilla we use jingo-minify to bundle static resources such as .js and .css files. It's not a perfect solution but it's got some great benefits. One of them is that you need to know exactly which static resources you need in a template and because things are bundled you don't need to care too much about what files it originally consisted of. For example "jquery-1.6.2.js" + "common.js" + "jquery.cookies.js" can become "bundles/core.js"

A drawback of this is that if you forget to compress and prepare all assets (using the compress_assets management command in jingo-minify) you break your site with missing static resources. So how to test for this?
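One possible approach, as a minimal sketch and not necessarily what the full post does: collect the script and stylesheet URLs from a rendered page and check that each one exists on disk. The names `StaticURLCollector` and `missing_static_files`, and the `media_url` and `media_root` parameters, are hypothetical and not part of jingo-minify.

```python
import os
from html.parser import HTMLParser


class StaticURLCollector(HTMLParser):
    """Collect the URLs of scripts and stylesheets referenced in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


def missing_static_files(html, media_url, media_root):
    """Return the referenced URLs under `media_url` that have no file in `media_root`."""
    collector = StaticURLCollector()
    collector.feed(html)
    missing = []
    for url in collector.urls:
        if not url.startswith(media_url):
            continue  # external resource, nothing to check on disk
        # Drop the URL prefix and any cache-busting query string.
        relative = url[len(media_url):].split("?")[0]
        if not os.path.isfile(os.path.join(media_root, relative)):
            missing.append(url)
    return missing
```

In a Django test the HTML would typically come from the test client's response to a page request, and the test would assert that the returned list is empty.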

Truncated! Read the rest by clicking the link below.

Optimization of getting random rows out of a PostgreSQL in Django

February 23, 2011
48 comments Django

There was a really interesting discussion on the django-users mailing list about the most efficient way to select random elements out of a SQL database. I knew using a regular RANDOM() in SQL can be very slow on big tables but I didn't know by how much. Had to run a quick test!

Cal Leeming discussed a snippet of his for paginating huge tables which uses the MAX(id) aggregate function.

So, I did a little experiment on a table with 84,000 rows in it. Realistic enough to matter even though it's less than millions. So, how long would it take to select 10 random items, 10 times? Benchmark code looks like this:


from time import time

TIMES = 10

def using_normal_random(model):
    for i in range(TIMES):
        # order_by('?') makes the database shuffle the whole table
        yield model.objects.all().order_by('?')[0].pk

t0 = time()
for i in range(TIMES):
    list(using_normal_random(SomeLargishModel))
t1 = time()
print(t1 - t0, "seconds")

Result:


41.8955321312 seconds

Nasty!! Also running this you'll notice postgres spiking your CPU like crazy.

A much better approach is to use Python's random.randint(1, <max ID>). Looks like this:


from django.db.models import Max
from random import randint

def using_max(model):
    max_ = model.objects.aggregate(Max('id'))['id__max']
    i = 0
    while i < TIMES:
        try:
            yield model.objects.get(pk=randint(1, max_)).pk
            i += 1
        except model.DoesNotExist:
            # the random id fell into a gap (e.g. a deleted row), try again
            pass

t0 = time()
for i in range(TIMES):
    list(using_max(SomeLargishModel))
t1 = time()
print(t1 - t0, "seconds")

Result:


0.63835811615 seconds

Much more pleasant!
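The same comparison can be tried without Django. The following is a minimal sketch using Python's built-in sqlite3 module, not the PostgreSQL setup measured above. The table name `item` and the function names are made up for illustration, and only odd ids are inserted so that some random guesses hit a gap.

```python
import random
import sqlite3


def make_table(rows=10000):
    """Create an in-memory table (hypothetical name "item") with gaps in its ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    # Only insert odd ids so that roughly half of the random guesses miss.
    conn.executemany(
        "INSERT INTO item (id, name) VALUES (?, ?)",
        ((i, "item %d" % i) for i in range(1, rows * 2, 2)),
    )
    return conn


def using_order_by_random(conn, times):
    """Let the database shuffle the whole table for every single pick."""
    for _ in range(times):
        row = conn.execute(
            "SELECT id FROM item ORDER BY RANDOM() LIMIT 1"
        ).fetchone()
        yield row[0]


def using_max(conn, times):
    """Guess ids between 1 and MAX(id) and retry when a guess hits a gap."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM item").fetchone()
    found = 0
    while found < times:
        row = conn.execute(
            "SELECT id FROM item WHERE id = ?", (random.randint(1, max_id),)
        ).fetchone()
        if row is not None:
            found += 1
            yield row[0]
```

Wrapping `list(using_order_by_random(conn, 10))` and `list(using_max(conn, 10))` in the same `time()` calls as above gives a rough feel for the difference, although absolute numbers will differ from the PostgreSQL results.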

UPDATE

A commenter, Ken Swift, asked: what if your requirement is to select 100 random items instead of just 10? Won't those 101 database queries be more costly than just 1 query with a RANDOM()? The answer turns out to be no.

I changed the script to select 100 random items 1 time (instead of 10 items 10 times) and the times were roughly the same:


using_normal_random() took 41.4467599392 seconds
using_max() took 0.6027739048 seconds

And what about 1000 items 1 time:


using_normal_random() took 204.685141802 seconds
using_max() took 2.49527382851 seconds

UPDATE 2

The generator above has a few flaws:

  1. You can't pass in a QuerySet
  2. You get primary keys returned, not ORM instances
  3. You can't pass in how many items you want
  4. Internally, it might randomly select a number it has already tried
Here's a much more complete function:


import random
from django.db.models import Max, Min

def random_queryset_elements(qs, number):
    # Note: this loops forever if `number` exceeds the number of rows in `qs`.
    assert number <= 10000, 'too large'
    max_pk = qs.aggregate(Max('pk'))['pk__max']
    min_pk = qs.aggregate(Min('pk'))['pk__min']
    ids = set()  # primary keys already yielded
    while len(ids) < number:
        next_pk = random.randint(min_pk, max_pk)
        while next_pk in ids:
            next_pk = random.randint(min_pk, max_pk)
        try:
            found = qs.get(pk=next_pk)
            ids.add(found.pk)
            yield found
        except qs.model.DoesNotExist:
            pass
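
The function above only remembers the primary keys it found, so an id that missed can still be looked up again. A possible variation, sketched here in plain Python with a dict standing in for the table (the name `random_elements` and the dict are assumptions for illustration, not Django API), remembers every id it has tried and refuses to start if more items are requested than exist:

```python
import random


def random_elements(table, number, rng=random):
    """Yield `number` distinct random values from `table`, a dict keyed by integer id."""
    if number > len(table):
        raise ValueError("asked for more items than exist")
    min_pk, max_pk = min(table), max(table)
    tried = set()  # every id guessed so far, whether it hit or missed
    found = 0
    while found < number:
        next_pk = rng.randint(min_pk, max_pk)
        if next_pk in tried:
            continue  # never look up the same id twice
        tried.add(next_pk)
        if next_pk in table:
            found += 1
            yield table[next_pk]
```

With a real QuerySet the `in table` check would be a `qs.get(pk=next_pk)` wrapped in a try/except, as in the function above.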

Nice testimonial about django-static

February 21, 2011
0 comments Django

My friend Chris is a Django newbie who has managed to build a whole e-shop site in Django. It will launch in a couple of days and when it launches I will blog about it here too. He sent me this today which gave me a smile:

"I spent today setting up django_static for the site, and optimising it for performance. If there's one thing I've learned from you, it's optimisation.

So, my homepage is now under 100KB (was 330KB), and it loads in @5-6 seconds from hard refresh (was 13-14 seconds at its worst). And I just got a 92 score on Yslow. I do believe I have the fastest tea website around now, and I still haven't installed caching.

Wicked huh?"

He's talking about using django-static. Then I get another email shortly after with this:

"correction - I get 97 on YSlow if I use a VPN.

I just found that the Great Firewall tags extra HTTP requests onto every request I make from my browser, pinging a server in Shanghai with a PHP script which probably checks the page for its content or if its on some kind of blocked list. Cheeky buggers!"

Isn't that interesting! (Note: Chris is based in China but hosts the test site in the UK)

Fastest "boolean SQL queries" possible with Django

January 14, 2011
5 comments Django

Those familiar with the Django ORM know how easy it is to work with and that you can do lots of nifty things with the result (QuerySet in Django lingo).

So I was working on a report that basically just needed to figure out if a particular product has been invoiced. Not for how much or when, just if it's included in an invoice or not.
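
As a rough illustration of what such a "boolean query" can look like at the SQL level (a sketch, not necessarily the approach the full post arrives at), the database can be asked for a single constant and told to stop at the first match. The table `invoice_line` and the function `is_invoiced` are made up for this example. In the Django ORM, `QuerySet.exists()` serves the same purpose.

```python
import sqlite3


def is_invoiced(conn, product_id):
    """Return True if at least one invoice line references the product."""
    row = conn.execute(
        # SELECT 1 ... LIMIT 1 fetches no column data and stops at the first hit.
        "SELECT 1 FROM invoice_line WHERE product_id = ? LIMIT 1",
        (product_id,),
    ).fetchone()
    return row is not None


# A tiny in-memory table to try it on (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_line (id INTEGER PRIMARY KEY, product_id INTEGER)")
conn.executemany("INSERT INTO invoice_line (product_id) VALUES (?)", [(1,), (1,), (3,)])
```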

Truncated! Read the rest by clicking the link below.

django-static version 1.5 automatically taking care of imported CSS

January 11, 2011
1 comment Django

I just released django-static 1.5 (github page) which takes care of optimizing imported CSS files.

To explain, suppose you have a file called foo.css and do this in your Django template:


{% load django_static %}
<link href="{% slimfile "/css/foo.css" %}"
  rel="stylesheet" type="text/css" />

And in foo.css you have the following:


@import "bar.css";
body {
   background-image: url(/images/foo.png);
}

And in bar.css you have this:


div.content {
   background-image: url("bar.png");
}

The outcome is the following:


# foo.css
@import "/css/bar.1257701299.css";
body{background-image:url(/images/foo.1257701686.png)}

# bar.css
div.content{background-image:url("/css/bar.1257701552.png")}

In other words, not only does it parse your CSS content and give images unique names you can set aggressive caching headers on, it will also unfold imported CSS files and optimize them too.
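
To give a feel for the renaming step, here is a minimal sketch of rewriting `url(...)` references so that each file name carries a timestamp. It is not django-static's actual implementation: `stamp_name`, `stamp_css_urls` and the `get_mtime` callback are hypothetical, and unlike the real output above it does not resolve relative paths such as bar.png into /css/.

```python
import os
import re

# Matches url(...) with optional single or double quotes around the path.
URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def stamp_name(path, mtime):
    """Turn "/images/foo.png" and 1257701686 into "/images/foo.1257701686.png"."""
    root, ext = os.path.splitext(path)
    return "%s.%d%s" % (root, mtime, ext)


def stamp_css_urls(css, get_mtime):
    """Rewrite every url(...) in `css`, asking `get_mtime(path)` for each timestamp."""
    def replace(match):
        quote, path = match.group(1), match.group(2)
        return "url(%s%s%s)" % (quote, stamp_name(path, get_mtime(path)), quote)
    return URL_RE.sub(replace, css)
```

In real use `get_mtime` would map the URL to a file on disk and return its modification time, so the name changes whenever the file does.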

I think that's really useful. With one single setting (settings.DJANGO_STATIC=True) you can get all your static resources massaged and prepared for the best possible HTTP optimization. Also, it's all automated so you never need to run any build scripts, and which static resources to use (and how to optimize them) is all defined in the template. This I think makes a lot more sense than maintaining static resources in a config file.

The coverage is 93% and there is an example app to look at if you prefer that over a README.

In Django, how much faster is it to aggregate?

October 27, 2010
5 comments Django

Being able to do aggregate functions with Django's QuerySet API is really useful. Not because it's difficult to write your own loop but because the summation is then done inside the SQL database. I had this piece of code:


from decimal import Decimal

t = Decimal('0')
for each in some_queryset:
    t += each.cost

Which can be rewritten like this instead:


from django.db.models import Sum

t = some_queryset.aggregate(Sum('cost'))['cost__sum']

For my 6,000+ records in the database the first one takes about 0.7 seconds. The aggregate takes 0.02 seconds. Blimey! That's a more than 30-fold difference in speed for practically the same thing.

Granted, when doing the loop you can do some other stuff such as counting or additional function calls but that difference is quite significant. In my current application those 0.7 seconds aren't really a problem but they quickly become one when it has to be done over and over for multiple sets.
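
The two approaches can also be compared outside Django. This is a minimal sketch with the built-in sqlite3 module. The `expense` table and its integer `cost` column are assumptions for the example (the Django code above sums Decimal values), and no timings are claimed for it.

```python
import sqlite3

# Hypothetical table with 6,000 rows whose costs are 1, 2, ..., 6000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expense (id INTEGER PRIMARY KEY, cost INTEGER)")
conn.executemany(
    "INSERT INTO expense (cost) VALUES (?)", ((i,) for i in range(1, 6001))
)


def total_in_python(conn):
    """Fetch every row and add the costs up in a Python loop."""
    total = 0
    for (cost,) in conn.execute("SELECT cost FROM expense"):
        total += cost
    return total


def total_in_sql(conn):
    """Let the database do the summation and return a single value."""
    (total,) = conn.execute("SELECT SUM(cost) FROM expense").fetchone()
    return total
```

Both functions return the same total. The difference is that the second one transfers a single value from the database instead of every row.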

Local Django development with Nginx

October 11, 2010
15 comments Django

When doing local Django development with runserver you end up doing some changes, then refreshing in Firefox/Chrome/Safari again and again. Doing this means that all your static resources are probably served via Django. Presumably via django.views.static.serve, right? What's wrong with that? Not much, but we can do better.

So, you serve it via Nginx and let Nginx take care of all static resources. You'll still use Django's own runserver so no need for mod_wsgi, gunicorn or uWSGI. This requires that you have Nginx installed and running on your local development environment. First you need to decide on a fake domain name. For example mylittlepony. Edit your /etc/hosts file by adding this line:


127.0.1.1       mylittlepony

Truncated! Read the rest by clicking the link below.

Musings about django.contrib.auth.models.User

August 28, 2010
6 comments Python, Django

It dawned on me that the Django auth user model that ships with Django is like the string built-in of a high level programming language. With the string built-in it's oh so tempting to add custom functionality to it like a fancy capitalization method or some other function that automatically strips whitespace or what not. Yes, I'm looking at you, Prototype, for example.

By NOT doing that, and leaving it as it is, you automatically manage to Keep It Simple Stupid and your application code makes sense to the next developer who joins your project.

I'm not a smart programmer but I'm a smart developer in that I'm good at keeping things pure and simple. It means I can't show off any fancy generators, monads or metaclasses but it does mean that fellow coders who follow my steps can more quickly hit the ground running.

My colleagues and I now have more than ten Django projects that rely on, without overriding, the django.contrib.auth.models.User class and there have been many times where I've been tempted to use it as a base class or something instead but in retrospect I'm wholeheartedly happy I didn't. The benefit isn't technical; it's a matter of teamwork and holistic productivity.

Hosting Django static images with Amazon Cloudfront (CDN) using django-static

July 9, 2010
4 comments Django

About a month ago I added a new feature to django-static that makes it possible to define a function that all files of django-static go through.

First of all a quick recap. django-static is a Django plugin that you use from your templates to reference static media. django-static takes care of giving the file the optimum name for static serving and if applicable compresses the file by trimming all whitespace and what not. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx

The new, popular kid on the block for CDN (Content Delivery Network) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service, which is a cloud file storage solution. What a CDN does is register a domain for your resources such that, with some DNS tricks, users of this resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg from a server in Frankfurt and if you live in North Carolina, USA you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work check out THE best practice guide for faster webpages by Steve Souders (it's number two).

A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So, I continue to serve my static CSS and Javascript files from my Nginx but put all the images on Amazon Cloudfront. How to do this with django-static? Easy: add this to your settings:


DJANGO_STATIC = True
# ...other DJANGO_STATIC_... settings...
# equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'

Then you need to write that function that gets a chance to do something with every static resource that django-static prepares. Here's a naive first version:


# in cloudfront.py

conversion_map = {} # global variable
def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg','gif','png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)
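
To see how this proxy behaves, here is a self-contained sketch with the upload stubbed out. The stub `_upload_to_cloudfront`, the `uploaded` list and the domain cdn.example.com are placeholders for illustration only. The real function would push the file to Cloudfront and return its URL there.

```python
import os

conversion_map = {}  # uri -> CDN URL, same role as in the version above
uploaded = []  # records what the stub "uploaded"


def _upload_to_cloudfront(filepath):
    """Stand-in for a real upload: pretend the file now lives on a CDN domain."""
    uploaded.append(filepath)
    return "http://cdn.example.com/" + os.path.basename(filepath)


def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    # Only new or changed image files are uploaded and remembered.
    if filepath and (new or changed):
        if filepath.lower().split(".")[-1] in ("jpg", "gif", "png"):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    # Anything not in the map (CSS, Javascript, ...) keeps its local URI.
    return conversion_map.get(uri, uri)
```

Calling `file_proxy("/img/foo.123.png", new=True, filepath="/tmp/foo.123.png")` returns the CDN URL and later calls with the same uri reuse it without uploading again, while a .css file passes through unchanged. Note that the map lives in process memory, so it is lost on restart.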

Truncated! Read the rest by clicking the link below.