Fastest "boolean SQL queries" possible with Django

January 14, 2011
5 comments Django

Those familiar with the Django ORM know how easy it is to work with and how many nifty things you can do with the result (a QuerySet in Django lingo).

So I was working on a report that basically just needed to figure out whether a particular product has been invoiced. Not for how much or when, just whether it's included in an invoice or not.
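
To give a rough idea of the kind of query I mean, here's a minimal sketch with hypothetical models (the model names and the relation are made up; the full post may well go about it differently):


# Hypothetical Invoice model with a relation to Product, purely for illustration
from invoicing.models import Invoice

def is_invoiced(product):
    # .exists() translates to a SELECT ... LIMIT 1 under the hood, so the
    # database only has to find one matching row and return a boolean,
    # never construct or transfer full rows
    return Invoice.objects.filter(product=product).exists()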


django-static version 1.5 automatically taking care of imported CSS

January 11, 2011
1 comment Django

I just released django-static 1.5 (github page) which takes care of optimizing imported CSS files.

To explain, suppose you have a file called foo.css and do this in your Django template:


{% load django_static %}
<link href="{% slimfile "/css/foo.css" %}"
  rel="stylesheet" type="text/css" />

And in foo.css you have the following:


@import "bar.css";
body {
   background-image: url(/images/foo.png);
}

And in bar.css you have this:


div.content {
   background-image: url("bar.png");
}

The outcome is the following:


# foo.css
@import "/css/bar.1257701299.css";
body{background-image:url(/images/foo.1257701686.png)}

# bar.css
div.content{background-image:url("/css/bar.1257701552.png")}

In other words, not only does it parse your CSS content and give images unique names that you can set aggressive caching headers on, it also unfolds imported CSS files and optimizes them too.

I think that's really useful. With one single setting (settings.DJANGO_STATIC=True) you can get all your static resources massaged and prepared for the best possible HTTP optimization. Also, it's all automated so you never need to run any build scripts, and the definition of which static resources to use (and how to optimize them) lives in the templates. I think that makes a lot more sense than maintaining static resources in a config file.
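
To spell that out, the enabling bit is just one line in settings.py (a sketch; the note about False is my understanding of the app's behaviour, not something stated above):


# settings.py
# When True, the {% slimfile %} tag shown above timestamps and optimizes
# the referenced files; when False (e.g. in development), as I understand
# it, the paths are passed through untouched.
DJANGO_STATIC = True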

The test coverage is 93% and there is an example app to look at if you prefer that over a README.

RequireJS versus HeadJS

January 9, 2011
4 comments JavaScript

I've spent a lot of time trying to figure out which Javascript script-loading framework to use: RequireJS or HeadJS. I still don't have an answer. Neither website refers to the other.

In general

  • To me, it's important to be able to load and execute some Javascript before downloading Javascript modules that aren't needed to render the initial screen. Makes for a more responsive behaviour and gets pixels drawn quicker for Javascript-heavy sites.
  • An understated but massive benefit of combining multiple .js files into one is avoiding sporadic network bottlenecks. Fewer files to download means fewer things that can go wrong. These bottlenecks can make a few Kb of Javascript take 10 seconds to download.
  • Public CDNs (e.g. jQuery from Google's CDN) are an extremely powerful optimization technique. Not only are they extremely fast, it's very likely the file is already cached because some other site uses the exact same URL.
  • Where does it say that Javascript has to be loaded in the head? Even html5-boilerplate loads Javascript just before the </body> tag.
  • Realistically, in the real world, it's not uncommon that you can't combine all .js files into one. (This doesn't apply to web apps that consist of just one HTML file.) One page might require A.js, B.js and C.js but another page requires A.js, B.js and D.js. It takes manual thinking to decide whether you should combine A,B,C,D.js or A,B.js + C|D.js. No framework can predict this.
  • All loading and browser-incompatibility hacks will eventually become obsolete as browsers catch up. Again, this requires manual thinking because supporting and ultra-boosting performance might have a different cost today compared to a year from now. The most guilty of this appears to be ControlJS.
  • I'm confident that optimization in terms of file concatenation and whitespace optimization does not belong in the loading framework.
  • Apparently the iPhone 3.x series can't cache individual files larger than 15Kb (25Kb for iPhone 4.x). That's a very small amount if you combine several large modules.
  • Accepting the fact of life that sporadic network bottlenecks can kill your page, think hard about asynchronous loading and preserved order. Perhaps ideal is a mix of both. What framework allows that? (both RequireJS and HeadJS it seems)
  • Loading frameworks are not for everything and everyone. If you're building something "simple", or a landing page like Google's search page, a framework might just get in your way.

RequireJS

  • The author is well known, but his Dojo-esque style shines through in RequireJS's syntax and patterns.
  • Is only about Javascript. No CSS hacks or other html5ish boilerplates.
  • Gets into the realm of module definitions. Neat but do you want the loading framework to get involved in how you prefer to write your code or do you just want it to load your files?
  • All the module definition stuff feels excessive for every single project I can imagine but we're entering an era of "web apps" (as opposed to "web sites") so this might need to change.
  • What you learn using RequireJS you can reuse when building on NodeJS (a server-side Javascript platform). It's also possible to use RequireJS in Rhino (a server-side Javascript engine) but personally I haven't reached that level yet.

HeadJS

  • Author relatively unknown. Actually, quite well known after all: he's also the author of Flowplayer and jQuery Tools.
  • Contains a kitchen sink (CSS tricks, Modernizr) but perhaps those are really quite useful. After all, you don't write your web site in Assembly.
  • There's a fork of HeadJS that does just the Javascript stuff. But will it be maintained? And does that defeat the whole point of using HeadJS?
  • With its CSS hacks (aka. kitchen sink) HeadJS seems great if you really care about combining HTML5 techniques with Internet Explorer.
  • This awesome experiment shows that HeadJS really works and that asynchronous loading can be really powerful. But ask yourself, are you ready to build in an asynchronous way?
  • With HeadJS I can label a combined and optimized bundle and load my code once that bundle is loaded. Can I do that with RequireJS? It seems to depend on the filename (minus the .js suffix).
  • Makes the assumption that just because a file is loaded the order of execution is a non-issue. This means you might have trouble controlling dependencies during execution. This is a grey area that might or might not matter depending on the complexity of your app.
  • A feeling I get is that HeadJS without the CSS kitchen sink stuff reduces to become LabJS or EnhanceJS.

Other alternatives

The ones I can think of are: ControlJS (feels too "hacky" for my taste), CommonJS (not sufficiently "in-browser specific" for my taste) and EnhanceJS (like HeadJS and LabJS but with less power/features).

The one I haven't studied as much is LabJS. It seems more similar to HeadJS in style. Perhaps it deserves more attention but the reason HeadJS got my attention is because it's got a better looking website.

In conclusion

Your mileage will vary. The deeper I look into this, the more I feel personal taste comes into play. It's hard enough for a single framework author to write realistic benchmarks; even harder for evaluators like myself to benchmark them all. It gets incrementally harder when you take into account the effects of HTTP latency, sporadic network bottlenecks, browser garbage collection and user experience.

Personally I think HeadJS is a smoother transition for general web sites. RequireJS might be more appropriate when writing web apps with virtually no HTML and a single URL.

With the risk of starting a war... If you're a Rails/Django/Plone head, consider HeadJS. If you're a mobile web app/NodeJS head consider RequireJS.

UPDATE

Sorry, I now realise that Tero Piirainen actually has built a fair amount of powerful Javascript libraries.

ToDo apps I gave up on in 2010

January 3, 2011
4 comments Wondering

First I tried Things for the iPhone, because some people I work with said it was good. It lasted about a week. I think it failed, for me, because I didn't feel how time slowly wipes away old stuff that isn't relevant any more. My todo lists are usually about work projects, which mainly means writing code and sending emails to people on the project. Things being on the iPhone meant I had to take my hands off the computer.

The second one I tried is the app with perhaps the most brilliant UI I've seen in years: TeuxDeux. There are really only a couple of things you can do: enter events and mark them as done. I tried the iPhone version but even though it works well it wasn't as neat as the web version. Eventually I gave up because I couldn't keep up with moving past days' events forward to today's date. That meant that new events entered "today" sort of got higher priority than old ones, and that just felt wrong in the long run.

The third one wasn't really a todo list but that's how I ended up using it: Workflowy. Again, an absolutely brilliant UI and technical achievement. I had it as an open tab for about three weeks until I ended up not bothering any more. I love writing bullet point lists to the n'th degree, but I felt that every time I came back to it I had to "search" for where I was and had to make a tonne of micro-decisions about where to put stuff. When I had a thought in my head I didn't want to first have to think and plan where to put it.

What did work?

It's far from applicable to everyone, but one thing that has worked (and has for many years, in fact) is our work issue tracker. We use IssueTrackerProduct, written by yours truly. It's not really a fair comparison, because once multiple people are using the same tool the personal choices don't really matter. Also, I think project issue trackers like this have the added bonus that you don't clutter them with small basic things like "Check database log X".

The perhaps most successful todo list for me in 2010 was keeping a TODO.txt file in my project source directory. This is a personal file I rarely check in to git because my colleagues don't need to see mine (well, sometimes that's useful too). It's a simple text file and it looks something like this:


* (MEDIUM) render the shared classes in calendar.html on page load 

* (HIGH) Find out why all CSS is lost when an event is added

* (LOW) Experiment with http://vis.stanford.edu/protovis/ to write
 some nice stats

...

I guess it works because it's my own invention. From scratch. Generally, todo list apps work best if you wrote the app yourself. It's immediately in context because each code project gets its own file. Its order is usually implied by writing new items at the top of the file, but you can be a bit cowboy about it and just jot things down without doing it "the correct way".

ssl_session_cache in Nginx and the ab benchmark

December 31, 2010
2 comments DoneCal, Linux

A couple of days ago I wrote about how blazing fast the DoneCal API can be over HTTP (1,400 requests/second) and how much slower it becomes when doing the same benchmark over HTTPS. It was, as Chris Adams pointed out, possible to run ab with Keep-Alive on, and after some reading up it's clear that it's a good idea to switch on a shared ssl_session_cache so that Nginx can reuse SSL sessions and skip some of the handshakes.

With ssl_session_cache shared:SSL:10m :


 Requests per second:    112.14 [#/sec] (mean)

Same cache size but with -k on the ab loadtest:


Requests per second:    906.44 [#/sec] (mean)

I'm fairly sure that most browsers will use Keep-Alive connections, so I guess it's realistic to use -k when running ab, but since this is a test of an API it's perhaps more likely than not that clients (i.e. computer programs) don't use it. To be honest I'm not really sure, but it nevertheless feels right to be able to use ssl_session_cache to boost my benchmark by 40%.
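
For what it's worth, a programmatic API client can benefit from Keep-Alive too if it reuses its connection. A quick sketch, not from the original benchmarks, using Python's standard library (the guid and timestamps are just placeholders):


# Reusing one HTTPS connection means the TLS handshake is only paid once
import httplib

conn = httplib.HTTPSConnection("donecal.com")
for i in range(10):
    conn.request("GET", "/api/events.json?guid=xxx&start=1292999600&end=1293294812")
    response = conn.getresponse()
    response.read()  # the body must be read before the connection can be reused
    print response.status
conn.close()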

It's also worth noting that when doing an HTTP benchmark it's CPU bound on the Tornado (Python) processes (I use 4). But when doing HTTPS it's CPU bound on Nginx itself (I use 1 worker process).

Speed of DoneCal API (over 1,400 request/sec) and HTTPS (less than 100 request/sec)

December 27, 2010
4 comments DoneCal

DoneCal (my simple calendar and time sheet substitute web app) now has HTTPS support. It's not enabled yet as I'm still ironing out some more testing. Basically, HTTPS is, at least at the moment, only going to be available to premium users. Anyway, this is a performance story about the difference in speed between HTTP and HTTPS.

I'll let these unscientific benchmarks speak for themselves.

HTTP:


donecal:~# ab -n 1000 -c 10 "http://donecal.com/api/events.json?guid=xxx&start=1292999600&end=1293294812"
...
Document Length:        616 bytes
Failed requests:        0
...
Requests per second:    1432.40 [#/sec] (mean)
...
Transfer rate:          1184.81 [Kbytes/sec] received

HTTPS:


..
Server Port:            443
SSL/TLS Protocol:       TLSv1/SSLv3,DHE-RSA-AES256-SHA,2048,256

...
Document Length:        616 bytes
Failed requests:        0
...
Requests per second:    84.73 [#/sec] (mean)
...
Transfer rate:          70.08 [Kbytes/sec] received

That's quite a huge difference in requests per second; HTTPS is 17 times slower than HTTP. Is this the reality of HTTPS? Or is something wrong with my cert, or with running HTTPS through ab?

Anyway, this is pretty good, methinks. The HTTP version does over 1,400 requests per second and this is a full application involving a database, security and encoding. This particular test payload (616 bytes of JSON) isn't big, but it sure is bigger than some of the "hello world" benchmarks you see on the interweb.

UPDATE

See this new entry about enabling ssl_session_cache in Nginx

To code or to pdb in Python

December 20, 2010
6 comments Python

This feels like a bit of a face-plant moment, but I've never understood why anyone would use the code module when you can use pdb, since the code module is basically like pdb but with less.

What you use it for is to create your own custom shell. Django does this nicely with its shell management command. I often find myself doing this:


$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from this import that
>>> from module import something
>>> from db import connection
>>> con = connection('0.0.0.0', bla=True)
>>> con.insert(something(that))

And there are certain things I almost always import (depending on the project). So use code to write your own little custom shell loader that imports all the stuff you need. Here's one I wrote real quick. Ultra useful time-saver:


#!/usr/bin/env python
import code, re
if __name__ == '__main__':
   from apps.main.models import *
   from mongokit import Connection
   from pymongo.objectid import InvalidId, ObjectId
   con = Connection()
   db = con.worklog
   print "AVAILABLE:"
   print '\n'.join(['\t%s'%x for x in locals().keys()
                    if re.findall('[A-Z]\w+|db|con', x)])
   print "Database available as 'db'"
   code.interact(local=locals())

This is working really well for me and saving me lots of time. Hopefully someone else finds it useful.

Page here about DoneCal

December 19, 2010
0 comments DoneCal

I've finally had the time to write a little bit about my latest web app project: DoneCal.

Hopefully the banner at the top of this page will yield a bit of traffic. Since my marketing budget for DoneCal is exactly $0.00 I'm going to go for nice organic SEO and just generally build a great app, so that people link to it not because they have to but because they want to.

Let me know if the text makes sense or if it makes you feel confused about what DoneCal is.

DoneCal gets a grade A (92)

November 27, 2010
3 comments DoneCal

All the hard work I've put into DoneCal pre-optimization has paid off: Got a Grade A with 92 percent on YSlow!

What's cool about this is that, unlike other sites I've built with a high YSlow score, this site is very Javascript intensive: rendering the home page depends on 9 different Javascript files weighing over 300 Kb, which when combined and packed for production are reduced to 5 requests weighing in at just over 80 Kb. The reason it's still 5 requests and not just 1 is also important. This is deliberate: it only loads the minimum needed to render the calendar first, and then, after the DOM is fully rendered, more Javascript is pulled in depending on what's needed.

One annoying thing about YSlow is that it suggests that you use CDNs for Javascript and CSS files. What it perhaps doesn't appreciate is that most CDNs don't support negotiated gzipping like Nginx does. The ability to gzip a CSS or Javascript file generally means less waiting for the client than getting it un-gzipped from a CDN. One thing I will work on, though, is perhaps serving all the images that support the CSS from my Amazon CloudFront CDN. Gzipping is not applicable to images.

Gmail tip: Searching only for attachments

November 25, 2010
0 comments

I've seen people search in Gmail many times when what they're really looking for is an attachment. Often you search for something like "ProjectX" and find a huge thread full of emails without that one document attachment you're looking for.

Add to your normal search:


has:attachment

and it will only find emails with an attachment. See example screenshot on the right.