UPPER vs. ILIKE

April 19, 2010
4 comments Web development

I have a Zope web app that uses hand coded SQL (PostgreSQL) statements. Similar to the old PHP. My only excuse for not using an ORM was that this project started in 2005 and at the time SQLAlchemy seemed like a nerdy mess with more undocumented quirks than it was worth losing hair over.

Anyway, several statements use ILIKE to get around the problem of making things case insensitive. Something like the Django ORM uses UPPER to get around it. So I wonder how much the ILIKE statement slows down compared to UPPER and the indexed equal operator. Obviously, neither ILIKE or UPPER can use an index.

Long story short, here are the numbers for selecting on about 10,000 index records:


# ONE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE name = 'Abbaye';
Average: 0.14 milliseconds

# TWO
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  UPPER(name::text) = UPPER('Abbaye');
Average: 18.89 milliseconds

# THREE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  name ILIKE 'Abbaye';
Average: 24.63 milliseconds

UPPER vs. ILIKE

First of all, the conclusion is to use UPPER instead of ILIKE if you don't need to do regular expressions. Secondly, if at all possible try to use the equal operator first and foremost and only reside on the case insensitive one if you really need to.

Lastly, in PostgreSQL is there a quick and easy to use drop in alternative to make an equality operatorish thing for varchars that are case insensitive?

UPDATE 2014

There is an answer to the question just above: citext

Who was logged in during a Django exception

April 15, 2010
5 comments Django

In lack of a fancier solution here's how I solved a problem of knowing who was logged in when an error occurred. I'm building a Intranet like system for a close group of people and if an error occurs I get an email that reminds me to add more tests. So I fix the bugs and upgrade the server. But I often want to know what poor sucker was logged in at the time the exception happened so that I can email them and say something like "Hi! I noticed your stumbled across a bug. My bad. Just wanted to let you know I've fixed that now"

So to do this I installed a silly little piece of middleware:


from django.conf import settings
class ExceptionExtraMiddleware(object):
   def process_exception(self, request, exception):
       if settings.DEBUG:
           return
       try:
           logged_in_info = ''
           if request.user and request.user.is_authenticated():
               logged_in_info = "%s" % request.user
               if request.user.email:
                   logged_in_info += ' %s' % request.user.email
               if request.user.first_name or request.user.last_name:
                   logged_in_info += ' (%s %s)' % \
                     (request.user.first_name, request.user.last_name)
           if logged_in_info:
               request.META['ERROR-X-LOGGED-IN'] = logged_in_info
       except:
           # don't make matters worse in these sensitive times
           logging.debug("Unable to debug who was logged in", exc_info=True)

This means that when I get an email with the traceback and snapshot of the request object I get this included:


...
'ERROR-X-LOGGED-IN': u'anita (Anita Test)',
...

UPDATE

The code above had a bug in it. Doing an if on request.user will return true even if there is no logged in user. The safest thing is to change it to:


if request.user and request.user.is_authenticated():

fcgi vs. gunicorn vs. uWSGI

April 9, 2010
29 comments Python, Django, Linux

uwsgi is the latest and greatest WSGI server and promising to be the fastest possible way to run Nginx + Django. Proof here But! Is it that simple? Especially if you're involving Django herself.

So I set out to benchmark good old threaded fcgi and gunicorn and then with a source compiled nginx with the uwsgi module baked in I also benchmarked uwsgi. The first mistake I did was testing a Django view that was using sessions and other crap. I profiled the view to make sure it wouldn't be the bottleneck as it appeared to take only 0.02 seconds each. However, with fcgi, gunicorn and uwsgi I kept being stuck on about 50 requests per second. Why? 1/0.02 = 50.0!!! Clearly the slowness of the Django view was thee bottleneck (for the curious, what took all of 0.02 was the need to create new session keys and putting them into the database).

So I wrote a really dumb Django view with no sessions middleware enabled. Now we're getting some interesting numbers:


fcgi (threaded)              640 r/s
fcgi (prefork 4 processors)  240 r/s (*)
gunicorn (2 workers)         1100 r/s
gunicorn (5 workers)         1300 r/s
gunicorn (10 workers)        1200 r/s (?!?)
uwsgi (2 workers)            1800 r/s
uwsgi (5 workers)            2100 r/s
uwsgi (10 workers)           2300 r/s

(* this made my computer exceptionally sluggish as CPU when through the roof)

Truncated! Read the rest by clicking the link below.

Cycling across England on Orange Snapshot

April 4, 2010
0 comments Photos

Cycling across England on Orange Snapshot I don't know who these guys are but I found their pictures on Orange Snapshot (Orange customers Twitter thing for posting pictures) and it appears that they're cycling across the country and documenting this by taking pictures and sticking them up on Twitter. I actually often check out Snapshot just to get a feeling what's going on in the country at the moment. It was great when the snow hit us.

Why is this important? Well, in these modern times it means that it's very easy to make a traveling photo documentary without a film crew or some obscure travel blog which forces you to write 1,000 words for every city you visit. Good work on you lads! Keep it up!

The awesomest way possible to serve your static stuff in Django with Nginx

March 24, 2010
19 comments Django

I'm the proud creator of django-static which is a Django app that takes care of how you serve your static media the best way possible. Although some of these things are subjective generally this is the ideal checklist of servicing your static media:

  1. Cache headers must be set to infinity
  2. URLs must be unique so that browsers never have to depend on refreshing
  3. The developer (who decided which media to include) should not have to worry himself with deployment
  4. The developer/artist (who makes the media) should not have to worry himself with deployment
  5. All Javascript and CSS must be whitespace optimized in a safe way and served with Gzip
  6. All images referenced inside CSS should be taken care of too
  7. It must be possible to combine multiple resources of Javascript or CSS into one
  8. It must be possible to easily test production deployment in development environment without too much effort
  9. A sysadmin shouldn't have to understand a developers Django application
  10. A development environment must be unhindered by this optimization
  11. Processing overhead of must be kept to a minimum
  12. Must be possible to easily say which resources can be whitespace optimized and which can not

Truncated! Read the rest by clicking the link below.

Speed test between django_mongokit and postgresql_psycopg2

March 9, 2010
15 comments Python, Django

Following on from yesterday's blog about How and why to use django-mongokit I extended the exampleproject which is inside the django-mongokit project with another app called exampleapp_sql which does the same thing as the exampleapp but does it with SQL instead. Then I added a very simple benchmarker app in the same project and wrote three functions:

  1. One to create 10/100/500/1000 instances of my class
  2. One to edit one field of all 10/100/500/1000 instances
  3. One to delete each of the 10/100/500/1000 instances

Truncated! Read the rest by clicking the link below.

How and why to use django-mongokit (aka. Django to MongoDB)

March 8, 2010
11 comments Python, Django

How and why to use django-mongokit Here I'm going to explain how to combine Django and MongoDB using MongoKit and django-mongokit.

MongoDB is a document store built for high speed and high concurrency with a very good redundancy story. It's an alternative to relational databases (e.g. MySQL) that is what Django is tightly coupled with in it's ORM (Object Relation Mapping) and what it's called now is ODM (Object Document Mapping) in lack of a better acronym. That's where MongoKit comes in. It's written in Python and it connects to the MongoDB database using a library called pymongo and it turns data from the MongoDB and turns it into instances of classes you have defined. MongoKit has nothing to do with Django. That's where django-mongokit comes in. Written by yours truly.

Truncated! Read the rest by clicking the link below.

Importance of public URLs and how enterprisecarsales.com gets it wrong

March 5, 2010
0 comments Web development

Importance of public URLs and how enterprisecarsales.com f's it up A friend of mine found a nice car she on www.enterprisecarsales.com so she copied the current URL from the address bar and emailed that to me. The URL was: http://www.enterprisecarsales.com/carsales/vehicleDetails.do?carIndex=1&vin=1GCHC24U36E112452 which by the look of it (notice the /vehicleDetails.do part of the URL) takes you to a page that says "Sorry, we are unable to complete your page request since the page you are trying to access no longer is available." How stupid is that? Come on, web developers made those kind of mistakes in 2001. Not 2010.

This means that people can't talk to each other about found matches on the site and this is something people want to do. Especially if you're going to spend $thousands on a car.

Come on Enterprise web team: install Django or something and give users what they want not your excuses.