Bash tip of the day: ff

March 25, 2011
2 comments Linux

This is helping me sooo much that it would a crime not to share it. It's actually nothing fancy, just a very convenient thing that I've learned to get used to. ff is an executable script I use to find files in a git repository. Goes like this:


$ ff list
templates/operations/network-packing-list.html
templates/sales/list_orders.html
$ ff venue
templates/venues/venues-by-special.html
templates/venues/venues.html
templatetags/venue_extras.py
templatetags/venues_by_network_extras.py
tests/test_venues.py

It makes it easy to super quickly search for added files without having to use the slow find command which would also otherwise find backup files and other junk that isn't checked in.

To install it, create a file called ~/bin/ff and make it executable:


$ chmod +x ~/bin/ff

Then type this code in:


#!/usr/bin/python
import sys, os
args = sys.argv[1:]
i = False
if '-i' in args:
   i = True
   args.remove('-i')
pattern = args[-1]
extra_args = ''
if len(args) > 1:
   extra_args = ' '.join(args[:-1])
param = i and "-i" or ""
cmd = "git ls-files | grep %s %s '%s'" % (param, extra_args, pattern)
os.system(cmd)

My AWS CloudFront bill

March 23, 2011
1 comment This site

I've put all the static resources behind this site now on AWS CloudFront For example this: http://static.peterbe.com/misc_/Peterbecom/home/grey_face.1282513695.png

This site doesn't really have much traffic. About 50,000 pageviews per month. The bill for the last month: $0.55

I think I can afford that :)

MongoUK 2011 - London conference all about MongoDB

March 21, 2011
2 comments MongoDB

MongoUK 2011 - London conference all about MongoDB This is a summary about the MongoUK conference held today here in London. It was great! Unlike other commercial conferences this one actually cost money but it was only about $50 and although there were free coffee cups, stickers and stuff this didn't feel at all like they were trying to sell themselves. The focus was on the technology. Great!

As often with these things, you realise that you have actually grasped a couple of things quite well now but it's also humbling in that there is so much you haven't grasped that other people are way ahead of you in.

What I learned

Truncated! Read the rest by clicking the link below.

QUnit testing my jQuery Mobile site in full swing

March 17, 2011
1 comment JavaScript

QUnit testing my jQuery Mobile site in full swing Yay! I've figured out how to properly unit tests my jQuery Mobile app that I'm working on. This app uses localStorage, localSession, lots of AJAX and works both offline and online. Through various hacks and tricks I've managed to mock things so that I can easily test the otherwise asynchronous AJAX calls and I can also test the little quirks of jQuery Mobile such as re-rendering <ul> tags after having changed the DOM tree.

I'm using the QUnit testing framework and I like it. The app isn't launched yet and the code is currently protected but once I get it nailed a bit more I'll blog about it more fully so other people can jump in unit testing their Javascript heavy jQuery Mobile sites too. For now I'm up to 75 tests and it's growing steadily.

Here's a little taster for how I mock the $.mobile, AJAX and the localStorage stuff:


module("Storage and AJAX", {
  setup: function() {
     localStorage.clear();
  },
  teardown: function() {
     localStorage.clear();
  }
});

var MockMobile = function() {
  this.current_page;
};
MockMobile.prototype.changePage = function(location) {
  this.current_page = location;
};

test("test Auth", function() {
  $.mobile = new MockMobile();
  var _last_alert;
  alert = function(msg) {
     _last_alert = msg;
  };
  $.ajax = function(options) {
     switch (options.url) {
      case '/smartphone/checkguid/':
        options.success({ok:true});
        break;
      case '/smartphone/auth/login/':
        if (options.data.password == 'secret')
          options.success({guid:'10001'});
        else
          options.success({error:"Wrong credentials"});
        break;
      default:
        console.log(options.url);
        throw new Error("Mock not prepared (" + options.url + ")");
     }
  };
  var result = Auth.is_logged_in(false);
  equals(result, false);

  Auth.ajax_login('peterbe@example.com', 'other junk');
  ok(_last_alert);
  ...

(Note: this is copied slightly out of context but hopefully it reveals some of the hacks and tricks I use)

More productive than Lisp? Really??!

March 10, 2011
0 comments Python

Erann Gat reveals why he lost his mojo with Lisp

What caught my attention (for busy people who don't want to read the whole email):

"So I can't really go into many specifics about what happened at Google because of confidentiality, but the upshot was this: I saw, pretty much for the first time in my life, people being as productive and more in other languages as I was in Lisp. What's more, once I got knocked off my high horse (they had to knock me more than once -- if anyone from Google is reading this, I'm sorry) and actually bothered to really study some of these other languges I found myself suddenly becoming more productive in other languages than I was in Lisp. For example, my language of choice for doing Web development now is Python."

I'm currently studying Lisp myself and it's hard. Really hard. I blame it on being spoiled with a programming language that I can work in without having to read the manual. With python's brilliant introspection I can use the interpreter to find out how a library works just by using help() and dir() without even having to read the source code. (not always true of course)

As we're entering the 21st century, the new contender "Usability" is becoming more and more important. Considering that I've now done Python for more than a decade I remind myself one of the reasons I liked it so much; yes, exactly that: Usability.

To all Zope developers: Does this sound familiar?

March 8, 2011
2 comments Zope

I was reading this article about linkfluence moving from CouchDB to Riak

"Why we move away from CouchDB

We were already aware of Riak before we started using CouchDB, but we weren’t sure about trusting a new product at this point, so we decided, after some benchmark, to go for CouchDB.

After the first couple of months, it was obvious that this was a bad choice.

Our main problems with CouchDB is scalability, versionning and stability.

Once we store a document in CouchDB, we modify it at least twice after the original write. Each modification generates a new version of the document. This feature is nice for some use-cases, but we don’t need it, and there’s no way to disable it, so the size of our databases started to become really important. You’ll probably tell me “hey, you know you can compact your database ?”, and I’ll answer “sure”. The trouble is that we never managed to get it to compact an entire database without crashing (well, to be honest, with the last version of CouchDB we finally managed to compact one database).

The second issue is that one database == one file. When you have multiple small databases, this is fine. When you a have only a few databases, and some grow to more than 1TB, the problems keep growing too (and it’s a real pain to backup).

We also had a lot of random crashes with CouchDB, even if the last version was quite stable."

Does that sound familiar, fellow Zope developer? I know a lot about ZODB but little about CouchDB. One thing that a lot of people don't know about ZODB is that it's very fast and I think this is true about CouchDB too. Speed isn't the same as a raw speed of inserts/queries because with the concurrency variable added the story gets a lot more complex.

It's the exact same perspectives I've always had on ZODB:

1) It's really convenient and powerful

2) It being a single HUGE file makes it hard to scale

3) Versioning can be nifty but it's often not needed and causes headache with the packing

4) It works great but when it cracks it cracks hard and cryptically

I just ordered tea from the Min River Tea Farm

February 27, 2011
0 comments Misc. links

I just ordered tea from the Min River Tea Farm I just ordered myself one bag of Jasmine Pearls from The Min River Tea Farm that my friend Chris has recently launched.

As soon as I get my tea I'm going to take a picture of myself drinking it and send in my pic so that £1 gets donated, by Chris, to the Mind UK charity.

If you live in the UK and love genuine sourced Chinese teas do check it out. The ordering process is lovingly easy and safe. Although I'm an Earl Grey fan myself, I love having some good jasmine tea available at home without having to worry about the caffeine in Earl Grey tea keeping me awake.

Best of luck to Chris and his new site! Please take the time to browse and read about his teas. If you're outside the UK and you want a bag, just send him and email.

Eloquent Javascript by Marijn Haverbeke

February 25, 2011
0 comments JavaScript

Eloquent Javascript by Marijn Haverbeke What a lovely title for a book! I wanted to read a book about the proper way to write Javascript but I couldn't wait any longer for John Resig's Secrets of the JavaScript Ninja which isn't out in print yet (also a great title by the way).

Eloquent Javascript begins very lightly with the basics of Javascript programming. Variables, scope, data structures and control flow. To be perfectly honest I didn't read it very carefully but I believe I did pick up a thing or two at least. The chapter on error handling was useful but the really interesting chapters were "Functional Programming" and "Object-Oriented Programming". What I love about Marijn's style of writing is that he starts very very simple and builds up the code to be better and better. Not based on what you can do but instead why you should do it. As you go along you can then immediately snap up what the benefits are for yourself. Sometimes it's brevity and sometimes it's for faster performance. My only criticism if I'm allowed is that the jargon is quite a lot keep up with. Especially around "constructors" and "prototypes" which is sometimes easy to forget (especially if you're from another language where these things mean different things).

I'm not great at it but I already knew how to write modules and "classes" so ultimately there wasn't a whole lot to take away from it to be honest. Some tricks such as the inheritance function which Marijn introduced was neat and that might be something I'll copy. Nevertheless, this book showed and educated me in why we do things as modules and stuff which I genuinely appreciated.

Thanks for a great book Marijn! Keep up the good work!

Connecting with psycopg2 without a username and password

February 24, 2011
12 comments Python

My colleague Lukas and I banged our heads against this for much too long today. So, our SQLAlchemy is was configured like this:


ENV_DB_CONNECTION_DSN = postgresql://localhost:5432/mydatabase

And the database doesn't have a password (local) so I can log in to it like this on the command line:


$ psql mydatabase

Which assumes the username peterbe which is what I'm logged in. So, this is a shortcut for doing this:


$ psql mydatabase -U peterbe

Which, assumes a blank/empty password.

Truncated! Read the rest by clicking the link below.

Optimization of getting random rows out of a PostgreSQL in Django

February 23, 2011
48 comments Django

There was a really interesting discussion on the django-users mailing list about how to best select random elements out of a SQL database the most efficient way. I knew using a regular RANDOM() in SQL can be very slow on big tables but I didn't know by how much. Had to run a quick test!

Cal Leeming discussed a snippet of his to do with pagination huge tables which uses the MAX(id) aggregate function.

So, I did a little experiment on a table with 84,000 rows in it. Realistic enough to matter even though it's less than millions. So, how long would it take to select 10 random items, 10 times? Benchmark code looks like this:


TIMES = 10
def using_normal_random(model):
   for i in range(TIMES):
       yield model.objects.all().order_by('?')[0].pk

t0 = time()
for i in range(TIMES):
   list(using_normal_random(SomeLargishModel))
t1 = time()
print t1-t0, "seconds"

Result:


41.8955321312 seconds

Nasty!! Also running this you'll notice postgres spiking your CPU like crazy.

A much better approach is to use Python's random.randint(1, <max ID>). Looks like this:


 from django.db.models import Max
 from random import randint
 def using_max(model):
   max_ = model.objects.aggregate(Max('id'))['id__max']
   i = 0
   while i < TIMES:
       try:
           yield model.objects.get(pk=randint(1, max_)).pk
           i += 1
       except model.DoesNotExist:
           pass

t0 = time()
for i in range(TIMES):
   list(using_max(SomeLargishModel))
t1 = time()
print t1-t0, "seconds"

Result:


0.63835811615 seconds

Much more pleasant!

UPDATE

Commentator, Ken Swift, asked what if your requirement is to select 100 random items instead of just 10. Won't those 101 database queries be more costly than just 1 query with a RANDOM(). Answer turns out to be no.

I changed the script to select 100 random items 1 time (instead of 10 items 10 times) and the times were the same:


using_normal_random() took 41.4467599392 seconds
using_max() took 0.6027739048 seconds

And what about 1000 items 1 time:


using_normal_random() took 204.685141802 seconds
using_max() took 2.49527382851 seconds

UPDATE 2

The algorithm for returning a generator has a couple of flaws:

  1. Can't pass in a QuerySet
  2. You get primary keys returned, not ORM instances
  3. You can't pass in a number
  4. Internally, it might randomly select a number already tried

Here's a much more complete function:


 def random_queryset_elements(qs, number):
    assert number <= 10000, 'too large'
    max_pk = qs.aggregate(Max('pk'))['pk__max']
    min_pk = qs.aggregate(Min('pk'))['pk__min']
    ids = set()
    while len(ids) < number:
        next_pk = random.randint(min_pk, max_pk)
        while next_pk in ids:
            next_pk = random.randint(min_pk, max_pk)
        try:
            found = qs.get(pk=next_pk)
            ids.add(found.pk)
            yield found
        except qs.model.DoesNotExist:
            pass