Peterbe.com

Peter Bengtsson's blog

Filtered by JavaScript, Python

Page 31

Sorting mixed type lists in Python 3

January 18, 2014
4 comments Python

Because this bit me harder than I was ready for, I thought I'd make a note of it for the next victim.

In Python 2, suppose you have this:


Python 2.7.5
>>> items = [(1, 'A number'), ('a', 'A letter'), (2, 'Another number')]

Sorting them, without specifying how, will automatically notice that it contains tuples:


Python 2.7.5
>>> sorted(items)
[(1, 'A number'), (2, 'Another number'), ('a', 'A letter')]

This doesn't work in Python 3 because comparing integers and strings is not allowed. E.g.:


Python 3.3.3
>>> 1 < '1'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

You have to convert them to stings first.


Python 3.3.3
>>> sorted(items, key=lambda x: str(x[0]))
[(1, 'A number'), (2, 'Another number'), ('a', 'A letter')]

If you really need to sort by 1 < '1' this won't work. Then you need a more complex key function. E.g.:


Python 3.3.3
>>> def keyfunction(x):
...   v = x[0]
...   if isinstance(v, int): v = '0%d' % v
...   return v
...
>>> sorted(items, key=keyfunction)
[(1, 'A number'), (2, 'Another number'), ('1', 'Actually a string')]

That's really messy but the best I can come up with at past 4PM on Friday.

Credit Card formatter in Javascript

November 19, 2013
28 comments JavaScript

I looked around for Javascript libs that do automatic input formatting for credit card inputs.

The first one was formatter.js which looked promising but it weighs over 6Kb minified and also, when you apply it the placeholder attribute you have on the input disappears.

So, in true software engineering fashion I wrote my own:


function cc_format(value) {
  var v = value.replace(/\s+/g, '').replace(/[^0-9]/gi, '')
  var matches = v.match(/\d{4,16}/g);
  var match = matches && matches[0] || ''
  var parts = []
  for (i=0, len=match.length; i<len; i+=4) {
    parts.push(match.substring(i, i+4))
  }
  if (parts.length) {
    return parts.join(' ')
  } else {
    return value
  }
}

And some tests to prove it:


assert(cc_format('1234') === '1234')
assert(cc_format('123456') === '1234 56')
assert(cc_format('123456789') === '1234 5678 9')
assert(cc_format('') === '')
assert(cc_format('1234 1234 5') === '1234 1234 5')
assert(cc_format('1234 a 1234x 5') === '1234 1234 5')

Check out the Demo

From Postgres to JSON strings

November 12, 2013
14 comments Python

No, this is not about the new JSON Type added in Postgres 9.2. This is about how you can get a record set from a Postgres database into a JSON string the best way possible using Python.

Here's the traditional way:


>>> import json
>>> import psycopg2
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor()
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>> columns = (
...     'id', 'oid', 'root', 'approved', 'name'
... )
>>> results = []
>>> for row in cur.fetchall():
...     results.append(dict(zip(columns, row)))
...
>>> print json.dumps(results, indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

This is plain and nice but it's kinda annoying that you have to write down the columns you're selecting twice.
Also, it's annoying that you have to convert the results of fetchall() into a list of dicts in an extra loop.

So, there's a trick to the rescue! You can use the cursor_factory parameter. See below:


>>> import json
>>> import psycopg2
>>> from psycopg2.extras import RealDictCursor
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor(cursor_factory=RealDictCursor)
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>>
>>> print json.dumps(cur.fetchall(), indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

Isn't that much nicer? It's shorter and only lists the columns once.

But is it much faster? Sadly, no it's not. Not much faster. I ran various benchmarks comparing various ways of doing this and basically concluded that there's no significant difference. The latter one using RealDictCursor is around 5% faster. But I suspect all the time in the benchmark is spent doing things (the I/O) that is not different between the various versions.

Anyway. It's a keeper. I think it just looks nicer.

Lazy loading below the fold

October 26, 2013
2 comments Web development, JavaScript

I've started experimenting with my home page to make it load even faster.

Amazon famously does this too which you can read more about in this Steve Souders post. They make sure everything that needs to be visible above the fold is loaded first, then, it starts loading all the other "stuff" below the fold. The assumption is that the user requests the page, watches it render, and sometimes after it has rendered reaches for the mouse and starts scrolling down for more content. Or perhaps, never bothers to scroll down at all. Either way, everything below the fold can wait. We have more time, to load that in, later.

What we want to avoid is a load graph like this:

big html document delays loading other stuff

The graph is deliberately zoomed out so that we don't get stuck on the details of that particular graph. But basically, you have a very heavy document to load which needs to be fully loaded (and partially rendered) before it can load all other stuff that that page entails. As you can see, the first load (the HTML document) is taking up a majority of the load time. Once that's downloaded the browser can start parsing it and start rendering it. Simultaneously it can start downloading all the mentioned resources such as images, javascript, and CSS.

On WebPagetest they call this Speed Index; "The Speed Index is the average time at which visible parts of the page are displayed."
So basically, you want to display as much as you possibly can and then load in other things that are necessary but can wait in the background.

So, how did I accomplish this on my site?

Basically, the home page uses as piece of Django code that picks up the 10 most recent blog posts and includes them into the template. Instead, I made it only pick up the first 2 and then after window.onload a piece if AJAX code loads the HTML for the remaining 8 blog posts.
That means that much less is required to load the home page. The page is smaller and references less images. The AJAX code is very crude and simple but works enough:


onload = function() {
  microAjax("/rest/2/10/", function (res) {
    document.getElementById('rest').innerHTML = res;
  });
};

The user probably won't notice a huge difference if she avoids looking at the loading spinner of her browser. Only if she is really really fast at scrolling down will she notice that the rest of the page (about 80% of its vertical space) comes in a little bit later.

So, did it work?

I hope so! The theory is sound. However, my home page is, unlike an Amazon.com product page, very sparse. The page weighs a total of 77Kb (excluding external resources) but now only the first 25Kb is loaded and the rest later.

Here's a measurement before and one after. It's kinda hard to compare because "fluctuations" on network I/O make measurements like this quite unpredictable. Also, there's various odd requests like New Relic and Google Analytics which clouds the waterfall view. However, what really matters is in the "First View" of the after measurement. If you look closely you'll see that now a bunch of images aren't loaded until after the "Document Complete" event has fired. That, to me, is a big win.

If you're interested in how it was done, check out this changeset.

Fastest database for Tornado

October 9, 2013
9 comments Python, Tornado

When you use a web framework like Tornado, which is single threaded with an event loop (like nodejs familiar with that), and you need persistency (ie. a database) there is one important questions you need to ask yourself:

Is the query fast enough that I don't need to do it asynchronously?

If it's going to be a really fast query (for example, selecting a small recordset by key (which is indexed)) it'll be quicker to just do it in a blocking fashion. It means less CPU work to jump between the events.

However, if the query is going to be potentially slow (like a complex and data intensive report) it's better to execute the query asynchronously, do something else and continue once the database gets back a result. If you don't all other requests to your web server might time out.

Another important question whenever you work with a database is:

Would it be a disaster if you intend to store something that ends up not getting stored on disk?

This question is related to the D in ACID and doesn't have anything specific to do with Tornado. However, the reason you're using Tornado is probably because it's much more performant that more convenient alternatives like Django. So, if performance is so important, is durable writes important too?

Let's cut to the chase... I wanted to see how different databases perform when integrating them in Tornado. But let's not just look at different databases, let's also evaluate different ways of using them; either blocking or non-blocking.

What the benchmark does is:

On one single Python process...
For each database engine...
Create X records of something containing a string, a datetime, a list and a floating point number...
Edit each of these records which will require a fetch and an update...
Delete each of these records...

I can vary the number of records ("X") and sum the total wall clock time it takes for each database engine to complete all of these tasks. That way you get an insert, a select, an update and a delete. Realistically, it's likely you'll get a lot more selects than any of the other operations.

And the winner is:

pymongo!! Using the blocking version without doing safe writes.

Let me explain some of those engines

pymongo is the blocking pure python engine
with the redis, toredis and memcache a document ID is generated with uuid4, converted to JSON and stored as a key
toredis is a redis wrapper for Tornado
when it says (safe) on the engine it means to tell MongoDB to not respond until it has with some confidence written the data
motor is an asynchronous MongoDB driver specifically for Tornado
MySQL doesn't support arrays (unlike PostgreSQL) so instead the tags field is stored as text and transformed back and fro as JSON
None of these database have been tuned for performance. They're all fresh out-of-the-box installs on OSX with homebrew
None of these database have indexes apart from ElasticSearch where all things are indexes
momoko is an awesome wrapper for psycopg2 which works asyncronously specifically with Tornado
memcache is not persistant but I wanted to include it as a reference
All JSON encoding and decoding is done using ultrajson which should work to memcache, redis, toredis and mysql's advantage.
mongokit is a thin wrapper on pymongo that makes it feel more like an ORM
A lot of these can be optimized by doing bulk operations but I don't think that's fair
I don't yet have a way of measuring memory usage for each driver+engine but that's not really what this blog post is about
I'd love to do more work on running these benchmarks on concurrent hits to the server. However, with blocking drivers what would happen is that each request (other than the first one) would have to sit there and wait so the user experience would be poor but it wouldn't be any faster in total time.
I use the official elasticsearch driver but am curious to also add Tornado-es some day which will do asynchronous HTTP calls over to ES.

You can run the benchmark yourself

The code is here on github. The following steps should work:

$ virtualenv fastestdb
$ source fastestdb/bin/activate
$ git clone https://github.com/peterbe/fastestdb.git
$ cd fastestdb
$ pip install -r requirements.txt
$ python tornado_app.py

Then fire up http://localhost:8000/benchmark?how_many=10 and see if you can get it running.

Note: You might need to mess around with some of the hardcoded connection details in the file tornado_app.py.

Discussion

Before the lynch mob of HackerNews kill me for saying something positive about MongoDB; I'm perfectly aware of the discussions about large datasets and the complexities of managing them. Any flametroll comments about "web scale" will be deleted.

I think MongoDB does a really good job here. It's faster than Redis and Memcache but unlike those key-value stores, with MongoDB you can, if you need to, do actual queries (e.g. select all talks where the duration is greater than 0.5). MongoDB does its serialization between python and the database using a binary wrapper called BSON but mind you, the Redis and Memcache drivers also go to use a binary JSON encoding/decoder.

The conclusion is; be aware what you want to do with your data and what and where performance versus durability matters.

What's next

Some of those drivers will work on PyPy which I'm looking forward to testing. It should work with cffi like psycopg2cffi for example for PostgreSQL.

Also, an asynchronous version of elasticsearch should be interesting.

UPDATE 1

Today I installed RethinkDB 2.0 and included it in the test.

It was added in this commit and improved in this one.

I've been talking to the core team at RethinkDB to try to fix this.

In Python you sort with a tuple

June 14, 2013
9 comments Python

My colleague Axel Hecht showed me something I didn't know about sorting in Python.

In Python you can sort with a tuple. It's best illustrated with a simple example:


>>> items = [(1, 'B'), (1, 'A'), (2, 'A'), (0, 'B'), (0, 'a')]
>>> sorted(items)
[(0, 'B'), (0, 'a'), (1, 'A'), (1, 'B'), (2, 'A')]

By default the sort and the sorted built-in function notices that the items are tuples so it sorts on the first element first and on the second element second.

However, notice how you get (0, 'B') appearing before (0, 'a'). That's because upper case comes before lower case characters. However, suppose you wanted to apply some "humanization" on that and sort case insensitively. You might try:


>>> sorted(items, key=str.lower)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'lower' requires a 'str' object but received a 'tuple'

which is an error we deserve because this won't work for the first part of each tuple.

We could try to write a lambda function (e.g. sorted(items, key=lambda x: x.lower() if isinstance(x, str) else x)) but that's not going to work because you'll only ever get to apply that to the first item.

Without further ado, here's how you do it. A lambda function that returns a tuple:


>>> sorted(items, key=lambda x: (x[0], x[1].lower()))
[(0, 'a'), (0, 'B'), (1, 'A'), (1, 'B'), (2, 'A')]

And there you have it! Thanks for sharing Axel!

As a bonus item for people still reading...
I'm sure you know that you can reverse a sort order simply by passing in sorted(items, reverse=True, ...) but what if you want to have different directions depend on the key that you're sorting on.

Using the technique of a lambda function that returns a tuple, here's how we sort a slightly more advanced structure:


>>> peeps = [{'name': 'Bill', 'salary': 1000}, {'name': 'Bill', 'salary': 500}, {'name': 'Ted', 'salary': 500}]

And now, sort with a lambda function returning a tuple:


>>> sorted(peeps, key=lambda x: (x['name'], x['salary']))
[{'salary': 500, 'name': 'Bill'}, {'salary': 1000, 'name': 'Bill'}, {'salary': 500, 'name': 'Ted'}]

Makes sense, right? Bill comes before Ted and 500 comes before 1000. But how do you sort it like that on the name but reverse on the salary? Simple, negate it:


>>> sorted(peeps, key=lambda x: (x['name'], -x['salary']))
[{'salary': 1000, 'name': 'Bill'}, {'salary': 500, 'name': 'Bill'}, {'salary': 500, 'name': 'Ted'}]

UPDATE

Webucator has made a video explaining this blog post as a video.

Thanks Nat!

premailer now excludes pseudo selectors by default

May 27, 2013
0 comments Python, Web development

Thanks to Igor who emailed me and made me aware, you can't put pseudo classes in style attributes in HTML. I.e. this does not work:


<a href="#" style="color:pink :hover{color:red}">Sample Link</a>

See for yourself: Sample Link

Note how it does not become red when you hover over the link above.
This is what premailer used to do. Until yesterday.

BEFORE:


>>> from premailer import transform
>>> print transform('''
... <html>
... <style>
... a { color: pink }
... a:hover { color: red }
... </style>
... <a href="#">Sample Link</a>
... </html>
... ''')
<html><head><a href="#" style="{color:pink} :hover{color:red}">Sample Link</a></head></html>

AFTER:


>>> from premailer import transform
>>> print transform('''
... <html>
... <style>
... a { color: pink }
... a:hover { color: red }
... </style>
... <a href="#">Sample Link</a>
... </html>
... ''')
<html><head>
<style>a:hover {color:red}</style>
<a href="#" style="color:pink">Sample Link</a>
</head></html>

That's because the new default is exclude pseudo classes by default.

Thanks Igor for making me aware!

What stumped me about AngularJS

May 12, 2013
22 comments AngularJS, JavaScript

So I've now built my first real application using AngularJS. It's a fun side-project which my wife and I use to track what we spend money on. It's not a work project but it's also not another Todo list application. In fact, the application existed before as a typical jQuery app. So, I knew exactly what I needed to build but this time trying to avoid jQuery as much as I possibly could.

The first jQuery based version is here and although I'm hesitant to share this beginner-creation here's the AngularJS version

The following lists were some stumbling block and other things that stumped me. Hopefully by making this list it might help others who are also new to AngularJS and perhaps the Gods of AngularJS can see what confuses beginners like me.

1. AJAX doesn't work like jQuery

Similar to Backbone, I think, the default thing is to send the GET, POST request with the data the body blob. jQuery, by default, sends it as application/x-www-form-urlencoded. I like that because that's how most of my back ends work (e.g. request.GET.get('variable') in Django). I ended up pasting in this (code below) to get back what I'm familiar with:


module.config(function ($httpProvider) {
  $httpProvider.defaults.transformRequest = function(data){
    if (data === undefined) {
      return data;
    }
    return $.param(data);
  };
  $httpProvider.defaults.headers.post['Content-Type'] = ''
    + 'application/x-www-form-urlencoded; charset=UTF-8';
});

2. App/Module configuration confuses me

The whole concept of needing to define the code as an app or a module confused me. I think it all starts to make sense now. Basically, you don't need to think about "modules" until you start to split distinct things into separate files. To get started, you don't need it. At least not for simple applications that just have one chunk of business logic code.

Also, it's confusing why the the name of the app is important and why I even need a name.

3. How to do basic `.show()` and `.hide()` is handled by the data

In jQuery, you control the visibility of elements by working with the element based on data. In AngularJS you control the visibility by tying it to the data and then manipulate the data. It's hard to put your finger on it but I'm so used to looking at the data and then decide on elements' visibility. This is not an uncommon pattern in a jQuery app:


<p class="bench-press-question">
  <label>How much can you bench press?</label>
  <input name="bench_press_max">
</p>


if (data.user_info.gender == 'male') {
  $('.bench-press-question input').val(response.user_info.bench_press_max);
  $('.bench-press-question').show();
}

In AngularJS that would instead look something like this:


<p ng-show="male">
  <label>How much can you bench press?</label>
  <input name="bench_press_max" ng-model="bench_press_max">
</p>


if (data.user_info.gender == 'male') {
  $scope.male = true;
  $scope.bench_press_max = data.user_info.bench_press_max;
}

I know this can probably be expressed in some smarter way but what made me uneasy is that I mix stuff into the data to do visual things.

4. How do I use controllers that "manage" the whole page?

I like the ng-controller="MyController" thing because it makes it obvious where your "working environment" is as opposed to working with the whole document but what do I do if I need to tie data to lots of several places of the document?

To remedy this for myself I created a controller that manages, basically, the whole body. If I don't, I can't manage scope data that is scattered across totally different sections of the page.

I know it's a weak excuse but the code I ended up with has one massive controller for everything on the page. That can't be right.

5. `setTimeout()` doesn't quite work as you'd expect

If you do this in AngularJS it won't update as you'd expect.


<p class="status-message" ng-show="message">{{ message }}</p>


$scope.message = 'Changes saved!';
setTimout(function() {
  $scope.message = null;
}, 5 * 1000);

What you have to do, once you know it, is this:


function MyController($scope, $timeout) {
  ...
  $scope.message = 'Changes saved!'; 
  $timeout(function() {
    $scope.message = null;
  }, 5 * 1000);
}

It's not too bad but I couldn't see this until I had Googled some Stackoverflow questions.

6. Autocompleted password fields don't update the scope

Due to this bug when someone fills in a username and password form using autocomplete the password field isn't updating its data.

Let me explain; you have a username and password form. The user types in her username and her browser automatically now also fills in the password field and she's ready to submit. This simply does not work in AngularJS yet. So, if you have this code...:


<form>
<input name="username" ng-model="username" placeholder="Username">
<input type="password" name="password" ng-model="password" placeholder="Password">
<a class="button button-block" ng-click="submit()">Submit</a>
</form>


$scope.signin_submit = function() {
  $http.post('/signin', {username: $scope.username, password: $scope.password})
    .success(function(data) {
      console.log('Signed in!');
    };
  return false;
};

It simply doesn't work! I'll leave it to the reader to explore what available jQuery-helped hacks you can use.

7. Events for selection in a `<select>` tag is weird

This is one of those cases where readers might laugh at me but I just couldn't see how else to do it.
First, let me show you how I'd do it in jQuery:


$('select[name="choice"]').change(function() {
  if ($(this).val() == 'other') {
    // the <option value="other">Other...</option> option was chosen 
  }
});

Here's how I solved it in AngularJS:


$scope.$watch('choice', function(value) {
  if (value == 'other') {
    // the <option value="other">Other...</option> option was chosen 
  }
});

What's also strange is that there's nothing in the API documentation about $watch.

8. Controllers "dependency" injection is, by default, dependent on the controller's arguments

To have access to modules like $http and $timeout for example, in a controller, you put them in as arguments like this:


function MyController($scope, $http, $timeout) { 
  ...

It means that it's going to work equally if you do:


function MyController($scope, $timeout, $http) {  // order swapped
  ...

That's fine. Sort of. Except that this breaks minification so you have to do it this way:


var MyController = ['$scope', '$http', '$timeout', function($scope, $http, $timeout) {
  ...

Ugly! The first form depends on the interpreter inspecting the names of the arguments. The second form depends on the modules as strings.

The more correct way to do it is using the $inject. Like this:


MyController.$inject = ['$scope', '$http', '$timeout'];
function MyController($scope, $http, $timeout) {
  ...

Still ugly because it depends on them being strings. But why isn't this the one and only way to do it in the documentation? These days, no application is worth its salt if it isn't minify'able.

9. Is it "angular" or "angularjs"?

Googling and referring to it "angularjs" seems to yield better results.

This isn't a technical thing but rather something that's still in my head as I'm learning my way around.

In conclusion

I'm eager to write another blog post about how fun it has been to play with AngularJS. It's a fresh new way of doing things.

AngularJS code reminds me of the olden days when the HTML no longer looks like HTML but instead some document that contains half of the business logic spread all over the place. I think I haven't fully grasped this new way of doing things.

From hopping around example code and documentation I've seen some outrageously complicated HTML which I'm used to doing in Javascript instead. I appreciate that the HTML is after all part of the visual presentation and not the data handling but it still stumps me every time I see that what used to be one piece of functionality is now spread across two places (in the javascript controller and in the HTML directive).

I'm not givin up on AngularJS but I'll need to get a lot more comfortable with it before I use it in more serious applications.

Careful with your assertRaises() and inheritance of exceptions

April 10, 2013
10 comments Python

This took me by surprise today!

If you run this unit test, it actually passes with flying colors:


import unittest


class BadAssError(TypeError):
    pass


def foo():
    raise BadAssError("d'oh")


class Test(unittest.TestCase):

    def test(self):
        self.assertRaises(BadAssError, foo)
        self.assertRaises(TypeError, foo)
        self.assertRaises(Exception, foo)


if __name__ == '__main__':
    unittest.main()

Basically, assertRaises doesn't just take the exception that is being raised and accepts it, it also takes any of the raised exceptions' parents.

I've only tested it with Python 2.6 and 2.7. And the same works equally with unittest2.

I don't really know how I feel about this. It did surprise me when I was changing one of the exceptions and expected the old tests to break but they didn't. I mean, if I want to write a test that really makes sure the exception really is BadAssError it means I can't use assertRaises().

premailer now honours specificity

March 21, 2013
0 comments Python

Thanks to Theo Spears awesome effort premailer now support CSS specificity. What that means is that when linked and inline CSS blocks are transformed into tag style attributes the order is preserved as you'd expect.

When the browser applies CSS to elements it does it in a specific order. For example if you have this CSS:


p.tag { color: blue; }
p { color: red; }

and this HTML:


<p>Regular text</p>
<p class="tag">Special text</p>

the browser knows to draw the first paragraph tag in red and the second paragraph in red. It does that because p.tag is more specific that p.

Before, it would just blindly take each selector and set it as a style tag like this:


<p style="color:red">Regular text</p>
<p style="color:blue; color:red" class="tag">Special text</p>

which is not what you want.

The code in action is here.

Thanks Theo!