
String length truncation optimization difference in Python

March 19, 2012
8 comments Python

We have a piece of code that is going to be run A LOT on server infrastructure that needs to be fast. I know that I/O matters much more, but since I had the time I wanted to figure out which of these two is faster:


def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

...or...


def b(s, m):
    return s[:m]
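A quick way to compare the two is with timeit. Here's a minimal sketch (the sample string and the cut-off of 30 are arbitrary):


import timeit

s = 'x' * 100  # an arbitrary 100 character string
m = 30

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]

# one million calls each (timeit's default)
print(timeit.timeit('a(s, m)', setup='from __main__ import a, s, m'))
print(timeit.timeit('b(s, m)', setup='from __main__ import b, s, m'))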


When to __deepcopy__ classes in Python

March 14, 2012
9 comments Python

When using mutables in Python you have to be careful:


>>> a = {'value': 1}
>>> b = a
>>> a['value'] = 2
>>> b
{'value': 2}

So, you use the copy module from the standard library:


>>> import copy
>>> a = {'value': 1}
>>> b = copy.copy(a)
>>> a['value'] = 2
>>> b
{'value': 1}

That's nice but it's limited. It doesn't deal with nested mutables, as you can see here:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.copy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'else'}}

That's when you need the copy.deepcopy function:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.deepcopy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'Something'}}

Now, suppose we have a custom class that subclasses the dict type. That's a very common thing to do. Let's demonstrate:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(name='Value')
>>> b = copy.copy(a)
>>> a['name'] = 'Other'
>>> b
{'name': 'Value'}

And again, if you have a nested mutable object you need copy.deepcopy:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

But oftentimes you'll want to make your dict subclass behave like a regular class so you can access data with dot notation. Like this:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'

Now here's a problem. If you do that, you lose the ability to use copy.deepcopy since the class has now been slightly "abused".


>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/copy.py", line 172, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "<stdin>", line 3, in __getattr__
KeyError: '__deepcopy__'

Hmm... now you're in trouble. What happens is that copy.deepcopy probes the instance with getattr(x, "__deepcopy__", None); since there is no such attribute, our __getattr__ kicks in, tries self['__deepcopy__'] and raises a KeyError, which getattr doesn't suppress the way it would an AttributeError. To get yourself out of it you have to define a __deepcopy__ method as well. Let's just do it:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
...     def __deepcopy__(self, memo):
...         return ORM(copy.deepcopy(dict(self)))
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
>>> a.data['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Yeah!!! Now we get what we want. Messing around with the __getattr__ like this is, as far as I know, the only time you have to go in and write your own __deepcopy__ method.

I'm sure hardcore Python language experts can point out lots of intricacies about __deepcopy__ but since I only learned about this today, having it here might help someone else too.
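One such intricacy, for the record: since the real culprit is that our __getattr__ raises KeyError where getattr expects an AttributeError, translating the exception should also let the default deepcopy machinery do its job, without writing any __deepcopy__ at all. A sketch:


>>> import copy
>>> class ORM(dict):
...     def __getattr__(self, key):
...         try:
...             return self[key]
...         except KeyError:
...             raise AttributeError(key)
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Raising AttributeError from __getattr__ is also friendlier to hasattr() and to getattr() with a default in general.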

Persistent caching with fire-and-forget updates

December 14, 2011
4 comments Python, Tornado

I just recently landed some patches on toocool that implement an interesting pattern that is seen more and more these days. I call it: Persistent caching with fire-and-forget updates.

Basically, the implementation is this: you issue a request that requires information about a Twitter user, e.g. http://toocoolfor.me/following/chucknorris/vs/peterbe. The app looks in its MongoDB for information about the tweeter and, if it can't find the user, it goes to the Twitter REST API, looks it up and saves the result in MongoDB. The next time the same information is requested and the data is available in MongoDB, it instead checks whether the modify_date is more than an hour old and, if so, sends a job to the message queue (Celery with Redis in my case) to perform an update on this tweeter.

You can basically see the code here but just to reiterate and abbreviate, it looks like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter:
    result = yield tornado.gen.Task(...)
    if result:
        tweeter = self.save_tweeter_user(result)
    else:
        pass  # deal with the error!
elif age(tweeter['modify_date']) > 3600:
    tasks.refresh_user_info.delay(username, ...)
# render the template!

What the client, i.e. the user of the site, gets is instant results (apart from the very first time that URL is requested) while the data is quietly being maintained and refreshed behind the scenes.

This pattern works great for data that doesn't have to be up-to-date to the second but that still needs a way to be invalidated and re-fetched. My limit of 1 hour is quite arbitrary. An alternative implementation would be something like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter or age(tweeter['modify_date']) > 3600 * 24 * 7:
    pass  # re-fetch from the Twitter REST API
elif age(tweeter['modify_date']) > 3600:
    pass  # fire-and-forget update

That way you don't suffer from persistently cached data that is too old.
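For reference, the age() helper used in both snippets doesn't need to be anything fancy. Here's a minimal sketch, assuming modify_date is a naive UTC datetime (which is what you typically get back from MongoDB):


import datetime

def age(modify_date):
    # number of seconds since modify_date
    delta = datetime.datetime.utcnow() - modify_date
    return delta.days * 86400 + delta.seconds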

Python file with closing automatically

December 3, 2011
2 comments Python

Perhaps someone who knows more about the internals of Python and the recent changes in 2.6 and 2.7 can answer this question that came up today in a code review.

I suggest using with instead of try: ... finally: to close a file that was written to. Instead of this:


dest = file('foo', 'w')
try:
   dest.write('stuff')
finally:
   dest.close()
print open('foo').read()  # will print 'stuff'

We can use this:


with file('foo', 'w') as dest: 
    dest.write('stuff')
print open('foo').read()  # will print 'stuff'

Why does that work? I'm guessing it's because the file() instance object has a built-in __exit__ method. Is that right?

That means I don't need to use contextlib.closing(thing) right?

For example, suppose you have this class:


class Farm(object):
   def __enter__(self):
       print "Entering"
       return self
   def __exit__(self, err_type, err_val, err_tb):
       print "Exiting", err_type
       self.close()
   def close(self):
       print "Closing"

with Farm() as farm:
   pass
# this will print:
#   Entering
#   Exiting None
#   Closing

Another way to achieve the same specific result would be to use the closing() helper from contextlib:


class Farm(object):
   def close(self):
       print "Closing"

from contextlib import closing
with closing(Farm()) as farm:
   pass
# this will print:
#   Closing

So closing() supplies the __enter__ and __exit__ for you. This last one can be handy if you do this:


from contextlib import closing
with closing(Farm()) as farm:
   raise ValueError

# this will print
#  Closing
#  Traceback (most recent call last):
#   File "dummy.py", line 16, in <module>
#     raise ValueError
#  ValueError

This is turning into me thinking out loud, and I think I get it now. contextlib.closing() basically makes it possible to do what I did there with __enter__ and __exit__, and the file() built-in already has an __exit__ method that takes care of the closing, so you don't need any extra helpers.
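A quick way to convince yourself of that in the interpreter:


>>> f = open('foo')
>>> hasattr(f, '__enter__'), hasattr(f, '__exit__')
(True, True)
>>> f.__exit__(None, None, None)
>>> f.closed
True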

Trivial but powerful tips for nosetests

November 19, 2011
0 comments Python

I'm clearly still a nosetests beginner because it was only today that I figured out how to set certain plugins to always be on.

First of all you might like these plugins too:


$ pip install rudolf
$ pip install disabledoc

Docs: rudolf and disabledoc

To get these gorgeous little tricks into every run of nosetests edit the file ~/.noserc and add the following:


[nosetests]
with-disable-docstring=1
with-color=1

That should make your life a little easier.

UPDATE:

I've since managed to shoot myself in both legs by messing around with nosetests plugins, because I heavily rely on django-nose in Django. Long story short: be careful if you get strange import-related errors!

Going real simple on HTML5 audio

October 14, 2011
0 comments Web development, JavaScript

More than 80% of DoneCal users are on Chrome or Firefox. Both Firefox and Chrome support the HTML5 <audio> element without any weird plugins and they both support the Ogg Vorbis (.ogg) file format (change log here).

So, I used to use the rather enterprisey plugin called SoundManager2, which attempts to abstract away all the hacks into one single API. It uses a mix of browser sniffing, HTML5 and Flash. Although very promising, it is quite cumbersome. It doesn't work flawlessly despite their hard efforts. Unfortunately, using it also means a 30kb (optimized) Javascript file and a 3kb .swf file (if needed). So, instead of worrying about my very few Internet Explorer users I decided to go really dumb and simple on this.

The solution basically looks like this:


// somewhere.js
var SOUND_URLS = {
  foo: 'path/to/foo.ogg',
  egg: 'path/to/egg.ogg'
};

// play-sounds.js

/* Call to create and partially download the audio element.
 * You can call this as much as you like. */
function preload_sound(key) {
  var id = 'sound-' + key;
  if (!document.getElementById(id)) {
    if (!SOUND_URLS[key]) {
      throw "Sound for '" + key + "' not defined";
    } else if (SOUND_URLS[key].search(/\.ogg/i) == -1) {
      throw "Sound for '" + key + "' must be .ogg URL";
    }
    var a = document.createElement('audio');
    a.setAttribute('id', id);
    a.setAttribute('src', SOUND_URLS[key]);
    document.body.appendChild(a);
  }
  return id;
}

function play_sound(key) {
  document.getElementById(preload_sound(key)).play();
}

// elsewhere.js
$.lightbox.open({
   onComplete: function() {
      preload_sound('foo');
   }
});
$('#lightbox button').click(function() {
   play_sound('foo');
});

Basically, only Firefox, Chrome and Opera support .ogg but it's a good and open source encoding so I don't mind being a bit of an asshole about it. This little script could be slightly extended with some browser sniffing to work with Safari people but right now it doesn't feel like it's worth the effort.

This makes me happy and I feel lean and light. A good feeling!

Title - a javascript snippet to control the document title

August 22, 2011
0 comments JavaScript

This is a piece of Javascript code I use on "Kwissle" to make the document title change temporarily. Other people might find it useful too.

Code looks like this:


var Title = (function() {
 var current_title = document.title
   , timer;

 return {
    showTemporarily: function (msg, msec) {
      msec = typeof(msec) !== 'undefined' ? msec : 3000;
      if (msec < 100) msec *= 1000;
      if (timer) {
        clearTimeout(timer);
      }
      document.title = msg;
      timer = setTimeout(function() {
        document.title = current_title;
      }, msec);
    }
 }
})();

Demo here

Comparing Google Closure with UglifyJS

July 10, 2011
17 comments JavaScript

On Kwissle I'm using the Google Closure Compiler to minify all Javascript files. It's fine, but because it's Java and because I'm running this on a struggling EC2 micro instance, the CPU goes up to 99% for about 10 seconds when it does the Closure compilation. Sucks!

So, I threw UglifyJS into the mix: instead of replacing the Closure compiler I added it so the two run alongside each other, although I obviously only keep one of the outputs.
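If you want to reproduce this kind of comparison, a harness roughly like this works (the commands and paths are illustrative, not my exact setup):


import subprocess
import time

def minify(cmd):
    # run one minifier command and return (seconds, minified output)
    t0 = time.time()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    return time.time() - t0, out

for path in ('./static/js/account.js', './static/js/rumbler.js'):
    original = len(open(path).read())
    for name, cmd in (('UglifyJS', ['uglifyjs', path]),
                      ('Closure', ['java', '-jar', 'compiler.jar', '--js', path])):
        seconds, out = minify(cmd)
        print('%s took %.4f seconds to compress %s bytes into %s (%.1f%%)' % (
            name, seconds, original, len(out), 100.0 * len(out) / original))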

Here is the log output when I run it on my MacBook Pro:


MAKING ./static/js/account.js
UglifyJS took 0.0866 seconds to compress 3066 bytes into 1304 (42.5%)
Closure took 1.2365 seconds to compress 3066 bytes into 1225 (40.0%)
MAKING ./static/js/ext/jquery.cookie.js
UglifyJS took 0.0843 seconds to compress 3655 bytes into 3009 (82.3%)
Closure took 1.3472 seconds to compress 3655 bytes into 4086 (111.8%)
MAKING ./static/js/ext/jquery.tipsy.js
UglifyJS took 0.1029 seconds to compress 7527 bytes into 3581 (47.6%)
Closure took 1.3062 seconds to compress 7527 bytes into 3425 (45.5%)
MAKING ./static/js/maxlength_countdown.js
UglifyJS took 0.082 seconds to compress 1502 bytes into 1033 (68.8%)
Closure took 1.2159 seconds to compress 1502 bytes into 853 (56.8%)
MAKING ./static/js/ext/socket.io-0.6.3.js
UglifyJS took 0.299 seconds to compress 76870 bytes into 30787 (40.1%)
Closure took 2.4817 seconds to compress 76870 bytes into 30628 (39.8%)
MAKING ./static/js/scoreboard.js
UglifyJS took 0.084 seconds to compress 2768 bytes into 1239 (44.8%)
Closure took 1.2512 seconds to compress 2768 bytes into 1167 (42.2%)
MAKING ./static/js/rumbler.js
UglifyJS took 0.0872 seconds to compress 3087 bytes into 1384 (44.8%)
Closure took 1.2587 seconds to compress 3087 bytes into 1235 (40.0%)
MAKING ./static/js/ext/shortcut.js
UglifyJS took 0.0987 seconds to compress 5796 bytes into 2537 (43.8%)
Closure took 1.3231 seconds to compress 5796 bytes into 2410 (41.6%)
MAKING ./static/js/play.js
UglifyJS took 0.1483 seconds to compress 18473 bytes into 10592 (57.3%)
Closure took 1.4497 seconds to compress 18473 bytes into 10703 (57.9%)
MAKING ./static/js/playsound.js
UglifyJS took 0.0824 seconds to compress 1205 bytes into 869 (72.1%)
Closure took 1.2335 seconds to compress 1205 bytes into 873 (72.4%)  

(Note here that for the file ./static/js/ext/jquery.cookie.js Closure failed, and when it fails it leaves the code as is and prepends a copy of the error output from stdout. That's why it's greater than 100% for that file.)

Here are the averages of those numbers:


AVERAGE TIME: (lower is better)
  * UglifyJS: 0.11554 seconds
  * Closure: 1.41037 seconds

AVERAGE REDUCTION: (higher is better)
  * UglifyJS: 45.6% 
  * Closure: 51.5%

(I'm skipping the file that Closure failed to minify)

So, what does that mean in bytes? These are the source Javascript files for two pages and the total is 123949.0 bytes. Closure saves me 63833.7 bytes (62 kbytes) of bandwidth whereas UglifyJS only saves me 57760.2 bytes (56 kbytes).

Discussion...

The fact that Closure fails on one file is a real bummer. I'm not even using the advanced options here.

UglifyJS doesn't save as many bytes as Closure does. This is potentially important because, after all, the minification happens only once per revision of the original file but the result might be served hundreds of thousands or millions of times.

Because I run my minifications on-the-fly, it does matter to me that UglifyJS is roughly 12 times faster.

It's rare but I have observed twice that the minified Javascript from Closure has actually broken my code. I default to suspect that's my fault for not making the code "minifyable" enough (e.g. too many global variables).

I've just noticed that I'm relying on files that almost never change (e.g. jquery.tipsy.js). I might as well create a ...min.js versions of them and add them to the repository.

In conclusion...

Because UglifyJS is so much faster and doesn't choke on that jquery.cookie.js file, I'm going to switch to UglifyJS for the moment. The extra bytes that I don't save become insignificant once you add the gzip effect, and compared to images the total bandwidth is quite small anyway.

UPDATE (JAN 2016)

I wrote a follow-up post comparing UglifyJS2 with Closure Compiler with the advanced option.

Slides about Kwissle from yesterday's London Python Dojo

July 8, 2011
0 comments Python

Since this was published, I've abandoned the kwissle.com domain name. So no link no more.

Here are the slides from yesterday's London Python Dojo event.

I presented and demo'ed "Kwissle" to my fellow Python London friends and focused a lot on the technology but also tried to plug the game a bit.

Having seen that there's a lot of interest in "socket"-related web applications around, I thought this was a good chance to show that you don't need NodeJS and that tornadio is a great framework for this.

Optimization story involving something silly I call "dict+"

June 13, 2011
0 comments Python, MongoDB

Here's an interesting little story about using MongoKit to quickly draw items from MongoDB.

So I had a piece of code that was doing a big batch update. It was slow. It took about 0.5 seconds per user and I sometimes had a lot of users to run it for.

The code looked something like this:


for play in db.PlayedQuestion.find({'user.$id': user._id}):
    if play.winner == user:
        bla()
    elif play.draw:
        ble()
    else:
        blu()

Because the model PlayedQuestion contains DBRefs, MongoKit will automatically look them up for every iteration of the main loop. Each lookup is individually very fast (thank you, indexes) but because of the sheer number of operations it's very slow in total. Here's how to make it much faster:


for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):

The problem with this is that you get plain dict instances for each result, which are more awkward to work with. I.e. instead of `play.winner` you have to use `play['winner'].id`. Here's my solution that makes this a lot easier:


class dict_plus(dict):

    def __init__(self, *args, **kwargs):
        if 'collection' in kwargs:  # excess we don't need
            kwargs.pop('collection')
        dict.__init__(self, *args, **kwargs)
        self._wrap_internal_dicts()

    def _wrap_internal_dicts(self):
        for key, value in self.items():
            if isinstance(value, dict):
                self[key] = dict_plus(value)

    def __getattr__(self, key):
        if key.startswith('__'):
            raise AttributeError(key)
        return self[key]

    ...

for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):
    play = dict_plus(play)
    if play.winner.id == user._id:
        bla()
    elif play.draw:
        ble()
    else:
        blu()

Now, the whole thing takes 0.01 seconds instead of 0.5. 50 times faster!!