IssueTrackerProduct now officially abandoned

March 30, 2012
6 comments Zope, IssueTrackerProduct

In 2001 I started my first and perhaps most successful Open Source project I've ever made: IssueTrackerProduct. After nearly a decade of maintaining it I have now officially abandoned it.

It all started when I needed a way to track feedback on my personal website. That's why it was originally called "SiteTrackerProduct". I needed something where I could collect bug reports and any other pieces of feedback and then process it in some structured fashion. It was therefore very important that it would be possible to run the application open for anonymous access. People should be able to submit bugs and issues without having to create an account. You see, kids, back in that day it was actually very common that sites would force users to register and create accounts even just because the content owner wanted it. These days, it's common knowledge that to get people to open up and share anything for others to benefit you make it absolutely trivial to jump straight in without having to see a registration page that looks like a tax return form.

Now, since I long ago abandoned the Zope2 application server technology stack and I no longer use IssueTrackerProduct for anything real it's no longer feasible to maintain this project. In the last five years or so we were actually using it actively to track all projects at Fry-IT where I used to work. I have to say, even though we did grow out of it, it was actually successful. It handled the load (after some much needed patches towards optimization) and it was easy for people to actually use since unlike many other bug trackers, it focused on the non-technical end user first and foremost. As much as possible was done to make it trivial to type in your bug or issue and it automatically took care of all notifications and access rights.

Being a personal Open Source project, over the years, it became a melting pot for experimenting and perfecting various new ideas. Many of them we take for granted today but back then it was quite novel if I may say so. This includes:

  • ability to auto-save unfinished form inputs (added before Gmail had it)
  • automatic updates of the content without reload made it possible to see other people participating as you're typing
  • ability to reply directly to email notifications without having to open a web browser
  • an advanced via-the-web programmable interface for adding and modifying custom fields (e.g. "Customer reference code")
  • full-text search combined with ability to search on specific fields by key
  • file attachments that are images automatically appear as little thumbnails
  • file attachments that have text become searchable (e.g Word documents)
  • advanced filtering where you can easily decide to search inclusive or exclusive on certain fields
  • persistent filtering automatically saved and share'able between different users
  • programmable search filters that are coded in Python which made it possible to create very specific reports
  • ability to export and import bugs from and to Excel for offline processing

Writing all of this, I can not resist to get a bit nostalgic. I did sink A LOT of time into this project. Today when I look back at the code and almost feel sick seeing all the mistakes that I made. Much of the ugliness of the code can be attributed partially to the fact that I often used and abused the code to add new features. Also, because we often needed some features (since it was used to manage all of our projects) "yesterday" and then it was hard to justify doing things "properly". For example, the main .py file is over 14,000 lines of code!

I did called it "perhaps most successful Open Source project I've ever made" in the first sentence. The reason for that is that over the years many many people have downloaded it and installed and let it be used by thousands of users. That's something to be proud of.

Anyway! It's time to move on. So long and thank you for all the fish!

The code is still available at github.com/peterbe/IssueTrackerProduct

How much faster is Nginx+gunicorn than Apache+mod_wsgi?

March 22, 2012
9 comments Linux, Django

Short answer: about 5%

I had a few minutes and wanted to see if changing from Apache + mod_wsgi to Nginx + gunicorn would make the otherwise slow site any faster. It's not this site but another Django site for work (which, by the way, doesn't have to be fast). It's slow because it doesn't cache any of the SQL queries.

# with Apache + mod_wsgi
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    39 [#/sec] (mean)
...
# Uses about 110 Mb

That's after running multiple times and roughly averaging the requests per seconds.

# with Nginx + guncorn --workers=4
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    41 [#/sec] (mean)
...
# uses about 70 Mb

So, if you want to make a site fast forget about how the code is being served until all the slow db I/O is taken care of properly.

String length truncation optimization difference in Python

March 19, 2012
8 comments Python

We have a piece of code that is going to be run A LOT on a server infrastructure that needs to be fast. I know that I/O is much more important but because I had the time I wanted to figure out which is fastest:


def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

...or...


def b(s, m):
    return s[:m]

Truncated! Read the rest by clicking the link below.

When to __deepcopy__ classes in Python

March 14, 2012
9 comments Python

When using mutables in Python you have to be careful:


>>> a = {'value': 1}
>>> b = a
>>> a['value'] = 2
>>> b
{'value': 2}

So, you use the copy module from the standard library:


>>> import copy
>>> a = {'value': 1}
>>> b = copy.copy(a)
>>> a['value'] = 2
>>> b
{'value': 1}

That's nice but it's limited. It doesn't deal with the nested mutables as you can see here:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.copy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'else'}}

That's when you need the copy.deepcopy function:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.deepcopy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'Something'}}

Now, suppose we have a custom class that overrides the dict type. That's a very common thing to do. Let's demonstrate:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(name='Value')
>>> b = copy.copy(a)
>>> a['name'] = 'Other'
>>> b
{'name': 'Value'}

And again, if you have a nested mutable object you need copy.deepcopy:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

But oftentimes you'll want to make your dict subclass behave like a regular class so you can access data with dot notation. Like this:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'

Now here's a problem. If you do that, you loose the ability to use copy.deepcopy since the class has now been slightly "abused".


>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/copy.py", line 172, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "<stdin>", line 3, in __getattr__
KeyError: '__deepcopy__'

Hmm... now you're in trouble and to get yourself out of it you have to define a __deepcopy__ method as well. Let's just do it:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
...     def __deepcopy__(self, memo):
...         return ORM(copy.deepcopy(dict(self)))
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
>>> a.data['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Yeah!!! Now we get what we want. Messing around with the __getattr__ like this is, as far as I know, the only time you have to go in and write your own __deepcopy__ method.

I'm sure hardcore Python language experts can point out lots of intricacies about __deepcopy__ but since I only learned about this today, having it here might help someone else too.

Persistent caching with fire-and-forget updates

December 14, 2011
4 comments Python, Tornado

I just recently landed some patches on toocool that implements and interesting pattern that is seen more and more these days. I call it: Persistent caching with fire-and-forget updates

Basically, the implementation is this: You issue a request that requires information about a Twitter user: E.g. http://toocoolfor.me/following/chucknorris/vs/peterbe The app looks into its MongoDB for information about the tweeter and if it can't find this user it goes onto the Twitter REST API and looks it up and saves the result in MongoDB. The next time the same information is requested, and the data is available in the MongoDB it instead checks if the modify_date or more than an hour and if so, it sends a job to the message queue (Celery with Redis in my case) to perform an update on this tweeter.

You can basically see the code here but just to reiterate and abbreviate, it looks like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter:
   result = yield tornado.gen.Task(...)
   if result:
       tweeter = self.save_tweeter_user(result)
   else:
       # deal with the error!
elif age(tweeter['modify_date']) > 3600:
   tasks.refresh_user_info.delay(username, ...)
# render the template!

What the client gets, i.e. the user using the site, is it that apart from the very first time that URL is request is instant results but data is being maintained and refreshed.

This pattern works great for data that doesn't have to be up-to-date to the second but that still needs a way to cache invalidate and re-fetch. This works because my limit of 1 hour is quite arbitrary. An alternative implementation would be something like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter or (tweeter and age(tweeter) > 3600 * 24 * 7):
    # re-fetch from Twitter REST API
elif age(tweeter) > 3600:
    # fire-and-forget update

That way you don't suffer from persistently cached data that is too old.

Cryptic errors when using django-nose

December 7, 2011
2 comments Django

After about 3 days of debugging using pdb, print and writing to a log file I've almost finally solve my bizarre errors I was getting when running a whole test suite. The error that it lead to was that Django refused to re-register models to the admin and the errors looked something like this:


 ...
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/urls.py", line 6, in <module>
   admin.autodiscover()
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/contrib/admin/__init__.py", line 26, in autodiscover
   import_module('%s.admin' % app)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/utils/importlib.py", line 35, in import_module
   __import__(name)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/apps/users/admin.py", line 30, in <module>
   admin.site.register(UserProfile, UserProfileAdmin)
 File "/Users/peterbe/dev/MOZILLA/PTO/pto/vendor/src/django/django/contrib/admin/sites.py", line 85, in register
   raise AlreadyRegistered('The model %s is already registered' % model.__name__)
AlreadyRegistered: The model UserProfile is already registered

Turns out to be independent of which Django project I ran and it was something no one else was able to reproduce on any machine with the exact same code.

After 2 days I found that there's a difference between a successful run and a failing run was how I specified (to nose) which module to load:


./manage.py test users  # fails!
./manage.py test users.test  # works!

In both cases it finds the same tests. So it would either fail 10 times or work 10 times. Hmmm...

The bridging between nose and Django is done by awesome django-nose developed here at Mozilla by Django extraordinaire Jeff Balogh and it's a non-trivial piece of code as it depends on some really smart importing tricks and stuff which I haven't even begun to understand.

However, after so many trial and errors I finally discovered that the solution (for me) was to delete the ~/.noserc file. What's strange is that all it contained was:


[nosetests]
with-doctest=1

I might never actually find out what went wrong. Ultimately I think a reason things went wrong was because it incorrectly populated sys.modules with excessive keys that would cause double imports of urls.py which in turn runs admin.autodiscover() but incorrectly does so twice.

Sorry for the rambling. And sorry for not actually finding the real bug. I did spent 2-3 days debugging this non-stop and hopefully some other poor frustrated person is going to see this and also look into the ~/.noserc for ways to fix it maybe.

Python file with closing automatically

December 3, 2011
2 comments Python

Perhaps someone who knows more about the internals of python and the recent changes in 2.6 and 2.7 can explain this question that came up today in a code review.

I suggest using with instead of try: ... finally: to close a file that was written to. Instead of this:


dest = file('foo', 'w')
try:
   dest.write('stuff')
finally:
   dest.close()
print open('foo').read()  # will print 'stuff'

We can use this:


with file('foo', 'w') as dest: 
    dest.write('stuff')
print open('foo').read()  # will print 'stuff'

Why does that work? I'm guessing it's because the file() instance object has a built in __exit__ method. Is that right?

That means I don't need to use contextlib.closing(thing) right?

For example, suppose you have this class:


class Farm(object):
   def __enter__(self):
       print "Entering"
       return self
   def __exit__(self, err_type, err_val, err_tb):
       print "Exiting", err_type
       self.close()
   def close(self):
       print "Closing"

with Farm() as farm:
   pass
# this will print:
#   Entering
#   Exiting None
#   Closing

Another way to achieve the same specific result would be to use the closing() decrorator:


class Farm(object):
   def close(self):
       print "Closing"

from contextlib import closing
with closing(Farm()) as farm:
   pass
# this will print:
#   Closing

So the closing() decorator "steals" the __enter__ and __exit__. This last one can be handy if you do this:


from contextlib import closing
with closing(Farm()) as farm:
   raise ValueError

# this will print
#  Closing
#  Traceback (most recent call last):
#   File "dummy.py", line 16, in <module>
#     raise ValueError
#  ValueError

This is turning into my own loud thinking and I think I get it now. contextlib.closing() basically makes it possible to do what I did there with the __enter__ and __exit__ and it seems the file() built-in has a exit handler that takes care of the closing already so you don't have to do it with any extra decorators.

Integrate BrowserID in a Tornado web app

November 22, 2011
2 comments Tornado, Mozilla

Integrate BrowserID in a Tornado web app BrowserID is a new single sign-on initiative lead by Mozilla that takes a very refreshing approach to single sign-on. It's basically like OpenID except better and similar to the OAuth solutions from Google, Twitter, Facebook, etc but without being tied to those closed third-parties.

At the moment, BrowserID is ready for production (I have it on Kwissle) but the getting started docs is still something that is under active development (I'm actually contributing to this).

Anyway, I thought I'd share how to integrate it with Tornado

First, you need to do the client-side of things. I use jQuery but that's not a requirement to be able to use BrowserID. Also, there are different "patterns" to do login. Either you have a header that either says "Sign in"/"Hi Your Username". Or you can have a dedicated page (e.g. mysite.com/login/). Let's, for simplicity sake, pretend we build a dedicated page to log in. First, add the necessary HTML:


<a href="#" id="browserid" title="Sign-in with BrowserID">
  <img src="/images/sign_in_blue.png" alt="Sign in">
</a>
<script src="https://browserid.org/include.js" async></script>

Next you need the Javascript in place so that clicking on the link above will open the BrowserID pop-up:


function loggedIn(response) {
  location.href = response.next_url;
  /* alternatively you could do something like this instead:
       $('#header .loggedin').show().text('Hi ' + response.first_name);
    ...or something like that */
}

function gotVerifiedEmail(assertion) {
 // got an assertion, now send it up to the server for verification
 if (assertion !== null) {
   $.ajax({
     type: 'POST',
     url: '/auth/login/browserid/',
     data: { assertion: assertion },
     success: function(res, status, xhr) {
       if (res === null) {}//loggedOut();
       else loggedIn(res);
     },
     error: function(res, status, xhr) {
       alert("login failure" + res);
     }
   });
 }
 else {
   //loggedOut();
 }
}

$(function() {
  $('#browserid').click(function() {
    navigator.id.getVerifiedEmail(gotVerifiedEmail);
    return false;
  });
});

Next up is the server-side part of BrowserID. Your job is to take the assertion that is given to you by the AJAX POST and trade that with https://browserid.org for an email address:


import urllib
import tornado.web
import tornado.escape 
import tornado.httpclient
...

@route('/auth/login/browserid/')  # ...or whatever you use
class BrowserIDAuthLoginHandler(tornado.web.RequestHandler):

   def check_xsrf_cookie(self):  # more about this later
       pass

   @tornado.web.asynchronous
   def post(self):
       assertion = self.get_argument('assertion')
       http_client = tornado.httpclient.AsyncHTTPClient()
       domain = 'my.domain.com'  # MAKE SURE YOU CHANGE THIS
       url = 'https://browserid.org/verify'
       data = {
         'assertion': assertion,
         'audience': domain,
       }
       response = http_client.fetch(
         url,
         method='POST',
         body=urllib.urlencode(data),
         callback=self.async_callback(self._on_response)
       )

   def _on_response(self, response):
       struct = tornado.escape.json_decode(response.body)
       if struct['status'] != 'okay':
           raise tornado.web.HTTPError(400, "Failed assertion test")
       email = struct['email']
       self.set_secure_cookie('user', email,
                                  expires_days=1)
       self.set_header("Content-Type", "application/json; charset=UTF-8")
       response = {'next_url': '/'}
       self.write(tornado.escape.json_encode(response))
       self.finish()

Now that should get you up and running. There's of couse a tonne of things that can be improved. Number one thing to improve is to use XSRF on the AJAX POST. The simplest way to do that would be to somehow dump the XSRF token generated into your page and include it in the AJAX POST. Perhaps something like this:


<script>
var _xsrf = '{{ xsrf_token }}';
...
function gotVerifiedEmail(assertion) {
 // got an assertion, now send it up to the server for verification
 if (assertion !== null) {  
    $.ajax({
     type: 'POST',
     url: '/auth/login/browserid/',
     data: { assertion: assertion, _xsrf: _xsrf },
... 
</script>

Another thing that could obviously do with a re-write is the way users are handled server-side. In the example above I just set the asserted user's email address in a secure cookie. More realistically, you'll have a database of users who you match by email address but instead store their database ID in a cookie or something like that.

What's so neat about solutions such as OpenID, BrowserID, etc. is that you can combine two things in one process: Sign-in and Registration. In your app, all you need to do is a simple if statement in the code like this:


user = self.db.User.find_by_email(email) 
if not user:
    user = self.db.User()
    user.email = email
    user.save()
self.set_secure_cookie('user', str(user.id))

Hopefully that'll encourage a couple of more Tornadonauts to give BrowserID a try.

Trivial but powerful tips for nosetests

November 19, 2011
0 comments Python

I'm clearly still a nosetests beginner because it was only today that I figured out how to set certain plugins to always be on.

First of all you might like these plugins too:


$ pip install rudolf
$ pip install disabledoc

Docs: rudolf and disabledoc

To get these gorgeous little tricks into every run of nosetests edit the file ~/.noserc and add the following:


[nosetests]
with-disable-docstring=1
with-color=1

That should make your life a little easier.

UPDATE:

I've since managed to shoot myself in both legs with messing around with nosetests plugins because I heavily rely on django-nose in Django. Long story short: be careful if you get strange import related errors!

Going real simple on HTML5 audio

October 14, 2011
0 comments Web development, JavaScript

DoneCal users are to 80+% Chrome and Firefox users. Both Firefox and Chrome support the HTML <audio> element without any weird plugins and they both support the Ogg Vorbis (.ogg) file format. change log here

So, I used use the rather enterprisey plugin called SoundManager2 which attempts to abstract away all hacks into one single API. It uses a mix of browser sniffing, HTML5 and Flash. Although very promising, it is quite cumbersome. It doesn't work flawlessly despite their hard efforts. Unfortunately, using it also means a 30kb (optimized) Javascript file and a 3kb .swf file (if needed). So, instead of worrying about my very few Internet Explorer users I decided to go really dumb and simple on this.

The solution basically looks like this:


// somewhere.js
var SOUND_URLS = {
  foo: 'path/to/foo.ogg',
  egg: 'path/to/egg.ogg'
};

// play-sounds.js

/* Call to create and partially download the audo element.
 * You can all this as much as you like. */
function preload_sound(key) {
 var id = 'sound-' + key;
 if (!document.getElementById(id)) {
   if (!SOUND_URLS[key]) {
     throw "Sound for '" + key + "' not defined";
   } else if (SOUND_URLS[key].search(/\.ogg/i) == -1) {
     throw "Sound for '" + key + "' must be .ogg URL";
   }
   var a = document.createElement('audio');
   a.setAttribute('id', id);
   a.setAttribute('src', SOUND_URLS[key]);
   document.body.appendChild(a);
 }
 return id;
}

function play_sound(key) {
  document.getElementById(preload_sound(key)).play();
}

// elsewhere.js
$.lightbox.open({
   onComplete: function() {
      preload_sound('foo');
   }
});
$('#lightbox button').click(function() {
   play_sound('foo');
});

Basically, only Firefox, Chrome and Opera support .ogg but it's a good and open source encoding so I don't mind being a bit of an asshole about it. This little script could be slightly extended with some browser sniffing to work with Safari people but right now it doesn't feel like it's worth the effort.

This make me happy and I feel lean and light. A good feeling!