Introducing django-spellcorrector

May 28, 2009
0 comments Django

I've now made a vastly improved spellcorrector specifically tied into Django and its models. It's the same class as before, but hooked up to models so that Django can take care of persisting the trained words. Again, I have to pay tribute to Peter Norvig for his inspirational blog post How to Write a Spelling Corrector, which the large majority of my code is based on. At least the tricky parts.

What's nice about this little app is that it's very easy to plug in and use. You just download it, put it on your PYTHONPATH and add it to your INSTALLED_APPS. Then, from another app, you do something like this:


from spellcorrector.views import Spellcorrector
sc = Spellcorrector()
sc.load() # nothing will happen the first time

sc.train(u"peter")
print sc.correct(u"petter") # will print peter
sc.save()

sc2 = Spellcorrector()
sc2.load()
print sc2.correct(u"petter") # will print peter
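For anyone curious about the tricky parts, here is a minimal in-memory sketch of the Norvig-style approach: generate every string within one edit (deletion, transposition, replacement, insertion), then within two, and pick the known candidate that was trained most often. It is an illustration only, not the django-spellcorrector code. The class name SpellSketch is hypothetical, and the Django model persistence that the app adds is left out.

```python
import re
from collections import Counter


class SpellSketch:
    """In-memory, Norvig-style corrector (illustration, not the app's code)."""

    LETTERS = "abcdefghijklmnopqrstuvwxyz"

    def __init__(self):
        self.counts = Counter()

    def train(self, text):
        # Count every lowercase word in the training text
        self.counts.update(re.findall(r"[a-z]+", text.lower()))

    def edits1(self, word):
        # Every string one deletion, transposition, replacement or insertion away
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.LETTERS]
        inserts = [a + c + b for a, b in splits for c in self.LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def known(self, words):
        # Keep only words seen during training
        return {w for w in words if w in self.counts}

    def correct(self, word):
        word = word.lower()
        edits = self.edits1(word)
        candidates = (
            self.known([word])
            or self.known(edits)
            or self.known(e2 for e1 in edits for e2 in self.edits1(e1))
            or {word}
        )
        # Prefer the candidate seen most often during training
        return max(candidates, key=lambda w: self.counts[w])
```

Used like the app above: after `sc = SpellSketch()` and `sc.train("peter")`, `sc.correct("petter")` returns "peter", because deleting one "t" gives a trained word.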


Crossing the world - new feature on Crosstips

May 23, 2009
1 comment Django

I've added a very fun new feature on Crosstips called Crossing the world, which shows real-time searches happening all over the world. Admittedly the traffic on Crosstips isn't particularly high (at the time of writing, one search every two minutes), so you might have to sit there for a while until something happens. It's strangely addictive to watch.

To do this I had to use all sorts of buzzwords: AJAX, function cache decorators, GeoIP and Google Maps. I'm currently using the free version of GeoIP City Lite, which seems to work for the large majority of captured IP addresses. And since the map is zoomed out quite far, you can't really tell how inaccurate it is.
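As an example of the function cache decorator idea, here is a minimal sketch of a time-based memoizing decorator, assuming the goal is to avoid repeating an expensive lookup (such as a GeoIP query) for the same arguments within a time window. The names cache_for and lookup_location are hypothetical, and the sketch only handles positional, hashable arguments.

```python
import functools
import time


def cache_for(seconds):
    """Memoize a function's results for `seconds` seconds (positional args only)."""
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                stored_at, value = cache[args]
                if now - stored_at < seconds:
                    return value  # still fresh: skip the expensive call
            value = func(*args)
            cache[args] = (now, value)
            return value

        return wrapper

    return decorator


@cache_for(60)
def lookup_location(ip):
    # Hypothetical stand-in for a slow GeoIP lookup returning (latitude, longitude)
    return (0.0, 0.0)
```

The cache lives in a closure per decorated function, so it disappears when the process restarts; a shared cache backend would be the next step for a multi-process site.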

One little detail I'm quite proud of is how the AJAX code adjusts the interval between lookups. Each time the server responds with something new, the interval is reduced, but if there aren't any new searches the interval slowly increases again. This minimizes the number of useless server requests while still reacting quickly when there are plenty of things to show. The next feature to add is Comet (like AJAX but push instead of pull).
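The interval logic described above can be sketched in a few lines. The real thing runs as JavaScript in the browser; this Python version only shows the decision rule, and the specific choices (halve the interval on new searches, add a fixed step when idle, clamp between a floor and a ceiling, and all the default numbers) are assumptions for illustration.

```python
class AdaptivePoller:
    """Shrink the polling interval on activity, grow it slowly when idle."""

    def __init__(self, initial=10.0, minimum=2.0, maximum=60.0, step=2.0):
        self.minimum = minimum  # never poll more often than this (seconds)
        self.maximum = maximum  # never wait longer than this (seconds)
        self.step = step        # how much to back off after an empty response
        self.interval = initial

    def update(self, got_new_searches):
        """Return the number of seconds to wait before the next request."""
        if got_new_searches:
            # Activity: poll more often (halve the interval, but not below the floor)
            self.interval = max(self.minimum, self.interval / 2)
        else:
            # Nothing new: back off gradually (add a fixed step, up to the ceiling)
            self.interval = min(self.maximum, self.interval + self.step)
        return self.interval
```

Shrinking fast and growing slowly means a burst of searches is picked up almost immediately, while a quiet site settles down to one request per minute.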

Now if only we could get some more action on the site! Tell all your grandparents to use this site when they get stuck on a crossword puzzle!

UPDATE

I've just learnt that GeoIP support already ships with GeoDjango, so I've basically reinvented half a wheel :(

Sequences in PostgreSQL and rolling back transactions

May 12, 2009
0 comments Linux

This behavior bit me today and caused me some pain, so hopefully sharing it will help someone else avoid the same pitfall.

Basically, I use Zope to manage a PostgreSQL database, and since Zope is 100% transactional it rolls back queries when an exception occurs. That's great, but what I didn't know is that a rollback doesn't roll back the sequences. Makes sense in retrospect, I guess. Here's proof:


test_db=# create table "foo" (id serial primary key, name varchar(10));
CREATE TABLE
test_db=# insert into foo(name) values('Peter');
INSERT 0 1
test_db=# select * from foo;
 id | name  
----+-------
  1 | Peter
(1 row)

test_db=#  select nextval('foo_id_seq');
 nextval 
---------
       2
(1 row)

test_db=# begin;
BEGIN
test_db=# insert into foo(id, name) values(2, 'Sonic');
INSERT 0 1
test_db=# rollback;
ROLLBACK
test_db=#  select nextval('foo_id_seq');
 nextval 
---------
       3
(1 row)

In my application I often use the sequences to predict what the auto-generated ID of a new row is going to be, so that the application can use it for things such as redirecting or updating other tables. As I wasn't expecting this behavior, it caused a bug in my web app.
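To make the pitfall concrete, here is a toy in-memory model of the situation (not PostgreSQL, and not Zope): the table's pending rows are thrown away on rollback, but the sequence counter keeps every value it has handed out. The Sequence and Table classes are hypothetical stand-ins that mirror the psql session above.

```python
class Sequence:
    """Toy stand-in for a PostgreSQL sequence: nextval() is never undone."""

    def __init__(self):
        self.last = 0

    def nextval(self):
        self.last += 1
        return self.last


class Table:
    """Toy table whose inserts are only kept after commit()."""

    def __init__(self, seq):
        self.seq = seq
        self.rows = {}
        self.pending = {}

    def insert(self, name, row_id=None):
        if row_id is None:
            row_id = self.seq.nextval()  # like a serial column's default
        self.pending[row_id] = name
        return row_id

    def commit(self):
        self.rows.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Rows are discarded, but the sequence keeps whatever values it handed out
        self.pending.clear()


seq = Sequence()
foo = Table(seq)
foo.insert("Peter")
foo.commit()

predicted = seq.nextval()              # the app "predicts" the new ID: 2
foo.insert("Sonic", row_id=predicted)
foo.rollback()                         # e.g. an exception was raised

print(seq.nextval())                   # 3, not 2: the sequence was not rolled back
print(foo.rows)                        # {1: 'Peter'}
```

The same reasoning explains why gaps in serial IDs are normal in PostgreSQL: any predicted value can be consumed and then never used.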

Most unusual letters in English language

May 12, 2009
11 comments Python

I needed to find out which letters are the least used in the English language. I pulled down a list of 100,000+ English words, split them into letters (about 1,000,000 in total) and sorted the letters by usage. Here's the result, from most to least common:


esiarntoldcugpmhbyfkwvzxjq

It would be interesting to make a heatmap of this over an image of a QWERTY keyboard.
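The counting itself is a one-liner with a Counter. This is a minimal sketch, not the original script, assuming the word list is already loaded as a list of strings; anything that isn't an ASCII letter (apostrophes, hyphens, digits) is skipped and case is ignored.

```python
import string
from collections import Counter


def letters_by_frequency(words):
    """Return the letters a-z found in `words`, most common first."""
    counts = Counter(
        ch
        for word in words
        for ch in word.lower()
        if ch in string.ascii_lowercase  # ignore punctuation, digits, accents
    )
    # most_common() sorts by count, highest first
    return "".join(letter for letter, _ in counts.most_common())
```

Feeding it a word list file, for example the whitespace-split contents of a hypothetical words.txt, gives a string in the same format as the result above.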


To JSON, Pickle or Marshal in Python

May 8, 2009
4 comments Python

I was reading David Cramer's tip to use a JSONField in Django to be able to store arbitrary fields in a SQL database. Nice. But is it fast enough? Well, I can't answer that, but I did look into the difference in read/write performance between simplejson, cPickle and marshal.

Only reading:


JSON 0.00593531370163
PICKLE 0.0109532237053
MARSHAL 0.00413788318634

Reading and writing:


JSON 0.0434390544891
PICKLE 0.0289686655998
MARSHAL 0.00728442907333

Clearly marshal is the fastest, but to quote its documentation:

"Warning: The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source."

Still, simplejson is a very fast reader, and the JSON format has the delicious advantage of being "human readable" (compared to the others).

NOTE! I spent about 5 minutes putting together the script and about 10 minutes writing this, so feel free to doubt its scientific accuracy.
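For anyone who wants to rerun a comparison like this, here is a minimal timeit sketch. It is not the original script: it uses Python 3's standard json and pickle modules instead of simplejson and cPickle (Python 3's pickle uses its C implementation automatically), and the DATA payload and iteration count are made-up assumptions, so expect different numbers on your machine.

```python
import json
import marshal
import pickle
import timeit

# Hypothetical sample payload; the original data isn't shown in the post
DATA = {"name": "Peter", "tags": ["django", "python"], "scores": list(range(100)), "ratio": 0.5}

SERIALIZERS = {
    "JSON": (json.dumps, json.loads),
    "PICKLE": (lambda obj: pickle.dumps(obj, pickle.HIGHEST_PROTOCOL), pickle.loads),
    "MARSHAL": (marshal.dumps, marshal.loads),
}


def benchmark(data=DATA, number=1000):
    """Return {name: (read_seconds, read_write_seconds)} for each serializer."""
    results = {}
    for name, (dump, load) in SERIALIZERS.items():
        blob = dump(data)
        if load(blob) != data:  # sanity check: the round trip must be lossless
            raise ValueError(name + " did not round-trip the data")
        read = timeit.timeit(lambda: load(blob), number=number)
        read_write = timeit.timeit(lambda: load(dump(data)), number=number)
        results[name] = (read, read_write)
    return results


if __name__ == "__main__":
    for name, (read, read_write) in benchmark().items():
        print(name, read, read_write)
```

The marshal warning quoted above still applies: only ever load marshal data you wrote yourself.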


Never seen before Google Server Error

May 7, 2009
1 comment

I've never seen a Server Error on Google before. I've seen errors before, but they usually indicate that the whole service is down for a brief moment. This time it looks like a bug caused it.

Don't get me wrong. I still think Google search is the best Internet invention since e-mail.

Crosstips now has sparklines

April 29, 2009
0 comments Web development

My crossword solving website Crosstips now has a cute little chart in the lower right-hand corner. It's a sparkline. The line indicates how many searches have been done in the current month. The screenshot was taken on the 28th of April, so it shows the searches done in April, and the maximum is near the right-hand side.

The chart is made with Google Chart, which I'd never had the opportunity to try before.

Making the chart was quite a pleasure, actually. I had it up and running within minutes just by looking at some examples. The library I used was pygooglechart, which, despite its lack of documentation, was really easy to use.
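The data side of a sparkline like this can be sketched independently of the charting library. Assuming the site records a timestamp per search (the searches list below is made-up example data), this counts searches per day of the month and scales them so the busiest day maps to 100. The 0 to 100 range is an assumption about what the chart's data encoding expects; the resulting list is what you would hand to a library such as pygooglechart.

```python
import calendar
from collections import Counter
from datetime import datetime


def daily_counts(timestamps, year, month):
    """Number of searches per day of the given month (0 for quiet days)."""
    per_day = Counter(ts.day for ts in timestamps if (ts.year, ts.month) == (year, month))
    days_in_month = calendar.monthrange(year, month)[1]
    return [per_day[day] for day in range(1, days_in_month + 1)]


def scale(values, top=100):
    """Scale values linearly so the largest becomes `top` (assumed range 0..top)."""
    peak = max(values, default=0)
    if peak == 0:
        return [0] * len(values)
    return [round(value * top / peak) for value in values]


# Example with made-up search times
searches = [datetime(2009, 4, 1, 10), datetime(2009, 4, 1, 11), datetime(2009, 4, 28, 9)]
april = daily_counts(searches, 2009, 4)
points = scale(april)
```

Including the zero days matters: a sparkline's x axis is implicit, so skipping quiet days would silently compress the month.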

How useful this sparkline is to the people who try to get unstuck on their crosswords I really don't know but it sure looks cool.

mailto: considered stupid, especially with ?subject

April 25, 2009
5 comments Web development

I don't have any stats to back this up, but if I look around the office a lot of people use Gmail, Hotmail or some other web-based email. My family, for example, uses Gmail, Yahoo Mail and Hotmail (and I'm on Gmail). So it bugs me when websites use mailto: links, especially if they rely on the subject line.

For example, on the EDF Energy "Contact us" page there's a long list of "Email us" links. Almost all of them go to mailto:customer_correspondence@edfenergy.com, but each with a different subject line:


mailto:customer_correspondence@edfenergy.com?subject=Dual Fuel enquiry
mailto:customer_correspondence@edfenergy.com?subject=Dual Fuel sales enquiry
mailto:customer_correspondence@edfenergy.com?subject=Energy efficiency enquiry
mailto:customer_correspondence@edfenergy.com?subject=Priority Services enquiry
mailto:customer_correspondence@edfenergy.com?subject=Electricity prepayment enquiry
mailto:customer_correspondence@edfenergy.com?subject=Gas prepayment enquiry
mailto:customer_correspondence@edfenergy.com?subject=Home movers enquiry
mailto:customer_correspondence@edfenergy.com?subject=Green Tariff enquiry
mailto:customer_correspondence@edfenergy.com?subject=Meter Reading enquiry
mailto:customer_correspondence@edfenergy.com?subject=Bill payment enquiry
mailto:myaccount@edfenergy.com?subject=MyAccount query
...

Does that mean I have to somehow copy the subject line from each link so that my email gets routed to the right department? I just don't know. Why can't they have a different email address for each topic, or a web form where I can contact them there and then?

mailto: should be used very sparingly. Since most people (like my mom) don't know to right-click and select "Copy email address", I prefer to show an email address like this, with the address itself as the link text:


<a href="mailto:more@userfriendly.com">more@userfriendly.com</a>
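If the links are generated from code, a small helper keeps the address visible and, if a subject really is needed, percent-encodes it (the subjects in the list above contain raw spaces, which should be written as %20 in a mailto: URL). A minimal sketch; the function name mailto_link is hypothetical.

```python
from html import escape
from urllib.parse import quote


def mailto_link(address, subject=None):
    """Build an <a> tag whose visible text is the email address itself."""
    href = "mailto:" + address
    if subject:
        # Percent-encode the subject so spaces become %20
        href += "?subject=" + quote(subject)
    return '<a href="{}">{}</a>'.format(escape(href), escape(address))
```

Calling `mailto_link("more@userfriendly.com")` produces exactly the markup shown above, so people without a desktop mail client can still read and copy the address.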

Git + Twitter = Friedcode

April 22, 2009
10 comments Python, Linux

I've now written my first Git hook. If you don't know what Git is, you've either been living under a rock for the past few years or you're not into computer programming at all.

It's a post-commit hook that sends the last commit message to a Twitter account I called "friedcode". It's not entirely useful, but if you want to be loud about your work and the progress you make, I guess it can make sense. Or if you're a team and want a brief overview of what your teammates are up to. For me, it was mostly an experiment to try Git hooks and pytwitter.
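Here is a minimal sketch of such a hook, not the original code: a Python post-commit hook that reads the last commit message with `git log -1 --pretty=%B` and shortens it to fit a tweet. pytwitter isn't used because Twitter's API has changed since and now requires OAuth, so posting is left to a hypothetical post_status() stub; the 140-character limit is an assumption matching Twitter at the time.

```python
#!/usr/bin/env python3
"""Sketch of a Git post-commit hook (save as .git/hooks/post-commit, chmod +x)."""
import subprocess

LIMIT = 140  # Twitter's limit at the time; an assumption for this sketch


def last_commit_message():
    # %B is the raw commit message (subject and body)
    out = subprocess.run(
        ["git", "log", "-1", "--pretty=%B"], capture_output=True, text=True, check=True
    )
    return out.stdout.strip()


def tweet_text(message, limit=LIMIT):
    # Collapse newlines and repeated whitespace, then shorten to fit the limit
    text = " ".join(message.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def post_status(text):
    # Hypothetical stub: replace with a real Twitter/X API client call
    print("Would post:", text)


def main():
    try:
        message = last_commit_message()
    except (OSError, subprocess.CalledProcessError):
        return  # git missing or not inside a repository: nothing to post
    if message:
        post_status(tweet_text(message))


if __name__ == "__main__":
    main()
```

A post-commit hook's exit status doesn't affect the commit, so failing quietly keeps a network hiccup from getting in the way of your work.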