Filtered by Python

Page 22

Reset

split_search() - A Python functional for advanced search applications

May 15, 2008
0 comments Python

Inspired by Google's way of working I today put together a little script in Python for splitting a search. The idea is that you can search by entering certain keywords followed by a colon like this:


Free Text name:Peter age: 28

And this will be converted into two parts:


'Free Text'
{'name': 'Peter', 'age':'28}

You can configure which keywords should be recognized and to make things simple, you can basically set this to be the columns you have to do advanced search on in your application. For example (from_date,to_date)

Feel free to download and use it as much as you like. You might not agree completely with it's purpose and design so you're allowed to change it as you please.

Here's how to use it:


$ wget https://www.peterbe.com/plog/split_search/split_search.py
$ python
>>> from split_search import split_search
>>> free_text, parameters = split_search('Foo key1:bar', ('key1',))
>>> free_text
'Foo'
>>> parameters
{'key1': 'bar'}

UPDATE

Version 1.3 fixes a bug when all you've entered is one of the keywords.

See you at PyCon 2008

March 11, 2008
0 comments Python

I'm going to Chicago on Wednesday for the PyCon 2008 conference. I'm going to stay at the Crowne Plaza (or whatever it was called) like many of the other people at the conference.

This is what I look like:

See you at PyCon 2008

If you see this mug, go up to it and say Hi. It speaks British, Swedish and some American and loves food, beer and tea which might be helpful to know if you would feel like to talk more to it. Its interests for this conference are: Grok, Zope, Django, Plone, buildout, automated testing, agile development and Javascript. Its main claim-to-fame is an Open Source bug/issue tracker program called IssueTrackerProduct which it is more than delighted to talk about.

I've never been to Chicago before and I'm really excited about Tuesday night as I've bought tickets to a Chicago Bulls NBA game (basketball). All other nights I'm hoping to socialise, get drunk, get full and get down and dirty nerdy all week. See you there!

CommandLineApp by Doug Hellmann

February 22, 2008
0 comments Python

I just read the feature article "Command line programs are classes, too!" by Doug Hellmann in the January 2008 issue of Python Magazine about his program CommandLineApp and I've tried it out on one of my old Python programs where I do the opt parsing manually with getopt. The results are beautiful and quick. It's sprinkled with Doug specific magic but I quickly got over that when I saw out easy it was to work with. There are still a few questions of things I didn't manage to work out but that will unfortunately have to wait.

If anything, the worst thing about this library is that it's not part of the standard library so either you have to tell people to sudo easy_install CommandLineApp in the instructions or include it yourself in your packages if you prefer to ship things with a kitchen sink included.

If you want to check it out in action, either subscribe to the magazine (and support the effort) or just download csvcat

String comparison function in Python (alpha)

December 22, 2007
7 comments Python

I was working on a unittest which when it failed would say "this string != that string" and because some of these strings were very long (output of a HTML lib I wrote which spits out snippets of HTML code) it became hard to spot how they were different. So I decided to override the usual self.assertEqual(str1, str2) in Python's unittest class instance with this little baby:


def assertEqualLongString(a, b):
   NOT, POINT = '-', '*'
   if a != b:
       print a
       o = ''
       for i, e in enumerate(a):
           try:
               if e != b[i]:
                   o += POINT
               else:
                   o += NOT
           except IndexError:
               o += '*'

       o += NOT * (len(a)-len(o))
       if len(b) > len(a):
           o += POINT* (len(b)-len(a))

       print o
       print b

       raise AssertionError, '(see string comparison above)'

It's far from perfect and doesn't really work when you've got Unicode characters that the terminal you use can't print properly. It might not look great on strings that are really really long but I'm sure that's something that can be solved too. After all, this is just a quick hack that helped me spot that the difference between one snippet and another was that one produced <br/> and the other produced <br />. Below are some examples of this utility function in action.

Truncated! Read the rest by clicking the link below.

Calculator in Python for dummies

December 17, 2007
17 comments Python

I need a mini calculator in my web app so that people can enter basic mathematical expressions instead of having to work it out themselfs and then enter the result in the input box. I want them to be able to enter "3*2" or "110/3" without having to do the math first. I want this to work like a pocket calculator such that 110/3 returns a 36.6666666667 and not 36 like pure Python arithmetic would. Here's the solution which works but works like Python:


def safe_eval(expr, symbols={}):
   return eval(expr, dict(__builtins__=None), symbols)

def calc(expr):
   return safe_eval(expr, vars(math))

assert calc('3*2')==6
assert calc('12.12 + 3.75 - 10*0.5')==10.87
assert calc('110/3')==36

Truncated! Read the rest by clicking the link below.

WSSE Authentication and Apache

December 13, 2007
1 comment Python

I recently wrote a Grok application that implements a REST API for Atom Publishing so that I can connect a website I have via my new Nokia phone has LifeBlog which uses the Atom API to talk to the server.

Anyway, the authentication on Atom is WSSE (good introduction article) which basically works like this:


PasswordDigest = Base64 \ (SHA1 (Nonce + CreationTimestamp + Password))

This is one of the pieces in a request header called Authorization which can look something like this:


Authorization: WSSE profile="UsernameToken"
X-WSSE: UsernameToken Username="bob", PasswordDigest="quR/EWLAV4xLf9Zqyw4pDmfV9OY=", 
Nonce="d36e316282959a9ed4c89851497a717f", Created="2003-12-15T14:43:07Z"

What I did was I wrote a simple Python script to mimic what the Nokia does but from a script. The script creates a password digest using these python modules: sha, binascii and base64 and then fires off a POST request. Here's thing, if you generate this header with base64.encodestring(ascii_string) you get something like this:


quR/EWLAV4xLf9Zqyw4pDmfV9OY=\n

Notice the extra newline character at the end of the base64 encoded string. This is perfectly valid and is decoded easily with base64.decodestring(base64_string) by the Grok app. Everything was working fine when I tried posting to http://localhost:8080/++rest++atompub/snapatom and my application successfully authenticated the dummy user. I was happy.

Then I set this up properly on atom.someotherdomain.com which was managed by Apache who internally rewrote the URL to a Grok on localhost:8080. The problem now was that the Authentication header value was broken into two lines because of the newline character and then the whole request was rejected by Apache because some header values came without a : semi-colon.

The solution was to not use base64.encodestring() and base64.decodestring() but to instead use base64.urlsafe_b64encode() and base64.urlsafe_b64decode(). Let me show you:


>>> import base64
>>> x = 'Peter'
>>> base64.encodestring(x)
'UGV0ZXI=\n'
>>> base64.urlsafe_b64encode(x)
'UGV0ZXI='
>>> base64.decodestring(base64.urlsafe_b64encode(x))
'Peter'

If you're still reading, then hopefully you won't make the same mistake as I did and wasting time on trying to debug Apache. The lesson learned from this is to use the URL safe base64 header values and not the usual ones.

geopy distance calculation pitfall

December 10, 2007
1 comment Python

Geopy is a great little Python library for working with geocoding and distances using various online services such as Google's geocoder API.

Today I spent nearly half an hour trying to debug what was going on with my web application since I was getting this strange error:


AttributeError: 'VincentyDistance' object has no attribute '_kilometers'

Truncated! Read the rest by clicking the link below.

Spellcorrector 0.2

September 24, 2007
3 comments Python

Unlike previous incarnations of Spellcorrector not it does not by default load the two huge language files for English and Swedish. Alternatively/additionally you can load your own language file. The difference between loading a language file and training on your own words is that trained words are always assumed to be correct.

Another major change with this release is that a pickle file is created once the language file or own training file has been parsed once. This works like a cache, if the original text file changes, the pickle file is recreated. The outcome of this is that the first time you create a Spellcorrector instance it takes a few seconds if the language files is large but on the second time it takes virtually no time at all.

Truncated! Read the rest by clicking the link below.

html2plaintext Python script to convert HTML emails to plain text

August 10, 2007
12 comments Python

From the doc string:


A very spartan attempt of a script that converts HTML to
plaintext.

The original use for this little script was when I send HTML emails out I also
wanted to send a plaintext version of the HTML email as multipart. Instead of 
having two methods for generating the text I decided to focus on the HTML part
first and foremost (considering that a large majority of people don't have a 
problem with HTML emails) and make the fallback (plaintext) created on the fly.

This little script takes a chunk of HTML and strips out everything except the
<body> (or an elemeny ID) and inside that chunk it makes certain conversions 
such as replacing all hyperlinks with footnotes where the URL is shown at the
bottom of the text instead. <strong>words</strong> are converted to *words* 
and it does a fair attempt of getting the linebreaks right.

As a last resort, it strips away all other tags left that couldn't be gracefully
replaced with a plaintext equivalent.
Thanks for Fredrik Lundh's unescape() function things like:
   'Terms &amp;amp; Conditions' is converted to
   'Termss &amp; Conditions'

It's far from perfect but a good start. It works for me for now.

Version at the time of writing this: 0.1.

I wouldn't be surprised if I've reinvented the wheel here but I did plenty of searches and couldn't really find anything like this.

Let's run this for a while until I stumble across some bugs or other inconsistencies which I haven't quite done yet. The one thing I'm really unhappy about is the way I extract the body from the BeautifulSoup parse object. I really couldn't find another better way in the few minutes I had to spare on this.

Feel free to comment on things you think are pressing bugs.

You can download the script here html2plaintext.py version 0.1

UPDATE

I should take a second look at Aaron Swartz's html2text.py script the next time I work on this. His script seems a lot more mature and Aaron is brilliant Python developer.