Filtered by Python

Page 25

Reset

Using MD5 to check equality between files

October 28, 2005
11 comments Python

To some Python users this is old-school old-news stuff but since I've never used it before I found it worth mentioning.

I have a script that scans a rather large tree of folders filled with files. None of the folders have the same name but they can mistakably contain the same files eg:


folder XYZ-2005-11-27/
   email1.bin
   email2.bin
folder CBA-2005-07-10/
   email1.bin
   email2.bin

Sometimes two different folders contain the same file names exactly. Sometimes, the file sizes as equal too. But in some of those cases, even though the file sizes and names are the same they are different files. But! If they are the same files just in different locations I want to find them. How to do that?

Truncated! Read the rest by clicking the link below.

"Increment numbers in a string"

October 20, 2005
2 comments Python

I've just uploaded my second Python Cookbook recipe. It's unfortunately not rocket science but it's application is potentially very useful. With this little function you can generate the next number in a string that contains at least one number.

The mini unittest is quite interesting perhaps:


$ python increment_strings.py
from 10dsc_0010.jpg to 10dsc_0011.jpg
from dsc_9.jpg to dsc_10.jpg
from 0000001.exe to 0000002.exe
from ref-04851 to ref-04852

Playing with Reverend Bayesian

October 19, 2005
0 comments Python

I've been playing around with Reverend a bit on getting it to correctly guess appropriate "Sections" for issues on the Real issuetracker. What I did was that I downloaded all 140 issuetexts and their "Sections" attribute which is a list (that is often of length 1). From list dataset I did a loop over each text and the sections within it (skipped the default section General) so something like this:


data = ({'sections':['General','Installation'], 
         'text':"bla bla bla..."}
        {'sections':['Filter functions'], 
         'text':"Lorem ipsum foo bar..."}
        ...)
for item in data:
    secs = [each for each item['sections'] if each != 'General']
    for section in secs:
        guesser.train(section, item['text'])

Truncated! Read the rest by clicking the link below.

Dream: python bindings for squidclient

October 11, 2005
3 comments Python, This site

At the moment I'm not running Squid for this site but if experimentation time permits I'll have it running again soon. One thing I feel uneasy about is how to "manually" purge cached pages that needs to be updated. For example, if you read this page (and it's cached for one hour) and post a comment, then I'd like to re-cache this page with a purge. Setting a HTTP header could be something but that I would only be able to do on the page where you have this in the URL:


?msg=Comment+added

which, because of the presence of a querystring, is not necessarily cached anyway. The effect is that as soon as the "?msg=Comment+added" is removed from the URL, the viewer will see the page as it was before she posted her comment. squidclient might be the solution. ...sort of.

Truncated! Read the rest by clicking the link below.

Ruby and Python benchmarked

September 25, 2005
27 comments Python

Some Ruby (the programming language) blogger has tried to implement the Edit-Distance algorithm in three different ways in both Python and in Ruby. His conclusion is more steered towards the difference between the algorithms and not so much the language. I guess that's fair because the implementation might be done better in Python than it was in Ruby. But, please notice that this is a Ruby coder and yet his Python implementations are always faster.

A quick glance tells me that the Python programs run about 3 times as fast as the Ruby ones. See the benchmarks

Smurl from Python

September 22, 2005
1 comment Python

If you thought the Web Service example on the about page of Smurl.name was complicated, here's a much simpler version. I use this code on my very own site for the email notifications which will contain long URLs.

Feel free to steal this code into your own projects:


from urllib import urlopen, quote

# variable 'url' is defined elsewhere
if len(url) > 80:
    url = urlopen('http://smurl.name/createSmurl?url=%s' % quote(url)).read()

Was that hard?

Python regular expression tester

September 19, 2005
1 comment Python

retest I've just discovered retest by Christof Hoeke which is a developers tool for testing and experimenting with regular expressions. It doesn't have a GUI so it uses SimpleHTTPServer to serve a web interface on http://localhost:8087 that uses AJAX to make the interface snappier. You use this if you feel uncertain how to write your regular expression syntax and need a helpful sandbox for playing in.

This is cool because as an application it's very modern. The source code is only 100 lines python code, some javascript code for the AJAX and a relatively simple HTML page. A genuine GUI app would be considerably much more code but would admittedly run faster. However, considering how "basic" this application is, speed is not an issue.

Truncated! Read the rest by clicking the link below.

Random ID generator for Zope

September 2, 2005
0 comments Python

I was working on a little application that is similar to tinyurl.com (where any URL gets a unique random id that can be used to redirect with long clumsy URL strings) and came up with this little algorithm that I wanted to share with the world for some feedback.

There are two loops, one nested inside one big loop that goes on for infinity. The inner loop creates a list of possible combinations from a sample of letters and numbers eg (abc, acb, bac, bca, cab, cba). For each such found combinations it does a check to see if this has been saved before and if not it exists the loop like this:


if not hasattr(self, combination):
    return combination

Truncated! Read the rest by clicking the link below.

\B in Python regular expressions

July 23, 2005
0 comments Python

Today I learnt about how to use the \B gadget in Python regular expressions. I've previously talked about the usefulness of \b but there's a big benefit to using \B sometimes too.

What \b does is that it is a word-boundary for alphanumerics. It allows you to find "peter" in "peter bengtsson" but not "peter" in "nickname: peterbe". In other words, all the letters have to be grouped prefixed or suffixed by a wordboundry such as newline, start-of-line, end-of-line or a non alpha character like (.

Truncated! Read the rest by clicking the link below.

SmartDict - a smart 'dict' wrapper

July 14, 2005
9 comments Python

For a work project we needed a convenient way to wrap our SQL recordset, instance objects and dictionary variables to share the same interface. The result is SmartDict which makes it possible to assert that access can be made in any which way you want; something that is very useful when you write templates and don't want to have to know if what you're working with is a dict or a recordset.

This doesn't work:


>>> d= {'name':"Peter"}
>>> print d.get('name') # fine
>>> print d.name # Error!!!

Likewise with some instance objects or record sets, this doesn't work:


>>> d = getRecordsetObject()
>>> print d.name # fine
>>> print d.get('name') # Error!!!

Truncated! Read the rest by clicking the link below.