rfc822() vs. rfc1123_date()

August 16, 2007
0 comments Zope

To set the Expires header in my web application I used to use the Zope DateTime method rfc822(), which returns the date with a numeric timezone offset rather than in GMT. Here's how I used it:


>>> from DateTime import DateTime
>>> hours = 5
>>> then = DateTime() + hours/24.0
>>> then.rfc822()
'Thu, 16 Aug 2007 20:43:59 +0100'

Then I found out (from using YSlow) that it's better to use the GMT format (RFC 1123), and here's how to do that in Zope:


>>> from App.Common import rfc1123_date
>>> from time import time
>>> rfc1123_date(time() + 3600*hours)
'Thu, 16 Aug 2007 19:45:12 GMT'

(Notice that even though I'm here in London, my local time is an hour ahead of GMT because of summer time.)
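Outside of Zope, the same GMT-style date can be produced with the Python standard library. This is only a sketch, not what the site uses: `expires_value` and its `now` parameter are names made up here for illustration, while `email.utils.formatdate` is the real stdlib call doing the work.

```python
from email.utils import formatdate
from time import time


def expires_value(hours, now=None):
    """Return an RFC 1123 date `hours` after `now` (epoch seconds), always in GMT.

    Hypothetical helper for illustration; `now` defaults to the current time.
    """
    if now is None:
        now = time()
    # usegmt=True makes formatdate() end the string with "GMT"
    # instead of a numeric offset such as "+0100" or "-0000".
    return formatdate(now + 3600 * hours, usegmt=True)


print(expires_value(5))
```

Passing a fixed `now` makes the result predictable: `expires_value(5, now=0)` gives 'Thu, 01 Jan 1970 05:00:00 GMT'.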


html2plaintext Python script to convert HTML emails to plain text

August 10, 2007
12 comments Python

From the doc string:


A very spartan attempt of a script that converts HTML to
plaintext.

The original use for this little script was when I send HTML emails out I also
wanted to send a plaintext version of the HTML email as multipart. Instead of 
having two methods for generating the text I decided to focus on the HTML part
first and foremost (considering that a large majority of people don't have a 
problem with HTML emails) and make the fallback (plaintext) created on the fly.

This little script takes a chunk of HTML and strips out everything except the
<body> (or an element ID) and inside that chunk it makes certain conversions
such as replacing all hyperlinks with footnotes where the URL is shown at the
bottom of the text instead. <strong>words</strong> are converted to *words* 
and it does a fair attempt of getting the linebreaks right.

As a last resort, it strips away all other tags left that couldn't be gracefully
replaced with a plaintext equivalent.
Thanks to Fredrik Lundh's unescape() function things like:
   'Terms &amp; Conditions' is converted to
   'Terms & Conditions'

It's far from perfect but a good start. It works for me for now.

Version at the time of writing this: 0.1.
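To give a feel for the conversions the doc string describes, here is a rough sketch of the idea. It is not the html2plaintext.py code: that script works on a BeautifulSoup parse object, whereas this uses the standard library's `html.parser`, and the names `FootnoteTextExtractor` and `html_to_plaintext` are invented for this example.

```python
from html.parser import HTMLParser


class FootnoteTextExtractor(HTMLParser):
    """Hypothetical helper: collect text and turn links into numbered footnotes."""

    def __init__(self):
        # convert_charrefs=True unescapes entities, so "&amp;" arrives as "&"
        super().__init__(convert_charrefs=True)
        self.parts = []      # text fragments in document order
        self.footnotes = []  # link URLs in order of appearance
        self._href = None    # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
        elif tag in ("strong", "b"):
            self.parts.append("*")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            # replace the hyperlink with a footnote marker like " [1]"
            self.footnotes.append(self._href)
            self.parts.append(" [%d]" % len(self.footnotes))
            self._href = None
        elif tag in ("strong", "b"):
            self.parts.append("*")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plaintext(html):
    """Return plain text with the collected URLs listed at the bottom."""
    parser = FootnoteTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    if parser.footnotes:
        notes = ["[%d] %s" % (i, url)
                 for i, url in enumerate(parser.footnotes, 1)]
        text += "\n\n" + "\n".join(notes)
    return text


print(html_to_plaintext(
    '<p>Read the <a href="http://example.com/t">Terms &amp; Conditions</a> '
    '<strong>now</strong></p>'
))
```

All tags the sketch doesn't know about are simply dropped, which mirrors the "last resort" behaviour mentioned in the doc string.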

I wouldn't be surprised if I've reinvented the wheel here but I did plenty of searches and couldn't really find anything like this.

Let's run this for a while until I stumble across some bugs or other inconsistencies, which I haven't quite done yet. The one thing I'm really unhappy about is the way I extract the body from the BeautifulSoup parse object. I really couldn't find a better way in the few minutes I had to spare on this.

Feel free to comment on things you think are pressing bugs.

You can download the script here: html2plaintext.py version 0.1

UPDATE

I should take a second look at Aaron Swartz's html2text.py script the next time I work on this. His script seems a lot more mature and Aaron is a brilliant Python developer.

YSlow grade A (96) but not without doubts

August 6, 2007
0 comments Web development

If you're a web developer and care about having snappy web sites you'll know about YSlow for Firebug. I managed to get a grade A (96) but I suspect that there's a bug in the YSlow analysis.

Setting an Expires header is inferior to using Cache-Control, which my site was already doing just fine with headers like:


Cache-Control: public,max-age=3600

according to the latest documentation, but YSlow kept going on about setting Expires headers. I prefer Cache-Control since you don't have to do any date formatting, which eats a few excess CPU cycles. If anybody knows why it's a good idea to use both Cache-Control and Expires, let me know.
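One commonly cited reason for sending both: HTTP/1.0 caches only understand Expires, while HTTP/1.1 caches give max-age precedence when both are present, so the two don't conflict. A minimal sketch of deriving both from a single max-age value follows; `caching_headers` and its `now` parameter are made-up names for illustration, not something from my site's code.

```python
from email.utils import formatdate
from time import time


def caching_headers(max_age, now=None):
    """Build matching Cache-Control and Expires headers (hypothetical helper).

    max_age is in seconds; now is epoch seconds and defaults to the current time.
    """
    if now is None:
        now = time()
    return {
        # understood by HTTP/1.1 caches; wins over Expires when both are sent
        "Cache-Control": "public,max-age=%d" % max_age,
        # fallback for HTTP/1.0 caches; must be a GMT date
        "Expires": formatdate(now + max_age, usegmt=True),
    }


for name, value in caching_headers(3600).items():
    print("%s: %s" % (name, value))
```

The date formatting still costs a few cycles per response, so whether the fallback is worth it depends on how many old caches sit between the site and its visitors.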


Interesting lesson learnt on shortcut taking in usability

August 2, 2007
3 comments Plone

In Plone 2.1 (don't know about other versions) the default date input used by the CalendarWidget uses 5 dropdowns (year, month, day, hour, minute) plus a little popup calendar which you can use to avoid flipping the dropdowns. When the underlying value is not set the dropdowns look like this:


2007/--/-- [#] --:--

where the little [#] is the popup calendar widget. The year 2007 is preselected because they assume that in most cases you'll just be selecting a month and a day, since it is highly likely that the year will be the current one. So far so good.


XML header and childNodes

July 26, 2007
0 comments Web development

I discovered something really odd today that maybe a seasoned AJAX guru already knew as a legendary bug which might even have a name. I was developing a little AJAX method on the server side that returned this:


<?xml version="1.0"?>
<sections>
  <section>
    <number>001</number>
    <title>PLug 1</title>
  </section>
  <section>
    <number>003</number>
    <title>PLug 3 xyz</title>
  </section>
</sections>

Note: The Content-Type used was "text/xml"

I used jQuery to kick off the AJAX call and then looped over the document element's childNodes, almost like this:


var children = data.childNodes[0].childNodes;
for (var i=0, len=children.length; i<len; i++) {
  // bla bla
}

It was working fine in Firefox and of course not in IE 6.0.
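Whatever the browser difference turns out to be, indexing into childNodes is fragile because the DOM counts whitespace text nodes as children too. As an illustration only (in Python's `xml.dom.minidom` rather than in a browser, so it says nothing about what IE actually does), this sketch reads the same document via `documentElement` and skips anything that isn't an element. The helper names `element_children` and `text_of` are invented here.

```python
from xml.dom import minidom

XML = """<?xml version="1.0"?>
<sections>
  <section>
    <number>001</number>
    <title>PLug 1</title>
  </section>
  <section>
    <number>003</number>
    <title>PLug 3 xyz</title>
  </section>
</sections>
"""


def element_children(node):
    """Return only the element children, ignoring whitespace text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def text_of(element):
    """Concatenate the text nodes directly inside an element."""
    return "".join(c.data for c in element.childNodes
                   if c.nodeType == c.TEXT_NODE)


doc = minidom.parseString(XML)
root = doc.documentElement  # the <sections> element, however many nodes precede it

sections = []
for section in element_children(root):
    fields = {child.tagName: text_of(child)
              for child in element_children(section)}
    sections.append((fields["number"], fields["title"]))

# root.childNodes also contains the whitespace between the <section> tags
print(len(root.childNodes), len(element_children(root)))
print(sections)
```

The same two ideas, asking for the document element directly and filtering on nodeType, are available in the browser DOM as well.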


How did Google do that?

July 14, 2007
6 comments Web development

If you search Google for 'yogurt', www.dannon.com comes up second in the search results. Neither the title nor the URL contains any reference to the word yogurt. The word "Yogurt" is the 37th of all (55) non-HTML English words that appear in the crappy, tables-nested-inside-tables source code.

An SEO expert would immediately count Dannon.com as doomed on the search term Yogurt, but clearly Google had other plans. According to my Google toolbar the Dannon website has a measly PageRank™ of only 5 out of 10, so that's not the explanation either.

So how did Google do that?


Worst gigolo sales pitch ever

July 11, 2007
0 comments

If I wanted to become a gigolo I would probably do two things differently from this guy: 1) I wouldn't ask someone on a different continent for help and 2) I wouldn't get a Yahoo! account called "always alone":


Date: Mon, 9 Jul 2007 11:19:26 -0700 (PDT)
From: always alone <always_alone7@yahoo.com>
Subject: become gigolo
To: ...@peterbe.com

hi i am rohit  from india i want be a gigolo plz help me out 
 thank you 
 my num +91009312337329

---------------------------------
Building a website is a piece of cake. 
Yahoo! Small Business gives you all the tools to get online.

UPDATE

Writing about idiots writing to me spawned more idiots writing just a day after posting this. Here's one of them:


Date: Fri, 13 Jul 2007 16:44:00 +0100
From: kapoor_jack@....com
To: ...@peterbe.com

hi we  r two frnds we wanna be gigolos coz we need money give 
me any contact no. we r handsome guys  tall and expiernced as well

I am not a pimp!

I'm not a hacker

July 8, 2007
2 comments This site

Every week I get an email via this website from someone who wants me to help them hack something. I've written things about the subject "hacking" but that doesn't make me a hacker. I'm not a hacker. Here's this week's nutter email I got:

"Hai, im suraj from India.
Actually i want to b a computer expert.There must b nothin wth the coputer tht i cant do.so i think it can b done only wth a hacker.So can u plz help me wth this.pPleae tel me wht i hav to learn.
Than Q.
"

Does that make any sense?

Mac OS X's python binary icons

July 6, 2007
6 comments macOS

The Mac OS X icon for .pyc files is a document with a background of ones and zeros but the foreground is a 16 ton weight. WTF?! What's the 16 ton got to do with anything?

Perhaps I got it all wrong. Maybe this is the icon used for many different files but I had a look around and couldn't find any other file that uses the 16 ton image in the icon.

iPhones review on WSJ

June 30, 2007
0 comments Misc. links

A lot of news sites have started publishing reviews of the new iPhone by Apple. The review I chose was the Wall Street Journal column by Walter Mossberg. It's a great review from a user experience point of view too. You start by watching the video clip, which is nice, and then if you're interested in more details you move on to actually read the article.

One thing I appreciated about this review was that I was able to skim the article first whilst waiting for the video to first show the ad and then do the buffering. Very practical.

Oh yeah, the phone looks pretty exciting too. Shame about the EDGE and being totally stuck on AT&T. Don't know what that AT&T deal means in terms of getting an iPhone here in Europe.