Local jed settings

April 19, 2013
4 comments Linux, macOS

(if you're wondering what you're doing here, jed is a hardcore text-based editor for programmers)

Thanks to fellow Jed user and hacker Ullrich Horlacher I can now have local settings per directory.

I personally prefer 2 spaces in my Javascript, and thankfully most projects I work on agree with that standard. However, one Mozilla project I work on uses 4 spaces for indentation. So, what I've had to get used to is editing my ~/.jedrc every time I switch to that particular project. I change variable C_INDENT = 2; to variable C_INDENT = 4; and then back again when switching to another project.

No more of that. Now I just add a file into the project root like this:

$ cd dev/airmozilla
$ cat .jed.sl
variable C_INDENT = 4;

And whenever I work on any file in that tree it applies the local override setting.

Here's how you can do that too:

First, put this code into your <your jed lib>/defaults.sl (on my OS X machine, the jed lib is /usr/local/Cellar/jed/0.99-19/jed/lib/):

% load .jed.sl from current or parent directories
% but only if the user is the same
define load_local_config() {
  variable dir = getcwd();
  variable uid = getuid();
  variable jsl, st;
  % walk up the directory tree towards the root
  while (dir != "/" and strlen(dir) > 1) {
    st = stat_file(dir);
    if (st == NULL) return;
    % bail out as soon as we hit a directory we don't own
    if (st.st_uid != uid) return;
    jsl = dir + "/.jed.sl";
    st = stat_file(jsl);
    if (st != NULL) {
      if (st.st_uid == uid) {
        % evalfile() leaves a value on the stack; discard it
        pop(evalfile(jsl));
        return;
      }
    }
    dir = path_dirname(dir);
  }
}

Then add this to the bottom of your ~/.jedrc:

define startup_hook() {
  load_local_config(); % .jed.sl
}

Now, go into a directory where you want to make local settings, create a file called .jed.sl and fill it to your heart's content!

Careful with your assertRaises() and inheritance of exceptions

April 10, 2013
10 comments Python

This took me by surprise today!

If you run this unit test, it actually passes with flying colors:


import unittest


class BadAssError(TypeError):
    pass


def foo():
    raise BadAssError("d'oh")


class Test(unittest.TestCase):

    def test(self):
        self.assertRaises(BadAssError, foo)
        self.assertRaises(TypeError, foo)
        self.assertRaises(Exception, foo)


if __name__ == '__main__':
    unittest.main()

Basically, assertRaises doesn't just accept the exact exception class that is being raised; it also accepts any of the raised exception's parent classes.

I've only tested it with Python 2.6 and 2.7. And it works the same with unittest2.

I don't really know how I feel about this. It did surprise me when I was changing one of the exceptions and expected the old tests to break, but they didn't. I mean, if I want to write a test that really makes sure the exception is BadAssError specifically, I can't use assertRaises() on its own.
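
If you do need the stricter behaviour, one workaround (a sketch, not the only way) is to catch the exception yourself and compare the exact class:


import unittest


class BadAssError(TypeError):
    pass


def foo():
    raise BadAssError("d'oh")


class StrictTest(unittest.TestCase):

    def test_exact_exception_class(self):
        try:
            foo()
        except Exception as exception:
            # assertRaises(TypeError, foo) would pass here too;
            # comparing the exact class does not.
            self.assertEqual(type(exception), BadAssError)
        else:
            self.fail("foo() didn't raise anything")


if __name__ == '__main__':
    unittest.main()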

Recruiters: if you're going to lie, do it properly

April 7, 2013
9 comments Work, Web development

Being a recruiter is hard work. A lot of pressure and having to deal with people's egos. Although I have no plans to leave Mozilla any time soon, there's still some sort of value in seeing that my skills are sought after in the industry. That's why I haven't yet completely cancelled my LinkedIn membership.

When I get automated emails from bots that just scrape LinkedIn I don't bother to reply. Sometimes I get emails from recruiters who have actually studied my profile (my blog, my projects, my github, etc.) and then I do take the time to reply and say "Hi Name! Thank you for reaching out. It looks really exciting but it's not for me at the moment. Keep up the good work!"

Then there's this new trend where people appear to try to automate what the bots do by doing it manually but without actually reading anything. I understand that recruiters are under a lot of pressure to deliver and try to reach out to as many potential candidates as possible, but my advice is: if you're going to do it, do it properly. You'll reach fewer candidates but it'll mean so much more.

I got this email the other day about a job offer at LinkedIn:
Shaming a stressed out recruiter from LinkedIn

  • I have a Swedish background. Not "Sweetish". And what difference does that make?
  • I haven't worked on "FriedZopeBase" (which is on my github) for several years
  • I haven't worked on "IssueTrackerProduct" for several years
  • Let's not "review [my] current employment". That's for me to think about.

So what can we learn from this? Well, for starters, if you're going to pretend to have taken time, do it properly! If you don't have time to do in-depth research on a candidate, then don't pretend that you have.

I got another recruiter emailing me personally yesterday and it was short and sweet. No mention of free lunch or other superficial trappings. The only personal thing about it was that it had my first name. I actually bothered to reply to them and thank them for reaching out.

"Did you mean this domain?" Auto-correction for the browser's address bar

April 5, 2013
4 comments Mozilla

People rarely type in long URLs. Therefore it's unlikely that one little typo in that long URL is the deciding factor in whether you get a 200 OK or a 404 Not Found.

However, what people often do is type in a domain name and hit enter. Sometimes they fumble and miss a character or accidentally add an additional one and ultimately land on this error:

One little typo and it looks like your Internet is down

Another thing I often do is I type the start of the domain name and fumble with the Awesome Bar and accidentally try to reach just the start of the domain. Like www.mozill for example.

The browser should in these cases be able to recognize the mistake and offer a nice "Did you mean this domain?" button or something that makes it one click to correct the innocent fumble.

How it could do this is quite simple: it could record every domain you've visited, based on your history. Then it could compute an edit distance against what you typed and, if it finds exactly one suggestion, offer it.

Here's how you can use an Edit distance algorithm:


>>> from edit_distance import EditDistance
>>> ed = EditDistance(('www.peterbe.com', 'www.mozilla.org', 'news.ycombinator.com',
...                    'twitter.com', 'www.facebook.com', 'github.com'))
>>> ed.match('www.peterbe.cm')
[u'www.peterbe.com']
>>> ed.match('twittter.com')
['twitter.com']
>>> ed.match('www.faecbook.com')
['www.facebook.com']
>>> ed.match('github.comm')
['github.com']
>>> ed.match('neverheardof')
[]

Here's the implementation I used.
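
In case you're curious, here's a rough sketch of what such a class could look like. Note that this is a plain Levenshtein distance, not necessarily the implementation linked above; a character swap like faecbook counts as two edits here, hence the maximum distance of 2:


class EditDistance(object):

    def __init__(self, against, max_distance=2):
        self.against = against
        self.max_distance = max_distance

    def _distance(self, a, b):
        # Classic dynamic-programming Levenshtein distance.
        if len(a) < len(b):
            a, b = b, a
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a):
            current = [i + 1]
            for j, char_b in enumerate(b):
                current.append(min(
                    previous[j + 1] + 1,              # deletion
                    current[j] + 1,                   # insertion
                    previous[j] + (char_a != char_b)  # substitution
                ))
            previous = current
        return previous[-1]

    def match(self, string):
        return [domain for domain in self.against
                if self._distance(string, domain) <= self.max_distance]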

Of course, this functionality should only kick in in the most desperate of cases, i.e. when the URL can't resolve to anything at all. If someone is clever enough to buy the domain name facebok.com they deserve their traffic. And equally, if you type something like ww.peterbe.com or wwww.peterbe.com I've already set those up to redirect to www.peterbe.com.

Here's what it could look like instead:
Here's what the improved error page could look like

Never put external Javascript in the <head>

April 2, 2013
13 comments Web development

First of all, the title is perhaps misleading. Basically, don't put plain script tags that are not async in the head tag.

If you put a piece of javascript in the head of an HTML page, the browser will start downloading it and proceed down the HTML, downloading other resources such as CSS files as it encounters them.

Then, when all the javascript and CSS have been downloaded, it will start rendering the page and, as it does, download any images referenced in the HTML. At roughly the same time it will start to display things on the screen. But it won't do any of this until the CSS and Javascript have been downloaded.

To repeat: The browser screen will appear blank. It won't start downloading any images if downloading a javascript URL referenced in the head gets stuck.

Here are two perfectly good examples from this morning's routine hunt for news:

Wired.com is relying on some external resource to load (forever!) until anything is rendered.
Wired.com is guilty

getharvest.com depends on Rackspace's CDN before anything is displayed
getharvest.com is guilty

Here's what getharvest.com does in their HTML:


<!DOCTYPE html>
<html lang="en">

  <head>
    <script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script>
    <script type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>
  ...

Why it gets stuck on connecting to c761485.r85.cf2.rackcdn.com I just don't know. But it does. The Internet is like that oftentimes. You simply can't connect to otherwise perfectly configured web servers.

Update-whilst-writing-this-text! As I was writing this text I gave getharvest.com a second chance, thinking that most likely the squirrels in my internet tubes would be back up and running to rackcdn.com, but then it happened again!

So, what's the right thing to do? Simple: don't rely on external resources. For example, you can move the Javascript script tag to the very bottom of the HTML page. That way the page will render as much as it possibly can whilst waiting for the Javascript resource to get unstuck. Or, almost equivalently, you can keep the script tag in the <head> but put an async attribute on it like this:


<script async type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>

Another thing you can do is not use an external resource URL (aka a third-party domain). Instead of using cdn.superfast.com/file.js you use /file.js. Sure, that fancy CDN might be faster at serving up stuff than your server, but looking up a CDN's domain costs one more DNS lookup, which we know can be very expensive for that first-time impression.

I know I'm probably guilty of this on some of my (now) older projects. For example, if you open aroundtheworldgame.com it won't render anything until it has managed to connect to maps.googleapis.com and dn4avfivo8r6q.cloudfront.net, but that's more of an app than a web site.

By the way... I wrote some basic code to play around with how this actually works. I decided to put this up in case you want to experiment with it too: https://github.com/peterbe/slowpage
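
If you don't want to clone that, here's a minimal sketch in the same spirit (not the actual slowpage code) that serves a page whose head script takes ages to arrive, so you can watch the blank screen for yourself:


import time
from wsgiref.simple_server import make_server

PAGE = b"""<!DOCTYPE html>
<html>
  <head>
    <script src="/slow.js"></script>
  </head>
  <body><h1>Did this render before the script arrived?</h1></body>
</html>"""


def app(environ, start_response):
    if environ['PATH_INFO'] == '/slow.js':
        time.sleep(30)  # simulate a stuck CDN
        start_response('200 OK', [('Content-Type', 'application/javascript')])
        return [b'console.log("finally!");']
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [PAGE]


make_server('', 8000, app).serve_forever()

Move the script tag to the bottom of the body, or put async on it, and the headline appears immediately instead.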

premailer now honours specificity

March 21, 2013
0 comments Python

Thanks to Theo Spears' awesome effort, premailer now supports CSS specificity. What that means is that when linked and inline CSS blocks are transformed into tag style attributes, the rules are applied in the order you'd expect.

When the browser applies CSS to elements it does it in a specific order. For example if you have this CSS:


p.tag { color: blue; }
p { color: red; }

and this HTML:


<p>Regular text</p>
<p class="tag">Special text</p>

the browser knows to draw the first paragraph in red and the second paragraph in blue. It does that because p.tag is more specific than p.

Before, premailer would just blindly take each selector and append it to the style attribute, like this:


<p style="color:red">Regular text</p>
<p style="color:blue; color:red" class="tag">Special text</p>

which is not what you want.
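
To see the fix in action yourself, something like this should do it (a sketch; it assumes you've pip installed premailer and uses its transform shortcut):


from premailer import transform

html = """<html>
<head>
<style>
p.tag { color: blue; }
p { color: red; }
</style>
</head>
<body>
<p>Regular text</p>
<p class="tag">Special text</p>
</body>
</html>"""

# The .tag paragraph should now come out as style="color:blue"
# because p.tag is more specific than p.
print(transform(html))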

The code in action is here.

Thanks Theo!

HTML whitespace "compression" - don't bother!

March 11, 2013
4 comments Web development

This morning I came across this site on Hacker News. It's a cute site with some basic tips on how to make your sites faster.

It's very much a for-beginners document as all the tips are quite basic. For example it doesn't even mention the use of CDNs.

One tip in particular stood out to me: "it can be useful to minify your HTML with automated tools."
And it links to the htmlcompressor project. Ignore this advice.

What matters about 10 times more is Gzip compression. This is usually very easy to set up with Nginx or Apache. It's not something you do in your web framework, and if you don't have a web framework you don't need to manually Gzip HTML files on the filesystem either.

For example, the home page here on my blog is, at the time of writing, 66,770 bytes. Hefty, sure, but with all excess whitespace removed it shrinks to 59,356 bytes. That difference barely matters once you Gzip:

Gzipped from original version: 18,470 bytes
Gzipped from whitespace trimmed version: 18,086 bytes

The gain is about 2% (18,470 vs. 18,086 bytes), which is definitely not worth the hassle of adding a whitespace compressor.
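
If you want to reproduce this kind of measurement on your own pages, here's a rough sketch (homepage.html stands in for whatever HTML file you've saved, and the regex is a crude substitute for a real whitespace compressor):


import gzip
import io
import re


def gzipped_size(data):
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode='wb') as f:
        f.write(data)
    return len(buffer.getvalue())


with open('homepage.html', 'rb') as f:
    html = f.read()

# Crudely collapse the whitespace between tags.
trimmed = re.sub(br'>\s+<', b'><', html)

print('Original: %d bytes, gzipped: %d bytes'
      % (len(html), gzipped_size(html)))
print('Trimmed:  %d bytes, gzipped: %d bytes'
      % (len(trimmed), gzipped_size(trimmed)))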

django-fancy-cache with or without stats

March 11, 2013
1 comment Python, Django

If you use django-fancy-cache you can either run with stats or without. With stats, you can get a number of how many times a cache key "hits" and how many times it "misses". Keeping stats incurs a small performance slowdown. But how much?

I created a simple page that either keeps stats or ignores it. I ran the benchmark over Nginx and Gunicorn with 4 workers. The cache server is a memcached running on the same host (my OSX 10.7 laptop).

With stats:

Average: 768.6 requests/second
Median: 773.5 requests/second
Standard deviation: 14.0

Without stats:

Average: 808.4 requests/second
Median: 816.4 requests/second
Standard deviation: 30.0

That means, roughly, that keeping stats incurs about a 5% performance penalty (768.6 vs. 808.4 requests/second on average).

The stats are completely useless to your users; the stats tool is purely for your own curiosity, and something you can switch on and off easily.

Note: this benchmark assumes that the memcached server is running on the same host as the Nginx and Gunicorn servers. If there were more network in between, obviously all the .incr() commands would slow things down further.

This site is now 100% inline CSS and no bytes are wasted

March 5, 2013
8 comments Python, Web development, Django

This personal blog site of mine uses django-fancy-cache and mincss.

What that means is that I can cache the whole output of every blog post for weeks, and when I do that I can first preprocess the HTML and convert every external CSS file into one inline STYLE block that only references selectors that are actually used.

To see it in action, right-click and select "View Page Source". You'll see something like this:

/*
Stats about using github.com/peterbe/mincss
-------------------------------------------
Requests:         1 (now: 0)
Before:           81Kb
After:            11Kb
After (minified): 11Kb
Saving:           70Kb
*/
section{display:block}html{font-size:100%;-webkit-text-size-adjust:100%;-ms-tex...

The reason the saving is so huge, in my case, is that I'm using the Twitter Bootstrap CSS framework, which is awesome but, like any framework, inevitably contains a bunch of stuff I don't use. Some stuff I don't use on any page at all; some is used only on some pages, and some other stuff is used only on some other pages.

What I gain by this is faster page loads. The browser gets a URL, downloads all the HTML, scans it for referenced CSS (via the link tag) and downloads that too. Once all of that is downloaded, it starts to render the page. Roughly after that, it downloads all referenced Javascript and starts evaluating and executing it.

By not having to download the CSS, the browser has one less thing to do. Only one request, you say? Well, that request might be going to a CDN (not a great idea, actually), so even though it's just one request it involves another DNS look-up.

Here's what the loading of the homepage looks like in Firefox from a US east coast IP.

Granted, a downloaded CSS file can be cached by the browser and reused for other pages under the same domain. But on my blog the bounce rate is about 90%. That doesn't necessarily mean that visitors leave as soon as they arrive, but it does mean that they generally just read one page and then leave. The 10% of visitors who view more than one page will have to download the same chunk of CSS more than once. But mind you, it's not always the same chunk of CSS because it's different for different pages. And the amount of CSS that is now in-line only adds about 2-3Kb to the HTML load when sent gzipped.

Getting to this point wasn't easy, because I first had to develop mincss and django-fancy-cache and integrate it all. However, what this means is that you can have it done on your site too! All the code is Open Source and it's all Python and Django, which are very popular tools.
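
To give you an idea of the mincss part, using it on its own looks roughly like this (a sketch based on its Processor API; check the project's README for the exact details):


from mincss.processor import Processor

processor = Processor()
processor.process('http://www.peterbe.com')

for link in processor.links:
    # Each result holds the CSS before and after the unused
    # selectors were pruned.
    print(link.href)
    print('Before: %d bytes, after: %d bytes'
          % (len(link.before), len(link.after)))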

Welcome to the world django-fancy-cache!

March 1, 2013
3 comments Python, Django

**A Django cache_page on steroids**

Django ships with a view decorator called cache_page which is awesome. But a bit basic too.

What it does is store the whole view response in memcache, keyed by the URL it was called with, including any query string. All you have to do is specify the length of the cache timeout and it just works.
Now, it's got some shortcomings which django-fancy-cache upgrades. These "steroids" are:

  1. Ability to override the key prefix with a callable.
  2. Ability to remember every URL that was cached so you can do invalidation by a URL pattern.
  3. Ability to modify the response before it's stored in the cache.
  4. Ability to ignore certain query string parameters that don't actually affect the view but would otherwise yield a different cache key.
  5. Ability to serve from cache but always do one last modification to the response.
  6. Incrementing counter of every hit and miss to satisfy your statistical curiosity needs.

The documentation is here:
https://django-fancy-cache.readthedocs.org/

You can see it in a real-world implementation by looking at how it's used on my blog here. You basically use it like this:


from django.shortcuts import render
from fancy_cache import cache_page


@cache_page(60 * 60)
def myview(request):
    ...
    return render(request, 'template.html', stuff)

What I'm doing with it here on my blog is making full use of caching on each blog post, but as soon as a new comment is posted I wipe the cache by basically creating a new key prefix. That means that pages are never served stale from the cache, but the views never have to generate the same content more than once.
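
That invalidation trick boils down to something like this sketch. It leans on steroid number 1 from the list above (overriding the key prefix with a callable); the counter scheme and names here are my own illustration, not the exact code:


from django.core.cache import cache
from django.shortcuts import render
from fancy_cache import cache_page


def post_prefixer(request):
    # The prefix embeds a generation counter, so bumping the
    # counter makes every previously cached page unfindable.
    return 'post-%s' % cache.get('post-generation', 0)


@cache_page(60 * 60 * 24, key_prefix=post_prefixer)
def blog_post(request, oid):
    ...
    return render(request, 'blog_post.html', stuff)


def on_new_comment(comment):
    # Call this whenever a comment is posted.
    try:
        cache.incr('post-generation')
    except ValueError:
        cache.set('post-generation', 1)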

I'm also using django-fancy-cache to do some optimizations on the output before it's stored in cache.