GitHub PR triage across multiple projects

April 28, 2014
0 comments Web development, JavaScript, Mozilla

I have now closed issue #2 on github-pr-triage, so you can now have a dashboard of every GitHub project whose pull requests you care about.

The old format of using just one repo still works (e.g. /owner/project) and should hopefully not break anybody's bookmarks. The new format for having multiple repos across (possibly) multiple owners looks like this:

owner1:projectA,projectB;owner2:projectX,projectY,projectZ
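For example, a (made-up) combination of two repos mentioned on this blog would look like this:

mozilla:socorro;peterbe:buggy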

See screenshot:

A couple of different projects

To set yours up, there's a running instance available at https://prs.paas.allizom.org

Wattvision - real-time energy monitoring

April 27, 2014
2 comments This site

The camera sensor
Last weekend I installed a Wattvision ("real-time energy monitoring sensors") in my house. It lets you measure how much electricity your house is using. In real time.

So it comes in two parts:

1) A camera sensor that is attached to the electricity meter. It stares at the rotating disk all day.

2) A little router/sensor thing that is connected to the camera and connects, by Wi-Fi, to your home router.

Then, the little router/sensor sends all your measurements to wattvision.com's servers. After that, I sign in to Wattvision (using my Google account) and there I can get all the statistics about my house's electricity. Simple, eh?

Down in my basement
Wattvision started as a Kickstarter project two years ago and since I sponsored that project they sent me a kit now that it's fully tested and working. Yay!

The installation was almost comically simple! It had that lovely "just works" feeling to it. The only challenging part was pulling the sensor wire from the corner of the house to a good spot in our basement. My wife, who is much shorter than me, crawled into our crawl space and helped me hook it all up. I was just so impressed with the instructions. They were very well written.

Dashboard
Now that it's set up, you get all your statistics and graphs by signing in to wattvision.com and it works great on mobile as well. I have to admit, at this point, I really haven't understood what it all does and what it all means. Besides, because I only installed it a week ago, I don't yet have enough data to compare current usage with historic usage. By the way, you can download your data in CSV form too.

The sexiest feature is being able to sit and watch your graph, deliberately switch something on in the house, and see the graph "spike". Obviously the height of the spike depends on what you're switching on. For example, I don't think an LED light even registers (admittedly, I haven't tested that yet).

I think this is the key reason to have Wattvision: to get insight into what in your household causes the most energy consumption. Having said that, we're not going to stop taking showers.

In conclusion...

Comparison chart
You simply can't have data analysis without data collection. Also, if there's anything you want to trim, such as body fat, awareness is usually a very good weapon.

I don't know if I'll be checking back into the statistics very often. The novelty might just wear off after a while. We'll see.

Grymt - because I didn't invent Grunt here

April 18, 2014
3 comments Python, Web development, JavaScript

grymt is a Python tool that takes a directory full of .html, .css and .js files and prepares the HTML for optimal production use.

For a teaser:

  1. Look at the "input"

  2. Look at the "output" (Note! You have to right-click and view source)

So why did I write my own tool and not use Grunt?!

Glad you asked! The reason is simple: I couldn't get Grunt to work.

Grunt is a framework. It's a place where you say which "recipes" to execute and how. It's effectively a common config framework. Like make.
However, I tried to set up a bunch of recipes in my Gruntfile.js and most of them worked well individually, but it was a hellish nightmare to get it all to work together just the way I wanted.

For example, the grunt-contrib-uglify plugin is fine for doing the minification, but it doesn't work with concatenation and it doesn't deal with taking one input file and outputting to a different file.
Basically, I spent two evenings getting things to work but I could never get exactly what I wanted. So I wrote my own, and because I'm quite familiar with this kind of stuff, I did it in Python. Not because it's better than Node but just because I had it nearby and was able to build something more quickly.

So what sweet features do you get out of grymt?

  1. You can easily make an output file have a hash in the filename. E.g. vendor-$hash.min.js becomes vendor-64f7425.min.js, so the filename is always unique but doesn't change between deployments unless you change the files. (See the sketch after this list.)

  2. It automatically notices which files have already been minified. E.g. no need to minify somelib.min.js, but do minify otherlib.js.

  3. You can put $git_revision anywhere in your HTML and this gets expanded automatically. For example, view the source of buggy.peterbe.com and look at the first 20 lines.

  4. Images inside CSS get rewritten to have unique names (based on the files' modified times) so they can be far-future cached aggressively too.

  5. You never have to write down any lists of file names in some Gruntfile.js-equivalent file.

  6. It copies ALL files from the source directory. This is important in case you have something like $('<img>').attr('src', 'picture.jpg') inside your JavaScript code, for example.

  7. You can choose to inline all the minified and concatenated CSS or JavaScript. Inlining CSS is neat for single-page apps where you have a majority of primed cache hits. Instead of one .html and one .css you get just one .html and the number of bytes is the same. Not having to do another HTTP request can save a lot of time on web performance.

  8. The generated (aka "dist") directory contains everything you need. It does not refer back to the source directory in any way. This means you can set up your Apache/Nginx to point directly at the root of your "dist" directory.
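To illustrate the first point: the hash is derived from the content of the input files, so the name only changes when the content does. grymt itself is Python, but sketched in JavaScript with made-up names the idea is roughly this:

// Sketch of content-hashed filenames (grymt is Python; this JavaScript
// version with made-up names is just to illustrate the idea).
const crypto = require('crypto');
const fs = require('fs');

function hashedFilename(template, sourceFiles) {
  // Hash the concatenated file contents so the name is stable
  // until one of the files changes.
  const hash = crypto.createHash('md5');
  sourceFiles.forEach(function (f) { hash.update(fs.readFileSync(f)); });
  const digest = hash.digest('hex').slice(0, 7);
  // 'vendor-$hash.min.js' becomes e.g. 'vendor-64f7425.min.js'
  return template.replace('$hash', digest);
}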

So what's the catch?

  1. It's not Grunt. It's not a framework. It does only what it does and if you want it to do more you have to work on grymt itself.

  2. The files you want to analyze, process and output all have to be in a subdirectory.
    Look at how I've laid out the files in this project, for example. ALL the files you need are in one subdirectory called app. So, to run grymt I simply run: grymt app.

  3. The HTML files you throw into it have to be plain HTML files. No templates for server-side code.

How do you use it?

pip install grymt

Then you need a directory it can process, e.g. ./client/ (assumed to contain one or more .html files).

grymt ./client

For more options, check out

grymt --help

What's in the future of grymt?

If people like it and want to add features, I'm more than happy to accept pull requests. Some future potential feature work:

  • I haven't needed it myself yet, but it would be nice to add things like CoffeeScript, Less and Sass as pre-processing hooks.

  • It would be easy to automatically generate and insert a reference to an appcache manifest. Since every file used and mentioned is noticed, we could very accurately generate an appcache file that is less prone to human error.

  • Spitting out some stats about the number of bytes saved and the number of files reduced.

COPYFILE_DISABLE and python distutils in python 2.6

April 12, 2014
0 comments Python

My friend and colleague Jannis Leidel (aka jezdez) saved my bacon today when I had gotten completely stuck.

So, I have this Python 2.6 virtualenv and whenever I ran python setup.py sdist upload it would upload a really nasty tarball to PyPI. What would happen is that when people ran pip install premailer it would fail horribly and look something like this:

...
IOError: [Errno 2] No such file or directory: '/path/to/virtual-env/build/premailer/setup.py'

What?!?! If you download the tarball and unpack it you'll see that there definitely is a setup.py file in there.

Anyway. What happened, which I didn't realize at first, was that within the .tar.gz file there were these strange copies of files. For example, for every file.py there was a ._file.py, etc.

Here's what the archive looked like after the tarball had been created:

(premailer26)peterbe@mpb:~/dev/PYTHON/premailer (master)$ tar -zvtf dist/premailer-2.0.2.tar.gz
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 ./._premailer-2.0.2
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/
-rw-r--r--  0 peterbe staff     280 Mar 28 10:13 premailer-2.0.2/._LICENSE
-rw-r--r--  0 peterbe staff    1517 Mar 28 10:13 premailer-2.0.2/LICENSE
-rw-r--r--  0 peterbe staff     280 Apr  9 21:10 premailer-2.0.2/._MANIFEST.in
-rw-r--r--  0 peterbe staff      34 Apr  9 21:10 premailer-2.0.2/MANIFEST.in
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/._PKG-INFO
-rw-r--r--  0 peterbe staff    7226 Apr 11 15:51 premailer-2.0.2/PKG-INFO
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 premailer-2.0.2/._premailer
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/premailer/
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 premailer-2.0.2/._premailer.egg-info
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/
-rw-r--r--  0 peterbe staff     280 Mar 28 10:13 premailer-2.0.2/._README.md
-rw-r--r--  0 peterbe staff    5185 Mar 28 10:13 premailer-2.0.2/README.md
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/._setup.cfg
-rw-r--r--  0 peterbe staff      59 Apr 11 15:51 premailer-2.0.2/setup.cfg
-rw-r--r--  0 peterbe staff     280 Apr  9 21:09 premailer-2.0.2/._setup.py
-rw-r--r--  0 peterbe staff    2079 Apr  9 21:09 premailer-2.0.2/setup.py
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._dependency_links.txt
-rw-r--r--  0 peterbe staff       1 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/dependency_links.txt
-rw-r--r--  0 peterbe staff     280 Apr  9 21:04 premailer-2.0.2/premailer.egg-info/._not-zip-safe
-rw-r--r--  0 peterbe staff       1 Apr  9 21:04 premailer-2.0.2/premailer.egg-info/not-zip-safe
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._PKG-INFO
-rw-r--r--  0 peterbe staff    7226 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/PKG-INFO
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._requires.txt
-rw-r--r--  0 peterbe staff      23 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/requires.txt
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._SOURCES.txt
-rw-r--r--  0 peterbe staff     329 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/SOURCES.txt
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._top_level.txt
-rw-r--r--  0 peterbe staff      10 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/top_level.txt
-rw-r--r--  0 peterbe staff     280 Apr  9 21:21 premailer-2.0.2/premailer/.___init__.py
-rw-r--r--  0 peterbe staff      66 Apr  9 21:21 premailer-2.0.2/premailer/__init__.py
-rw-r--r--  0 peterbe staff     280 Apr  9 09:23 premailer-2.0.2/premailer/.___main__.py
-rw-r--r--  0 peterbe staff    3315 Apr  9 09:23 premailer-2.0.2/premailer/__main__.py
-rw-r--r--  0 peterbe staff     280 Apr  8 16:22 premailer-2.0.2/premailer/._premailer.py
-rw-r--r--  0 peterbe staff   15368 Apr  8 16:22 premailer-2.0.2/premailer/premailer.py
-rw-r--r--  0 peterbe staff     280 Apr  8 16:22 premailer-2.0.2/premailer/._test_premailer.py
-rw-r--r--  0 peterbe staff   37184 Apr  8 16:22 premailer-2.0.2/premailer/test_premailer.py

Strangely, this only happened in a Python 2.6 environment. The problem went away when I created a brand new Python 2.7 environment with the latest setuptools.

So basically, the fault lies with OS X and a strange interaction between OS X and tar.
This superuser.com answer does a much better job explaining this "flaw".

So, the solution to the problem is to create the distribution like this instead:

$ COPYFILE_DISABLE=true python setup.py sdist

If you do that, you get a healthy-looking tarball that actually works with pip install. Thanks jezdez for pointing that out!
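By the way, since COPYFILE_DISABLE is just an environment variable, you can also export it from your shell profile (e.g. export COPYFILE_DISABLE=true in ~/.bash_profile) so you never have to remember to prefix the command again.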

Buggy - A sexy Bugzilla offline webapp

March 13, 2014
1 comment Web development, Mozilla, JavaScript

Screenshot
Buggy is a single-page webapp that relies entirely on the Bugzilla native REST API. And it works offline. Sort of. I say "sort of" because obviously without a network connection you're bound to have outdated information from the Bugzilla database, but at least you'll have what you had when you went offline.

When you post a comment from Buggy, the posted comment is added to an internal sync queue and if you're online it immediately processes that queue. There is, of course, always a risk that you might close a bug when you're in a tunnel or on a plane without WiFi and when you later get back online the sync fails because of some conflict.
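The queue idea itself is simple. This is not Buggy's actual code, just a minimal sketch assuming localforage for storage and a hypothetical postComment() that does the actual HTTP POST to Bugzilla:

// Not Buggy's actual code; a minimal sketch of an offline sync queue.
function queueComment(comment) {
  return localforage.getItem('syncqueue').then(function (queue) {
    queue = queue || [];
    queue.push(comment);
    return localforage.setItem('syncqueue', queue);
  }).then(flushSyncQueue);
}

function flushSyncQueue() {
  if (!navigator.onLine) return;  // try again when we're back online
  localforage.getItem('syncqueue').then(function (queue) {
    if (!queue || !queue.length) return;
    var comment = queue.shift();
    postComment(comment)  // hypothetical; does the POST to the REST API
      .then(function () {
        // Only remove the comment from the queue once the POST succeeded.
        return localforage.setItem('syncqueue', queue);
      })
      .then(flushSyncQueue);  // keep going until the queue is empty
  });
}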

The reason I built this was partly to scratch an itch I had ("What's the ideal way possible for me to use Bugzilla?") and also to experiment with some new techniques, namely AngularJS and localforage.

Live-search

So, the way it works is:

  1. You pick your favorite product and components.

  2. All bugs under these products and components are downloaded and stored locally in your browser (thank you localforage).

  3. When you click any bug it then proceeds to download its change history and its comments.

  4. Periodically it checks each of your chosen products and components to see if new bugs or new comments have been added.

  5. If you refresh your browser, all bugs are loaded from a local copy stored in your browser and in the background it downloads any new bugs or comments or changes.

  6. If you enter your username and password, an auth token is stored in your browser and you can thus access secure bugs.

I can has charts

Pros and cons

The main advantage of Buggy compared to Bugzilla is that it's fast to navigate. You can instantly filter bugs by status(es), components and/or by searching in the bug summary.

The disadvantage of Buggy is that you can't see all fields, file new bugs or change all fields.

The code

The code is of course open source. It's available on https://github.com/peterbe/buggy and released under an MPL 2 license.

The code requires no server. It's just an HTML page with some CSS and JavaScript.

Everything is done using AngularJS. It's only my second AngularJS project, but that's also part of why I built this: to learn AngularJS better.

Much of the inspiration came from the CSS framework Pure and one of their sample layouts which I started with and hacked into shape.

The deployment

YSlow
Because Buggy doesn't require a server, this is the very first time I've been able to deploy something entirely on a CDN. Not just the images, CSS and JavaScript but the main HTML page as well. Before I explain how I did that, let me explain the make.py script.

I really wanted to use Grunt but it just didn't work for me. There are many positive things about Grunt, such as how easily you can add plugins, and I like how you have just one "standard" file that defines how a bunch of meta tasks should be done. However, I just couldn't get the concatenation and minification and everything to work together. Individually each tool works fine, such as the grunt-contrib-uglify plugin, but together none of them appeared to want to work. Perhaps I just required too much.

In the end I wrote a script in python that does exactly what I want for deployment. Its features are:

  • Hashes in the minified and concatenated CSS and JavaScript files (e.g. vendor-8254f6b.min.js)
  • Custom names for the minified and concatenated CSS and JavaScript files so I can easily set far-future cache headers (e.g. /_cache/vendor-8254f6b.min.js)
  • Ability to fold all the minified CSS into the HTML (since there's only one page, there's little reason to make the CSS external)
  • A Git revision SHA inserted into the HTML of the generated ./dist/index.html file
  • All files in ./client/static/ copied intelligently into ./dist/static/
  • Images in CSS given hashes so they too can have far-future cache headers

So, the way I have it set up is that, on my server, I run python make.py and that generates a complete site in a ./dist/ directory. I then point Nginx at that directory and run it under http://buggy-origin.peterbe.com. Then I set up an Amazon CloudFront distribution pointing at that domain and lastly a CNAME for buggy.peterbe.com that points to the CloudFront distribution.

The future

I try my best to maintain a TODO file inside the repo. That's where I write down things to come. It also works as a changelog, since I use the same file to write down what's been done.

One of the main features I want to add is the ability to add bugs that are outside your chosen products and components. It'll be a "fake" component called "Misc". This is for bugs outside the products and components you usually monitor and work in but perhaps bugs you've filed or been assigned to. Or just other bugs you're interested in in general.

Another major feature to work on is the ability to choose to see more fields and to edit them too. This will require some configuration on the individual user's behalf. For example, some people use the "Target Milestone" a lot. Some use the "Importance" a lot. So, some generic solution is needed to accommodate all these non-basic fields.

And last but not least, the Bugzilla team here at Mozilla is working on a very exciting project that allows you to register a certain list of bugs with a WebSocket and have changes pushed to you as soon as those bugs change. That means I won't have to query Bugzilla every 30 seconds to see if certain bugs have changed but will instead get instant notifications when they do. That's going to be major! I confidently speculate that it will be implemented some time this summer.

Give it a go. What are you waiting for? :) Go to http://buggy.peterbe.com/, pick your favorite products and components and try to use it for a week.

My favorite YouTube channels

March 11, 2014
1 comment Misc. links

I do not deny it. I'm a YouTube fiend. I very rarely watch YouTube on my computer but a lot on my Apple TV and on my tablet.

Here are some of my favorite YouTube channels. I subscribe to them all and encourage you to do the same if you aren't already and if they look like something you'd enjoy too.

MinutePhysics

1. MinutePhysics

They started as clips that were around 1 minute but are now of variable length. I just adore Henry's voice and the topics he chooses. The animations are cute and even though seasoned with silly cat and dog references they really help to explain some of the most advanced subjects in physics.

Incidentally, this was the first channel I subscribed to once I figured that's the best way to get recurring content from channels I really liked.



Numberphile

2. Numberphile

This is a Brady Haran production that speaks directly to my mathematical aspirations. Those aspirations aren't to solve any complex calculus problems but to keep alive that almost mystic infatuation I have with mathematics. There's something wonderfully down-to-earth and kind about the content, which challenges you without patronizing you. By the way, my favorite interviewee, James Grime, has his own channel now called singingbanana and also, by the way, an amazingly unattractive website.



Veritasium

3. Veritasium

Derek Muller is a brilliant video maker. Most of his videos are about science, and it's mainly Derek holding his camera at arm's length, filming his pleasant face and talking about the perception or understanding of science, more so than the science itself. Some videos are not about how people (mis)understand science but speak directly to you, and those are just brilliant. They're usually sufficiently advanced to really get you thinking hard.



CGP Grey

4. CGP Grey

The only one of my top favorite channels that is not about natural science. These videos are on social science subjects you might never have thought to think about, and not only that, each and every one digs deep and misses very few facts. Similarly to SciShow, these videos require your full attention. Because what you learn from them is often so very valuable, I've revisited many videos. Some more than twice.



MinuteEarth

5. MinuteEarth

This is Henry Reich's (see MinutePhysics above) second channel, and the name of the channel fully describes what the videos are about. The animations are magnificently simple and rich at the same time. The subject matter in these videos is generally less advanced than in MinutePhysics but often full of really interesting factoids to keep up your sleeve for dinner parties.



SciShow

6. SciShow

Hank Green is a gem! His geeky and passionate manner is worth it just on its own. But you have to pay full attention because Hank speaks very fast. There is, though, an important undertone that isn't immediately obvious: this feeling of deeply researched facts. Even though you only understand a small part of it all (not to mention how little you remember!), it's inspiring that someone takes the time to do all the research.
A lot of the subject matter is science oriented, but on the popular-science side.



Sixty Symbols

7. Sixty Symbols

Another Brady Haran production, but this one is more about physics, whereas Numberphile is more about mathematics. Almost all videos are Brady interviewing doctors and professors of physics at the University of Nottingham. All very humble and approachable interviewees and, perhaps thanks to Brady's brilliant questions, the subjects are understandable yet very exciting, because they're usually on matters that are very advanced: something more to look forward to than to enjoy in the moment.



PHD Comics

8. Piled Higher and Deeper (PHD Comics)

This is a newcomer and I include it because the videos are of such high quality, with adorable animations. To be honest, I don't think I really understand what the various videos have in common. For example, one recent video is on quantum entanglement and another on the Dead Sea Scrolls. Either way, every video is professional and highly enjoyable.



There are more channels I subscribe to and enjoy very much, but the ones above are my favorites. For example, I watch Jamie Oliver's Food Tube videos just as often, but that's somehow more "obvious".

Actually, I have many more channels on science and a bunch on computers and programming, but I'm simply not as passionate about them as I am about the channels mentioned above.

I really hope this post inspires one or two fellow science-nerdy readers to discover some of the channels mentioned here.

GitHub Pull Request Triage tool

March 6, 2014
0 comments Web development, AngularJS

Screenshot
Last week I built a little tool called github-pr-triage. It's a single-page app that sits on top of the wonderful GitHub API v3.

Its goal is to give you an overview of what needs to happen next to your open pull requests. Or rather, what needs to happen next to get them closed. Or rather, who needs to act next to get them closed.

It's very common, at least in my team, that someone puts up a pull request, asks someone to review it and then walks away from it. She then doesn't notice that perhaps the integrated test runner fails on it, and the reviewer is thinking to herself "I'll review the code once the tests don't fail", and all of a sudden the ball is not in anybody's court. Or someone makes a comment on a pull request that the author of the pull request misses in her firehose of email notifications. Now she doesn't know that the comment means the ball is back in her court.

Ultimately, the responsibility lies with the author of the pull request to pester and nag until it gets landed or closed, but oftentimes the ball is in someone else's court and hopefully this tool makes that clearer.

Here's an example instance: https://prs.paas.allizom.org/mozilla/socorro

Currently you can use prs.paas.allizom.org for any public GitHub repo, but if too many projects eat up all the API rate limits we have, I might need to narrow it down to Mozilla repos only. Or, you can simply host your own. It's just a simple Flask server.

About the technology

I'm getting more and more productive with Angular but I still consider myself a beginner. Saying that also buys me insurance when you laugh at my code.

So it's a single-page app that uses HTML5 pushState and an Angular $routeProvider to make different URLs.
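The route setup looks something like this (a sketch with hypothetical template and controller names, not the app's exact code):

// Sketch of AngularJS html5Mode + routing; templateUrl and controller
// names here are made up.
var app = angular.module('triage', ['ngRoute']);

app.config(function ($routeProvider, $locationProvider) {
  $locationProvider.html5Mode(true);  // real URLs instead of #fragments
  $routeProvider
    .when('/:owner/:repo', {
      templateUrl: 'partials/triage.html',
      controller: 'PullRequestsCtrl'
    })
    .otherwise({redirectTo: '/'});
});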

The server simply acts as a proxy for making queries to api.github.com and bugzilla.mozilla.org/rest, and the reason for that is caching.

Every API request you make through this proxy gets cached for 10 minutes. But here's the clever part. Every time it fetches actual remote data it stores it in two caches: one for 10 minutes and one for 24 hours. And when it stores it for 24 hours it also stores the response's ETag so that it can make conditional requests. The advantage is that you quickly know if the data hasn't changed and, more importantly, a conditional request that comes back 304 doesn't count against you in the rate limiter.
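The real proxy is a Flask app, but sketched in JavaScript (with a hypothetical cache object that expires entries for us) the flow is roughly:

// Sketch only; the actual proxy is Python/Flask. 'cache' is a
// hypothetical store that drops entries after their TTL.
function cachedFetch(url, cache) {
  var fresh = cache.get('short:' + url);  // kept for 10 minutes
  if (fresh) return Promise.resolve(fresh);

  var stale = cache.get('long:' + url);   // {etag, body}, kept for 24 hours
  var headers = stale ? {'If-None-Match': stale.etag} : {};

  return fetch(url, {headers: headers}).then(function (response) {
    if (response.status === 304) {
      // Not modified: reuse the 24-hour copy. A 304 from GitHub does
      // not count against the API rate limit.
      cache.set('short:' + url, stale.body);
      return stale.body;
    }
    return response.json().then(function (body) {
      cache.set('long:' + url, {etag: response.headers.get('ETag'), body: body});
      cache.set('short:' + url, body);
      return body;
    });
  });
}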

Moby Dick (by Herman Melville)

February 25, 2014
0 comments Books

What a book!
In my defense, it took months to finish because it's not easy reading, I only read on my short train commute three days a week, and I'm a really, really slow reader.

Moby Dick book cover
Even though it was hard going at times I generally enjoyed every page. Some passages were like reading Latin but with English words. Some pages were thrilling and some pages were as beautiful as poetry. Some short passages were so amazing that you have to stop and just take a quick smile break.

Unlike many people, I actually didn't know how the book plays out or how it ends, and I'm NOT going to spoil that here, in case you want to read it too, not knowing how it ends. The only thing I regret is reading the editor's introduction, which revealed something crucial to the plot line without any warning.

It was not until afterwards, when I read about the book on Wikipedia (link contains spoilers), that I appreciated the many sub-plots and sub-contexts. For example, the many metaphysical and theological undertones. For one thing (this is NOT a spoiler), if you're going to read it, pay extra attention to people's names.

One thing I can reveal is that the book is basically three books, but you don't really notice when it goes from one to the other. I cannot imagine a modern-day publisher allowing that to happen to a contemporary book, with contemporary readers who have less attention span than a goldfish. However, I am glad I've read it, because not only is it an entertaining book, it's also a good exercise in modern life that not everything has to be so perfect and lean.

And a final tip to those of you who now feel inspired to read the book for the first time: it's an old book with lots of words you won't know, and that's fine, but do take the time to look up some of the nautical words related to the ship, because they reappear again and again in the action-filled passages. Like bulwark, masthead and starboard.

What's the average number of domains a website depends on?

February 24, 2014
10 comments Web development

tl;dr 36

For some time now, I've been running an experiment where I analyze how many different domains any given website depends on. For example, you might have Google Analytics on your site (that's www.google-analytics.com), a Tweet button (that's platform.twitter.com), a Facebook Like button (that's s-static.ak.facebook.com) and images served from a CDN (that's d1ac1bzf3lrf3c.cloudfront.net). That's already three or four distinct domains.

Independent of how many requests come from each domain, I wanted to measure how many distinct domains a website depends on so I wrote a script and started collecting random URLs across the web. Most of the time, to get a sample of different URLs I would take the RSS feed on Digg.com and the RSS feed on Hacker News on a periodic basis.
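The counting step itself is trivial once you have the list of requests a page makes. Roughly (a JavaScript sketch; the actual script isn't shown here):

// Given every request URL a page made, count the distinct domains.
function countDistinctDomains(requestUrls) {
  var hostnames = requestUrls.map(function (u) { return new URL(u).hostname; });
  return new Set(hostnames).size;
}

// countDistinctDomains([
//   'https://www.google-analytics.com/ga.js',
//   'https://platform.twitter.com/widgets.js',
//   'https://www.google-analytics.com/collect'
// ]) === 2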

Network tab on the Dev Tools console for a page on the-toast.net
The results are amazing! Some websites depend on over 100 different domain names!

Take this page on The Toast for example: it depends on 143 different domains. Loading it causes your browser to make 391 requests, download 4.8MB and take 29 seconds (in total, not necessarily until you can start reading it). What were they thinking?!

I think this means that website makers will probably continue to make websites like this. What we, as web software engineers, can do is not tell people it's a bad idea but instead try to do something about it. It's quite far from my expertise, but clearly if you want to make the Internet faster, DNS would be an area to focus on.

Test it out for yourself here: Number of Domains

Advanced live-search with AngularJS

February 4, 2014
12 comments JavaScript

For people familiar with AngularJS, it's almost frighteningly easy to make a live-search on a repeating iterator.

Here's such an example: http://jsfiddle.net/r26xm/1/

Out of the box it just works. If nothing is typed into the search field it returns everything.

A big problem with this is that the pattern matching isn't very good. For example, if you search for ter you get both Teresa and Peter.
More realistically, you want it to only match with a leading word delimiter. In other words, if you type ter you want it to match Teresa but not Peter, because Peter doesn't start with ter.
So, to remedy that, we construct a regular expression on the fly with a leading word delimiter. I.e. \bter.

Here's an example of that: http://jsfiddle.net/f4Zkm/2/
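The gist of that fiddle, as a simplified sketch (the names here are mine, not necessarily the fiddle's), is a predicate function that only matches from the start of a word:

// Simplified sketch: a predicate on the scope that matches from the
// start of a word. (Real code should escape regex metacharacters
// in the search term.)
$scope.filterBySearch = function (person) {
  if (!$scope.search) return true;  // empty search matches everything
  var regex = new RegExp('\\b' + $scope.search, 'i');
  return regex.test(person.name);
};

// In the template: <li ng-repeat="person in people | filter:filterBySearch">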

Now, there's a problem. For every item in the list the regular expression needs to be created and compiled, which, when the list is very long, can become incredibly slow.
To remedy that, we use $scope.$watch to create a local regular expression that only gets rebuilt once per update to $scope.search.

Here's an example of that: http://jsfiddle.net/f4Zkm/4/
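Sketched out, the change is just moving the RegExp construction into a $watch so it happens once per change to the search term instead of once per item:

// Build the regex once per change to the search term...
$scope.$watch('search', function (value) {
  $scope.searchRegex = value ? new RegExp('\\b' + value, 'i') : null;
});

// ...so the per-item predicate stays cheap.
$scope.filterBySearch = function (person) {
  return !$scope.searchRegex || $scope.searchRegex.test(person.name);
};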

That, I think, is a really good pattern. Unfortunately we've lost some of the simplicity, but we now have something snappier.

Unfortunately the example is a little bit contrived because the list of names it filters on is so small, but the list could be huge. It could also be that we want to make a more advanced regular expression. For example, you might want to allow multiple words to match, so that ter ma matches Teresa Mayers, John Mayor and Maria Connor. Then you could make a regular expression with something like \b(ter|ma).
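A hypothetical sketch of that:

// Turn "ter ma" into /\b(ter|ma)/i
var words = $scope.search.trim().split(/\s+/);   // ["ter", "ma"]
var regex = new RegExp('\\b(' + words.join('|') + ')', 'i');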

For seasoned Angularnauts this is trivial stuff, but it really helped me make an app much faster and smoother. I hope it helps someone else doing something similar.