Peterbe.com

Peter Bengtsson's blog

Page 39

Reset

Why is it important to escape & in href attributes in tags?

November 11, 2014
13 comments Web development

Here’s an example of unescaped & characters in a A HREF tag attribute.
http://jsfiddle.net/32zbogfw/ It’s working fine.

I know it might break XML and possibly XHTML but who uses that still?

And I know an unescaped & in a href shows as red in the View Source color highlighting.

What can go wrong? Why is it important? Perhaps it used to be in 2009 but no longer the case.

This all started because I was reviewing some that uses python urllib.urlencode(...) and inserts the results into a Django template with href="{{ result_of_that_urlencode }}" which would mean you get un-escaped & characters and then I tried to find how and why that is bad but couldn't find any examples of it.

God, No! by Penn Jillette

November 9, 2014
0 comments Books

A couple of months ago my wife went to see the Penn & Teller show in Las Vegas. Afterwards she stayed backstage to meet Penn, have a quick chat and sign a copy of his book. My wife said "My husband is going to be so jealous that I met you", to which Penn replied "Wanna make him really jealous? Grab my ass." Which she did. Haha!

I've been a long time fan of their show. I remember watching it when I was big enough to appreciate magic but had no idea what the jokes were and I thought they was just kinda dark and odd.

These guys do everything together but this book is all Penn. It's completely without a plot line other than, I guess, it goes through the 10 commandments in the bible and for each, tells a couple of stories that are somewhat related. Funny stories. Sexy stories. And very very personal stories.

Despite its title not that much of the book is about atheism. The prolog and the epilogue is though. In fact, the prolog was "mindblowingly" profound and well written. I was really impressed. There were so many interesting thoughts that I could quote the whole thing but instead I'm just going to quote this little piece:

Some will tell you "God is love" and then defy you not to believe in love. Bug, if X = Y, why have a fucking X? Just keep it at Y. Why call love god? Why not call love ... love? "Beauty is god." Okay. If you change what the word means, you can get me to say I believe in it. Say "God is bacon" or "God is tits" and I'll love and praise god, but you're just changing the word, not the idea.

Funny! And I'd never thought of that as a rebuttal.

I used to be an atheist and was almost militant about it meaning; I was prone to proclaim it loudly in hope of convincing people. I am no longer an atheist. Partly that's because I've come to understand two things: Preaching for the negative is a paradoxical oxymoron. Secondly, I have new-found respect and admiration for church as a community.

Which brings me to conclude with my final thought: After reading this atheism proclamation I and now even less atheist. The more arguments Penn makes the less I believe in atheism. Strange.

I guess I can say "God is leaving people to make up their own minds". Which means I can say: "Leaving people to make up their own minds is leaving people to make up their own minds.

But I did enjoy many of the stories in the book. You might too.

Bye bye AWS EC2, Hello Digital Ocean

November 3, 2014
0 comments Linux

Before trolls get inspired let me start with this: EC2 is awesome!

But, wanna know what's also awesome?: Digital Ocean

The reason I switched was two-fold: A) money and B) curiousity.

As part of a very generous special friendship I got a "m1.large" for free. That deal had to come to an end so I had to start paying that myself. It was well over $100 per month. I have about 10 servers running on that machine hovering around 3+Gb of RAM.

So I thought this is an excuse to do some spring cleaning and then switch to this newfangled Digital Ocean which is all SSD drives, got good reviews and has a fixed cost per month. First I decommissioned some servers and some sites that used to have multiple processors were reduced to just a single process. Now I got everything down to a steady 2+Gb.

I decided to splash out a bit and I went for the $40/month option which is 4GB, 2 core, 60GB SSD and 4TB transfer. Setting up all the servers on this new Ubuntu 14.04 was relatively easy (thank you pip freeze and rsync!).

So far, I have to say I'm wildly impressed. The interface is gorgeous. It's easy to do everything. I love that the price is fixed. That suits me more that corporations might care about but I'm just little old me.

If you get inspired to try it out please use my referral code. Then you get $10 free credit: https://www.digitalocean.com/?refcode=9c9126b69f33

uwsgi and uid

November 3, 2014
4 comments Python, Linux, Django

So recently, I moved home for this blog. It used to be on AWS EC2 and is now on Digital Ocean. I wanted to start from scratch so I started on a blank new Ubuntu 14.04 and later rsync'ed over all the data bit by bit (no pun intended).

When I moved this site I copied the /etc/uwsgi/apps-enabled/peterbecom.ini file and started it with /etc/init.d/uwsgi start peterbecom. The settings were the same as before:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
[uwsgi]
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

But I kept getting this error:

Traceback (most recent call last):
...
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 182, in _cursor
    self.connection = Database.connect(**conn_params)
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
psycopg2.OperationalError: FATAL:  Peer authentication failed for user "django"

What the heck! I thought. I was able to connect perfectly fine with the same config on the old server and here on the new server I was able to do this:

django@peterbecom:~/django-peterbecom$ source venv/bin/activate
(venv)django@peterbecom:~/django-peterbecom$ ./manage.py shell
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from peterbecom.apps.plog.models import *
>>> BlogItem.objects.all().count()
1040

Clearly I've set the right password in the settings/local.py file. In fact, I haven't changed anything and I pg_dump'ed the data over from the old server as is.

I edit edited the file psycopg2/__init__.py and added a print "DSN=", dsn and those details were indeed correct.
I'm running the uwsgi app as user django and I'm connecting to Postgres as user django.

Anyway, what I needed to do to make it work was the following change:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
[uwsgi]
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
uid = django   # THIS IS ADDED
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

The difference here is the added uid = django.

I guess by moving across (I'm currently on uwsgi 1.9.17.1-debian) I get a newer version of uwsgi or something that simply can't just take the user directive but needs the uid directive too. That or something else complicated to do with the users and permissions that I don't understand.

Hopefully, by having blogged about this other people might find it and get themselves a little productivity boost.

Shout-out to eventlog

October 30, 2014
4 comments Django

If you do things with the Django ORM and want an audit trails of all changes you have two options:

Insert some cleverness into a pre_save signal that writes down all changes some way.
Use eventlog and manually log things in your views.

(you have other options too but I'm trying to make a point here)

eventlog is almost embarrassingly simple. It's basically just a model with three fields:

User
An action string
A JSON dump field

You use it like this:



from eventlog.models import log

def someview(request):
    if request.method == 'POST':
        form = SomeModelForm(request.POST)
        if form.is_valid():
            new_thing = form.save()
            log(request.user, 'mymodel.create', {
                'id': new_thing.id,
                'name': new_thing.name,
                # You can put anything JSON 
                # compatible in here
            })
            return redirect('someotherview')
    else:
        form = SomeModelForm()
    return render(request, 'view.html', {'form': form})

That's all it does. You then have to do something with it. Suppose you have an admin page that only privileged users can see. You can make a simple table/dashboard with these like this:


from eventlog.models import Log  # Log the model, not log the function

def all_events(request):
    all = Log.objects.all()
    return render(request, 'all_events.html', {'all': all})

And something like this to to all_events.html:


<table>
  <tr>
    <th>Who</th><th>When</th><th>What</th><th>Details</th>
  </tr>
  {% for event in all %}
  <tr>
    <td>{{ event.user.username }}</td>
    <td>{{ event.timestamp | date:"D d M Y" }}</td>
    <td>{{ event.action }}</td>
    <td>{{ event.extra }}</td>
  </tr>
  {% endfor %}
</table>

What I like about it is that it's very deliberate. By putting it into views at very specific points you're making it an audit log of actions, not of data changes.

Projects with overly complex model save signals tend to dig themselves into holes that make things slow and complicated. And it's not unrealistic that you'll then record events that aren't particularly important to review. For example, a cron job that increments a little value or something. It's more interesting to see what humans have done.

I just wanted to thank the Eldarion guys for eventlog. It's beautifully simple and works perfectly for me.

Go vs. Python

October 24, 2014
42 comments Python, Go

tl;dr; It's not a competition! I'm just comparing Go and Python. So I can learn Go.

So recently I've been trying to learn Go. It's a modern programming language that started at Google but has very little to do with Google except that some of its core contributors are staff at Google.

The true strength of Go is that it's succinct and minimalistic and fast. It's not a scripting language like Python or Ruby but lots of people write scripts with it. It's growing in popularity with systems people but web developers like me have started to pay attention too.

The best way to learn a language is to do something with it. Build something. However, I don't disagree with that but I just felt I needed to cover the basics first and instead of taking notes I decided to learn by comparing it to something I know well, Python. I did this a zillion years ago when I tried to learn ZPT by comparing it DTML which I already knew well.

My free time is very limited so I'm taking things by small careful baby steps. I read through An Introduction to Programming in Go by Caleb Doxey in a couple of afternoons and then I decided to spend a couple of minutes every day with each chapter and implement something from that book and compare it to how you'd do it in Python.

I also added some slightly more full examples, Markdownserver which was fun because it showed that a simple Go HTTP server that does something can be 10 times faster than the Python equivalent.

What I've learned

Go is very unforgiving but I kinda like it. It's like Python but with pyflakes switched on all the time.
Go is much more verbose than Python. It just takes so much more lines to say the same thing.
Goroutines are awesome. They're a million times easier to grok than Python's myriad of similar solutions.
In Python, the ability to write to a list and it automatically expanding at will is awesome.
Go doesn't have the concept of "truthy" which I already miss. I.e. in Python you can convert a list type to boolean and the language does this automatically by checking if the length of the list is 0.
Go gives you very few choices (e.g. there's only one type of loop and it's the for loop) but you often have a choice to pass a copy of an object or to pass a pointer. Those are different things but sometimes I feel like the computer could/should figure it out for me.
I love the little defer thing which means I can put "things to do when you're done" right underneath the thing I'm doing. In Python you get these try: ...20 lines... finally: ...now it's over... things.
The coding style rules are very different but in Go it's a no brainer because you basically don't have any choices. I like that. You just have to remember to use gofmt.
Everything about Go and Go tools follow the strict UNIX pattern to not output anything unless things go bad. I like that.
godoc.org is awesome. If you ever wonder how a built in package works you can just type it in after godoc.org like this godoc.org/math for example.
You don't have to compile your Go code to run it. You can simply type go run mycode.go it automatically compiles it and then runs it. And it's super fast.
go get can take a url like github.com/russross/blackfriday and just install it. No PyPI equivalent. But it scares me to depend on peoples master branches in GitHub. What if master is very different when I go get something locally compared to when I run go get weeks/months later on the server?

UPDATE

Here's a similar project comparing Python vs. JavaScript by Ilya V. Schurov

localForage vs. XHR

October 22, 2014
9 comments JavaScript

tl;dr; Fetching from IndexedDB is about 5-15 times faster than fetching from AJAX.

localForage is a wrapper for the browser that makes it easy to work with any local storage in the browser. Different browsers have different implementations. By default, when you use localForage in Firefox is that it used IndexedDB which is asynchronous by default meaning your script don't get blocked whilst waiting for data to be retrieved.

A good pattern for a "fat client" (lots of javascript, server primarly speaks JSON) is to download some data, by AJAX using JSON and then store that in the browser. Next time you load the page, you first read from the local storage in the browser whilst you wait for a fresh new JSON from the server. That way you can present data to the screen sooner. (This is how Buggy works, blogged about it here)

Another similar pattern is that you load everything by AJAX from the server, present it and store it in the local storage. Then you perdiocally (or just on onload) you send the most recent timestamp from the data you've received and the server gives you back everything new and everything that has changed by that timestamp. The advantage of this is that the payload is continuously small but the server has to make a custom response for each client whereas a big fat blob of JSON can be better cached and such. However, oftentimes the data is dependent on your credentials/cookie anyway so most possibly you can't do much caching.

Anyway, whichever pattern you attempt I thought it'd be interesting to get a feel for how much faster it is to retrieve from the browsers memory compared to doing a plain old AJAX GET request. After all, browsers have seriously optimized for AJAX requests these days so basically the only thing standing in your way is network latency.

So I wrote a little comparison script that tests this. It's here: https://www.peterbe.com/localforage-vs-xhr/index.html

It retrieves a 225Kb JSON blob from the server and measures how long that took to become an object. Equally it does the same with localforage.getItem and then it runs this 10 times and compares the times. It's obviously not a surprise the local storage retrieval is faster, what's interesting is the difference in general.

What do you think? I'm sure both sides can be optimized but at this level it feels quite realistic scenarios.

django-html-validator

October 20, 2014
2 comments Python, Web development, Django

A couple of weeks ago we had accidentally broken our production server (for a particular report) because of broken HTML. It was an unclosed tag which rendered everything after that tag to just plain white. Our comprehensive test suite failed to notice it because it didn't look at details like that. And when it was tested manually we simply missed the conditional situation when it was caused. Neither good excuses. So it got me thinking how can we incorporate HTML (html5 in particular) validation into our test suite.

So I wrote a little gist and used it a bit on a couple of projects and was quite pleased with the results. But I thought this might be something worthwhile to keep around for future projects or for other people who can't just copy-n-paste a gist.

With that in mind I put together a little package with a README and a setup.py and now you can use it too.

There are however some caveats. Especially if you intend to run it as part of your test suite.

Caveat number 1

You can't flood htmlvalidator.nu. Well, you can I guess. It would be really evil of you and kittens will die. If you have a test suite that does things like response = self.client.get(reverse('myapp:myview')) and there are many tests you might be causing an obscene amount of HTTP traffic to them. Which brings us on to...

Caveat number 2

The htmlvalidator.nu site is written in Java and it's open source. You can basically download their validator and point django-html-validator to it locally. Basically the way it works is java -jar vnu.jar myfile.html. However, it's slow. Like really slow. It takes about 2 seconds to run just one modest HTML file. So, you need to be patient.

Premailer on Python 3

October 8, 2014
1 comment Python

Premailer is probably my most successful open source project in recent years. I base that on the fact that 25 different people have committed to it.

Today I merged a monster PR by Michael Jason Smith of OnlineGroups.net.

What it does is basically that it makes premailer work in Python 3, PyPy and Python 2.6. Check out the tox.ini file. Test coverage is still 100%.

If you look at the patch the core of the change is actually surprisingly little. The majority of the "secret sauce" is basically a bunch of import statements which are split by if sys.version_info >= (3, ): and some various minor changes around encoding UTF-8. The rest of the changes are basically test sit-ups.

A really interesting thing that hit us was that the code had assumptions about the order of things. Basically the tests assumed the the order of certain things in the resulting output was predictable even though it was done using a dict. dicts are famously unreliable in terms of the order you get things out and it's meant to be like that and it's a design choice. The reason it worked till now is not only luck but quite amazing.

Anyway, check it out. Now that we have a tox.ini file it should become much easier to run tests which I hope means patches will be better checked as they come in.

Do you curl a lot to check headers?

September 5, 2014
6 comments Linux

I, multiple times per day, find myself wanting to find out what headers I get back on a URL but I don't care about the response payload. The command to use then is:

curl -v https://www.peterbe.com/ > /dev/null

That'll print out all the headers sent and received. Nice and crips.

So because I type this every day I made it into a shortcut script

cd ~/bin
echo '#!/bin/bash
> set -x
> curl -v "$@" > /dev/null
> ' > c
chmod +x c

If it's not clear what the code looks like, it's this:


#!/bin/bash
set -x
curl -v "$@" > /dev/null

Now I can just type:

c https://www.peterbe.com

Or if I want to add some extra request headers for example:

c -H 'User-Agent: foobar' https://www.peterbe.com