Github Pull Request Triage tool

March 6, 2014
0 comments Web development, AngularJS

Screenshot
Last week I built a little tools called github-pr-triage. It's a single page app that sits on top of the wonderful GitHub API v3.

Its goal is to try to get an overview of what needs to happen next to open pull requests. Or rather, what needs to happen next to get it closed. Or rather, who needs to act next to get it closed.

It's very common, at least in my team, that someone puts up a pull request, asks someone to review it and then walks away from it. She then doesn't notice that perhaps the integrated test runner fails on it and the reviewer is thinking to herself "I'll review the code once the tests don't fail" and all of a sudden the ball is not in anybody's court. Or someone makes a comment on a pull request that the author of the pull requests misses in her firehose of email notifictions. Now she doesn't know that the comment means that the ball is back in her court.

Ultimately, the responsibility lies with the author of the pull request to pester and nag till it gets landed or closed but oftentimes the ball is in someone elses court and hopefully this tool makes that clearer.

Here's an example instance: https://prs.paas.allizom.org/mozilla/socorro

Currently you can use prs.paas.allizom.org for any public Github repo but if too many projects eat up all the API rate limits we have I might need to narrow it down to use mozilla repos. Or, you can simply host your own. It's just a simple Flask server

About the technology

I'm getting more and more productive with Angular but I still consider myself a beginner. Saying that also buys me insurance when you laugh at my code.

So it's a single page app that uses HTML5 pushState and an angular $routeProvider to make different URLs.

The server simply acts as a proxy for making queries to api.github.com and bugzilla.mozilla.org/rest and the reason for that is for caching.

Every API request you make through this proxy gets cached for 10 minutes. But here's the clever part. Every time it fetches actual remote data it stores it in two caches. One for 10 minutes and one for 24 hours. And when it stores it for 24 hours it also stores its last ETag so that I can make conditional requests. The advantage of that is you quickly know if the data hasn't changed and more importantly it doesn't count against you in the rate limiter.

Moby Dick (by Herman Melville)

February 25, 2014
0 comments Books

What a book!
I defend that it took months to finish it with; it's not easy reading, I only read on the short train commute 3 days a week and I'm a really really slow reader.

Moby Dick book cover
Even though it was hard going at times I generally enjoyed every page. Some passages were like reading Latin but with English words. Some pages where thrilling and some pages where beautiful as poetry. Some short passages were so amazing that you have to stop and just take a quick smile-break.

Unlike many people I actually didn't know how the book plays out or how it ends and I'm NOT going to spoil that here, in case you too want to read it too, not knowing how it ends. The only thing I regret is reading the editors introduction which revealed something crucial to the plot line without any warning.

It was not until afterwards when I read about the book on Wikipedia (link contains spoilers) that I appreciated the many sub-plots and sub-contexts. For example, the many metaphysical and theological undertones. For one thing (this is NOT a spoiler), if you're going to read it pay extra attention to peoples' names.

One thing I can reveal is that the book is basically three books but you don't really notice that when it goes from one to the other. I can not imagine a modern day publisher allowing that to happen to a contemporary book with contemporary readers who have less attention span than a gold fish. However, I am glad I've read it because not only is it an entertaining book it's also a good exercise in modern life that not everything has to be so perfect and lean.

And a final tip to you who now feel inspired to read the book for the first time; it's an old-English book with lots of words you won't know and that's fine, but do take the time to look up some of the nautical words related to the ship because they re-appear again and again when you're reading action filled passages. Like bulwark, masthead and starboard.

What's the average number of domains a website depends on?

February 24, 2014
10 comments Web development

tl;dr 36

For some time now, I've been running an experiment where I analyze how many different domains any website depends on. For example, you might have Google Analytics on your site (that's www.google-analytics.com) and you might have a Facebook Like button (that's platform.twitter.com and/or s-static.ak.facebook.com) and you might serve your images from a CDN (that's d1ac1bzf3lrf3c.cloudfront.net). That there is 3-4 distinct domains.

Independent of how many requests come from each domain, I wanted to measure how many distinct domains a website depends on so I wrote a script and started collecting random URLs across the web. Most of the time, to get a sample of different URLs I would take the RSS feed on Digg.com and the RSS feed on Hacker News on a periodic basis.

Network tab on the Dev Tools console for a page on the-toast.net
The results are amazing! Some websites depend on over 100 different domain names!

Take this page on The Toast for example, it depends on 143 different domains. Loading it causes your browser to make 391 requests, download 4.8Mb and takes 29 seconds (in total, not necessarily till you can start reading it). What were they thinking!?!

I think what this means is that website makers will probably continue to make websites like this. What we, as web software engineers, can not tell people it's a bad idea but instead to try to do something about it. It's quite far from my expertise but clearly if you want to make the Internet faster, DNS would be an area to focus on.

Test it out for yourself here: Number of Domains

Advanced live-search with AngularJS

February 4, 2014
12 comments JavaScript

For people familar with AngularJS, it's almost frighteningly easy to make a live-search on a repeating iterator.

Here's such an example: http://jsfiddle.net/r26xm/1/

Out of the box it just works. If nothing is typed into the search field it returns everything.

A big problem with this is that the pattern matching isn't very good. For example, if you search for ter you get Teresa and Peter.
More realistically you want it to only match with a leading word delimiter. In other words, if you type ter you want it only to match Teresa but not Peter because Peter doesn't start with ter.
So, to remedy that we construct a regular expression on the fly with a leading word delimiter. I.e. \bter.

Here's an example of that: http://jsfiddle.net/f4Zkm/2/

Now, there's a problem. For every item in the list the regular expression needs to be created and compiled which, when the list is very long, can become incredibly slow.
To remedy that we use $scope.$watch to create a local regular expression which only happens once per update to $scope.search.

Here's an example of that: http://jsfiddle.net/f4Zkm/4/

That, I think, is a really good pattern. Unfortunately we've left the simplicity but we now have something snappier.

Unfortunately the example is a little bit contrived because the list of names it filters on is so small but the list could be huge. It could also be that we want to make a more advanced regular expression. For example, you might want to allow multiple words to match so as ter ma should match Teresa Mayers, John Mayor and Maria Connor. Then you could make a regular expression with something like \b(ter|ma).

For seasoned Angularnauts this is trivial stuff but it really helped me make an app much faster and smoother. I hope it helps someones else doing something similar.

Sorting mixed type lists in Python 3

January 18, 2014
4 comments Python

Because this bit me harder than I was ready for, I thought I'd make a note of it for the next victim.

In Python 2, suppose you have this:


Python 2.7.5
>>> items = [(1, 'A number'), ('a', 'A letter'), (2, 'Another number')]

Sorting them, without specifying how, will automatically notice that it contains tuples:


Python 2.7.5
>>> sorted(items)
[(1, 'A number'), (2, 'Another number'), ('a', 'A letter')]

This doesn't work in Python 3 because comparing integers and strings is not allowed. E.g.:


Python 3.3.3
>>> 1 < '1'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

You have to convert them to stings first.


Python 3.3.3
>>> sorted(items, key=lambda x: str(x[0]))
[(1, 'A number'), (2, 'Another number'), ('a', 'A letter')]

If you really need to sort by 1 < '1' this won't work. Then you need a more complex key function. E.g.:


Python 3.3.3
>>> def keyfunction(x):
...   v = x[0]
...   if isinstance(v, int): v = '0%d' % v
...   return v
...
>>> sorted(items, key=keyfunction)
[(1, 'A number'), (2, 'Another number'), ('1', 'Actually a string')]

That's really messy but the best I can come up with at past 4PM on Friday.

How I do deployments

December 16, 2013
5 comments Linux, Web development

I have and have had many sites that I run. They're all some form of side-project.

What they almost all have in common is two things

  1. They have very little traffic (thus not particularly mission critical)
  2. I run everything on one server (no need for "spinning up" new VMs here and there)

Many many years ago, when current interns I work with were mere babies, I started a very simple "procedure".

  1. On the server, in the user directory where the site is deployed, I write a script called something like upgrade_myproject.sh which is executable and does what the name of the script is: it upgrades the site.

  2. In the server's root home directory I write a script called restart_myproject.sh which also does exactly what the name of the script is: it restarts the service.

  3. On my laptop, in my ~/bin directory I create a script called UpgradeMyproject.sh (*) which runs upgrade_myproject.sh on the server and runs restart_myproject.sh also on the server.

And here is, if I may say so, the cleverness of this; I use ssh to execute these scripts remotely by simply piping the commands to ssh. For example:

#!/bin/bash
echo "./upgrade_generousfriends.sh" | ssh -A django@ec2-54-235-210-62.compute-1.amazonaws.com
echo "./restart_generousfriends.sh" | ssh root@ec2-54-235-210-62.compute-1.amazonaws.com

That's an example I use for Wish List Granted.

This works so darn well, and has done for years, that this is why I've never really learned to use more advanced tools like Fabric, Salt, Puppet, Chef or <insert latest deployment tool name>.

This means that all I need to do run a deployment is just type UpgradeMyproject.sh[ENTER] and the simple little bash scripts takes care of everything else.

The reason I keep these on the server and not on my laptop is simply because that's where they naturally belong and if I'm ssh'ed in and mess around I don't have to exit out to re-run them.

Here's an example of the upgrade_generousfriends.sh I use for Wish List Granted:

#!/bin/bash
cd generousfriends
source venv/bin/activate
git pull origin master
find . | grep '\.pyc$' | xargs rm -f
pip install -r requirements/prod.txt
./manage.py syncdb --noinput
./manage.py migrate webapp.main
./manage.py collectstatic --noinput
./manage.py compress --force
echo "Restart must be done by root"

I hope that, by blogging about this, that someone else sees that it doesn't really have to be that complicated. It's not rocket science and most complex tools are only really needed when you have a significant bigger scale in terms of people- and skill-complexity.

In conclusion

Keep it simple.

(*) The reason for the capitalization of my scripts is also an old habit. I use that habit to differentiate my scripts for stuff I install from any third parties.

Wish List Granted on Hacker News report

November 29, 2013
2 comments Web development

On Wednesday this week, I managed to get a link to Wish List Granted onto Hacker News. It had enough upvotes to be featured on the front page for a couple of hours. I'm very grateful for the added traffic but not quite so impressed with the ultimate conversions.

  • 4,428 unique visitors
  • 43 Wish Lists created
  • 2 Usersnap pieces of constructive feedback
  • 0 payments made

Google Analytics
So that's 1% conversion of people setting up a wish list. But kinda disappointing that no body ever made a payment. Actually, one friend did make a payment. But he's a colleague and a friend so not a stranger who stumbled onto it from Hacker News.

Also, it's now been 3 days since those 43 wish lists were created and still no payments. That's kinda disappointing too.

I'm starting to fear that Wish List Granted is one of those ideas that people think it's a great idea but have no interest in using.

Welcome to the world, Wish List Granted

November 27, 2013
4 comments Web development, Django

Logo
I built something. It's called Wish List Granted.

It's a mash-up using Amazon.com's Wish List functionality. What you do is you hook up your Amazon wish list onto wishlistgranted.com and pick one item. Then you share that page with friends and familiy and they can then contribute a small amount each. When the full amount is reached, Wish List Granted will purchase the item and send it to you.

The Rules page has more details if you're interested.

The problem it tries to solve is that you have friends would want something and even if it's a good friend you might be hesitant to spend $50 on a gift to them. I'm sure you can afford it but if you have many friends it gets unpractical. However, spending $5 is another matter. Hopefully Wish List Granted solves that problem.

Wish List Granted started as one of those insomnia late-night project. I first wrote a scraper using pyQuery then a couple of Django models and views and then tied it up by integrating Balanced Payments. It was actually working on the first night. Flawed but working start to finish.

When it all started, I used Persona to require people to authenticate to set up a Wish List. After some thought I decided to ditch that and use "email authentication" meaning they have to enter an email address and click a secure link I send to them.

One thing I'm very proud of about Wish List Granted is that it does NOT store any passwords, any credit cards or any personal shipping addresses. Despite being so totally void of personal data I thought it'd look nicer if the whole site is on HTTPS.

More information on the Help & Frequently Asked Questions page.

UPDATE

Wish List Granted is now shut down. Sad.

Credit Card formatter in Javascript

November 19, 2013
28 comments JavaScript

I looked around for Javascript libs that do automatic input formatting for credit card inputs.

The first one was formatter.js which looked promising but it weighs over 6Kb minified and also, when you apply it the placeholder attribute you have on the input disappears.

So, in true software engineering fashion I wrote my own:


function cc_format(value) {
  var v = value.replace(/\s+/g, '').replace(/[^0-9]/gi, '')
  var matches = v.match(/\d{4,16}/g);
  var match = matches && matches[0] || ''
  var parts = []
  for (i=0, len=match.length; i<len; i+=4) {
    parts.push(match.substring(i, i+4))
  }
  if (parts.length) {
    return parts.join(' ')
  } else {
    return value
  }
}

And some tests to prove it:


assert(cc_format('1234') === '1234')
assert(cc_format('123456') === '1234 56')
assert(cc_format('123456789') === '1234 5678 9')
assert(cc_format('') === '')
assert(cc_format('1234 1234 5') === '1234 1234 5')
assert(cc_format('1234 a 1234x 5') === '1234 1234 5')

Check out the Demo

A Django base class for all your Forms

November 16, 2013
2 comments Django

This has served me well of the last couple of years of using Django:


from django import forms

class _BaseForm(object):
    def clean(self):
        cleaned_data = super(_BaseForm, self).clean()
        for field in cleaned_data:
            if isinstance(cleaned_data[field], basestring):
                cleaned_data[field] = (
                    cleaned_data[field].replace('\r\n', '\n')
                    .replace(u'\u2018', "'").replace(u'\u2019', "'").strip())

        return cleaned_data


class BaseModelForm(_BaseForm, forms.ModelForm):
    pass


class BaseForm(_BaseForm, forms.Form):
    pass

So instead of doing...


class SigupForm(forms.Form):
    name = forms.CharField(max_length=100)
    nick_name = forms.CharField(max_length=100, required=False)

...you do:


class SigupForm(BaseForm):
    name = forms.CharField(max_length=100)
    nick_name = forms.CharField(max_length=100, required=False)

What it does is that it makes sure that any form field that takes a string strips all preceeding and trailing whitespace. It also replaces the strange "curved" apostrophe ticks that Microsoft Windows sometimes uses.

Yes, this might all seem trivial and I'm sure there's something as good or better out there but isn't it a nice thing to never have to worry about doing things like this again:



class SignupForm(forms.Form):
    ...

    def clean_name(self):
        return self.cleaned_data['name'].strip()

#...or...

form = SignupForm(request.POST)
if form.is_valid():
    name = form.cleaned_data['name'].strip()

UPDATE

This breaks some fields, like DateField.


>>> class F(BaseForm):
...     start_date = forms.DateField()
...     def clean_start_date(self):
...         return self.cleaned_data['start_date']
...
>>> f=F({'start_date': '2013-01-01'})
>>> f.is_valid()
True
>>> f.cleaned_data['start_date']
datetime.datetime(2013, 1, 1, 0, 0)

As you can see, it cleans up '2013-01-01' into datetime.datetime(2013, 1, 1, 0, 0) when it should become datetime.date(2013, 1, 1).

Not sure why yet.