What's the average number of domains a website depends on?

February 24, 2014
10 comments Web development

tl;dr 36

For some time now, I've been running an experiment where I analyze how many different domains any website depends on. For example, you might have Google Analytics on your site (that's www.google-analytics.com), you might have a Twitter button and/or a Facebook Like button (that's platform.twitter.com and/or s-static.ak.facebook.com) and you might serve your images from a CDN (that's d1ac1bzf3lrf3c.cloudfront.net). That's 3-4 distinct domains right there.

Independent of how many requests come from each domain, I wanted to measure how many distinct domains a website depends on, so I wrote a script and started collecting random URLs across the web. To get a sample of different URLs, I would periodically take the RSS feed on Digg.com and the RSS feed on Hacker News.
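
The gist of the measurement is tiny. Here's a minimal sketch of the idea (not the actual script, and the names are made up for illustration): collect every resource URL a page requests and count the distinct hostnames.

from urlparse import urlparse

def count_domains(resource_urls):
    # resource_urls: every URL the page requests (scripts, CSS, images, fonts, ...)
    domains = set()
    for url in resource_urls:
        hostname = urlparse(url).hostname
        if hostname:
            domains.add(hostname)
    return len(domains)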

Network tab on the Dev Tools console for a page on the-toast.net
The results are amazing! Some websites depend on over 100 different domain names!

Take this page on The Toast for example: it depends on 143 different domains. Loading it causes your browser to make 391 requests, download 4.8MB and take 29 seconds (in total, not necessarily until you can start reading it). What were they thinking!?!

I think what this means is that website makers will probably continue to make websites like this. What we, as web software engineers, can do is not tell people it's a bad idea but instead try to do something about it. It's quite far from my expertise, but clearly if you want to make the Internet faster, DNS would be an area to focus on.

Test it out for yourself here: Number of Domains

How I do deployments

December 16, 2013
5 comments Linux, Web development

I run, and have run, many sites. They're all some form of side project.

What almost all of them have in common is two things:

  1. They have very little traffic (thus not particularly mission critical)
  2. I run everything on one server (no need for "spinning up" new VMs here and there)

Many, many years ago, when the interns I currently work with were mere babies, I started a very simple "procedure".

  1. On the server, in the user directory where the site is deployed, I write a script called something like upgrade_myproject.sh which is executable and does exactly what its name says: it upgrades the site.

  2. In the server's root home directory I write a script called restart_myproject.sh which also does exactly what its name says: it restarts the service.

  3. On my laptop, in my ~/bin directory I create a script called UpgradeMyproject.sh (*) which runs upgrade_myproject.sh on the server and runs restart_myproject.sh also on the server.

And here is, if I may say so, the cleverness of this: I use ssh to execute these scripts remotely by simply piping the commands to ssh. For example:

#!/bin/bash
echo "./upgrade_generousfriends.sh" | ssh -A django@ec2-54-235-210-62.compute-1.amazonaws.com
echo "./restart_generousfriends.sh" | ssh root@ec2-54-235-210-62.compute-1.amazonaws.com

That's an example I use for Wish List Granted.

This works so darn well, and has done for years, that it's the reason I've never really learned to use more advanced tools like Fabric, Salt, Puppet, Chef or <insert latest deployment tool name>.

This means that all I need to do to run a deployment is type UpgradeMyproject.sh[ENTER] and the simple little bash scripts take care of everything else.

The reason I keep these scripts on the server and not on my laptop is simply that that's where they naturally belong, and if I'm ssh'ed in and messing around I don't have to exit out to re-run them.

Here's an example of the upgrade_generousfriends.sh I use for Wish List Granted:

#!/bin/bash
cd generousfriends
source venv/bin/activate
git pull origin master
find . | grep '\.pyc$' | xargs rm -f
pip install -r requirements/prod.txt
./manage.py syncdb --noinput
./manage.py migrate webapp.main
./manage.py collectstatic --noinput
./manage.py compress --force
echo "Restart must be done by root"

I hope that, by blogging about this, someone else sees that it doesn't really have to be that complicated. It's not rocket science, and most complex tools are only really needed when you have a significantly bigger scale in terms of people- and skill-complexity.

In conclusion

Keep it simple.

(*) The capitalization of my scripts is also an old habit. I use it to differentiate my own scripts from stuff I install from third parties.

Wish List Granted on Hacker News report

November 29, 2013
2 comments Web development

On Wednesday this week, I managed to get a link to Wish List Granted onto Hacker News. It had enough upvotes to be featured on the front page for a couple of hours. I'm very grateful for the added traffic but not quite so impressed with the ultimate conversions.

  • 4,428 unique visitors
  • 43 Wish Lists created
  • 2 Usersnap pieces of constructive feedback
  • 0 payments made

Google Analytics
So that's about a 1% conversion rate for setting up a wish list. But it's kinda disappointing that nobody ever made a payment. Actually, one friend did make a payment, but he's a colleague and a friend, not a stranger who stumbled onto it from Hacker News.

Also, it's now been 3 days since those 43 wish lists were created and still no payments. That's kinda disappointing too.

I'm starting to fear that Wish List Granted is one of those ideas that people think is great but have no interest in actually using.

Welcome to the world, Wish List Granted

November 27, 2013
4 comments Web development, Django

Logo
I built something. It's called Wish List Granted.

It's a mash-up using Amazon.com's Wish List functionality. What you do is hook up your Amazon wish list onto wishlistgranted.com and pick one item. Then you share that page with friends and family and they can each contribute a small amount. When the full amount is reached, Wish List Granted will purchase the item and send it to you.

The Rules page has more details if you're interested.

The problem it tries to solve is that you have friends who want something, and even if it's a good friend you might be hesitant to spend $50 on a gift for them. I'm sure you can afford it, but if you have many friends it gets impractical. However, spending $5 is another matter. Hopefully Wish List Granted solves that problem.

Wish List Granted started as one of those insomnia late-night projects. I first wrote a scraper using pyQuery, then a couple of Django models and views, and then tied it all up by integrating Balanced Payments. It was actually working on the first night. Flawed, but working start to finish.

When it all started, I used Persona to require people to authenticate to set up a Wish List. After some thought I decided to ditch that and use "email authentication" meaning they have to enter an email address and click a secure link I send to them.

One thing I'm very proud of about Wish List Granted is that it does NOT store any passwords, any credit cards or any personal shipping addresses. Despite it being so totally devoid of personal data, I thought it'd look nicer if the whole site was on HTTPS.

More information on the Help & Frequently Asked Questions page.

UPDATE

Wish List Granted is now shut down. Sad.

Lazy loading below the fold

October 26, 2013
2 comments Web development, JavaScript

I've started experimenting with my home page to make it load even faster.

Amazon famously does this too, which you can read more about in this Steve Souders post. They make sure everything that needs to be visible above the fold is loaded first; then they start loading all the other "stuff" below the fold. The assumption is that the user requests the page, watches it render, and some time after it has rendered reaches for the mouse and starts scrolling down for more content. Or perhaps never bothers to scroll down at all. Either way, everything below the fold can wait. We have more time to load that in later.

What we want to avoid is a load graph like this:

big html document delays loading other stuff

The graph is deliberately zoomed out so that we don't get stuck on the details of that particular graph. But basically, you have a very heavy document to load which needs to be fully loaded (and partially rendered) before it can load all other stuff that that page entails. As you can see, the first load (the HTML document) is taking up a majority of the load time. Once that's downloaded the browser can start parsing it and start rendering it. Simultaneously it can start downloading all the mentioned resources such as images, javascript, and CSS.

On WebPagetest they call this Speed Index: "The Speed Index is the average time at which visible parts of the page are displayed."
So basically, you want to display as much as you possibly can and then load in other things that are necessary but can wait in the background.

So, how did I accomplish this on my site?

Basically, the home page uses a piece of Django code that picks up the 10 most recent blog posts and includes them in the template. Instead, I made it only pick up the first 2; then, after window.onload, a piece of AJAX code loads the HTML for the remaining 8 blog posts.
That means that much less is required to load the home page. The page is smaller and references fewer images. The AJAX code is very crude and simple but works well enough:


onload = function() {
  microAjax("/rest/2/10/", function (res) {
    document.getElementById('rest').innerHTML = res;
  });
};
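
For completeness, here's a rough sketch of what the server-side end of that /rest/2/10/ URL could look like in Django. It's an illustration of the idea rather than my actual code; the app, model and template names are made up:

from django.shortcuts import render
from homepage.models import BlogPost  # hypothetical app and model

# mapped to something like r'^rest/(?P<start>\d+)/(?P<end>\d+)/$' in urls.py
def rest_of_home_page(request, start, end):
    # render just an HTML fragment containing blog posts `start` through `end`
    posts = BlogPost.objects.order_by('-publish_date')[int(start):int(end)]
    return render(request, 'rest-of-posts.html', {'posts': posts})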

The user probably won't notice a huge difference if she avoids looking at the loading spinner of her browser. Only if she is really really fast at scrolling down will she notice that the rest of the page (about 80% of its vertical space) comes in a little bit later.

So, did it work?

I hope so! The theory is sound. However, my home page is, unlike an Amazon.com product page, very sparse. The page weighs a total of 77KB (excluding external resources) but now only the first 25KB is loaded up front and the rest later.

Here's a measurement before and one after. It's kinda hard to compare because "fluctuations" in network I/O make measurements like this quite unpredictable. Also, there are various odd requests, like New Relic and Google Analytics, which cloud the waterfall view. However, what really matters is in the "First View" of the after measurement. If you look closely you'll see that a bunch of images aren't loaded until after the "Document Complete" event has fired. That, to me, is a big win.

Below the fold

If you're interested in how it was done, check out this changeset.

premailer now excludes pseudo selectors by default

May 27, 2013
0 comments Python, Web development

Thanks to Igor, who emailed me and made me aware that you can't put pseudo classes in style attributes in HTML. I.e. this does not work:


<a href="#" style="color:pink :hover{color:red}">Sample Link</a>

See for yourself: Sample Link

Note how it does not become red when you hover over the link above.
This is what premailer used to do. Until yesterday.

BEFORE:


>>> from premailer import transform
>>> print transform('''
... <html>
... <style>
... a { color: pink }
... a:hover { color: red }
... </style>
... <a href="#">Sample Link</a>
... </html>
... ''')
<html><head><a href="#" style="{color:pink} :hover{color:red}">Sample Link</a></head></html>

AFTER:


>>> from premailer import transform
>>> print transform('''
... <html>
... <style>
... a { color: pink }
... a:hover { color: red }
... </style>
... <a href="#">Sample Link</a>
... </html>
... ''')
<html><head>
<style>a:hover {color:red}</style>
<a href="#" style="color:pink">Sample Link</a>
</head></html>

That's because the new default is to exclude pseudo classes.
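
If you prefer the old behaviour, something like this should get it back by instantiating the Premailer class directly and switching the option off (I'm assuming the keyword is called exclude_pseudoclasses, matching the name of the new default; newsletter.html is just a stand-in for your own HTML):

>>> from premailer import Premailer
>>> html = open('newsletter.html').read()
>>> print Premailer(html, exclude_pseudoclasses=False).transform()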

Thanks Igor for making me aware!

Registration and sign-in by email verification

April 29, 2013
10 comments Web development

I was going to title this blog post "I don't want your stinkin' password!" but realised that this isn't the first site to rely entirely on OpenID, OAuth and the like.

On Around The World you can now log in with either your Google account, your Twitter account or simply by entering your email. It looks like this:

Sign-in screen
Screenshot of email

What's neat about this is that it works independently of whether you've signed in before (aka log in) or are new (aka register).

What's not so neat about it is that people might not recognize it. We're so used to both registration forms and log-in forms asking for passwords. Often, you can quickly tell it's a log-in form because you expect two input fields.

Another slight flaw is the fact that my emails usually take several tens of seconds to send. This is because they're sent asynchronously by a cron job. So people who enter their email address might get disappointed if they don't get the email immediately.
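
For the curious, the general pattern behind that secure link is simple. Here's a minimal sketch of how it can be done with Django's signing framework; it's an illustration of the idea rather than necessarily how Around The World does it, and the URLs and addresses are made up:

from django.core import signing
from django.core.mail import send_mail

def send_signin_link(email):
    # embed the email address in a tamper-proof, signed token
    token = signing.dumps({'email': email}, salt='email-signin')
    url = 'https://example.com/signin/%s/' % token
    send_mail(
        'Your sign-in link',
        'Click this link to sign in: %s' % url,
        'noreply@example.com',
        [email],
    )

def email_from_token(token):
    # raises BadSignature/SignatureExpired if the token is invalid or too old
    data = signing.loads(token, salt='email-signin', max_age=60 * 60 * 24)
    return data['email']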

Anyway, let's wait and see if people actually use it. At least it means you don't really need a third party service and you don't need to type in a password.

Recruiters: if you're going to lie, do it properly

April 7, 2013
9 comments Work, Web development

Being a recruiter is hard work. A lot of pressure and having to deal with people's egos. Although I have no plans to leave Mozilla any time soon, there's still some sort of value in seeing that my skills are sought after in the industry. That's why I haven't yet completely cancelled my LinkedIn membership.

When I get automated emails from bots that just scrape LinkedIn I don't bother. Sometimes I get emails from recruiters who have actually studied my profile (my blog, my projects, my github, etc) and then I do take the time to reply and say "Hi Name! Thank you for reaching out. It looks really exciting but it's not for me at the moment. Keep up the good work!"

Then there's this new trend where people appear to try to automate what the bots do by doing it manually but without actually reading anything. I understand that recruiters are under a lot of pressure to deliver and try to reach out to as many potential candidates as possible, but my advice is: if you're going to do it, do it properly. You'll reach fewer candidates but it'll mean so much more.

I got this email the other day about a job offer at LinkedIn:
Shaming a stressed out recruiter from LinkedIn

  • I have a Swedish background. Not "Sweetish". And what difference does that make?
  • I haven't worked on "FriedZopeBase" (which is on my github) for several years
  • I haven't worked on "IssueTrackerProduct" for several years
  • Let's not "review [my] current employment". That's for me to think about.

So what can we learn from this? Well, for starters, if you're going to pretend to have taken the time, do it properly! If you don't have time to do in-depth research on a candidate, then don't pretend that you have.

I got another recruiter emailing me personally yesterday and it was short and sweet. No mention of free lunch or other superficial trappings. The only personal thing about it was that it had my first name. I actually bothered to reply to them and thank them for reaching out.

Never put external Javascript in the <head>

April 2, 2013
13 comments Web development

First of all, the title is perhaps misleading. Basically, don't put plain script tags that are not async in the head tag.

If you put a piece of javascript in the head of an HTML page, the browser will start to download that and proceed down the lines of HTML, downloading other resources too as it encounters them, such as the CSS files.

Then, when all the javascript and CSS have been downloaded, it will start rendering the page, and when it does that it will download any images referenced in the HTML. At roughly the same time it will start to display things on the screen. But it won't do any of this until the CSS and Javascript have been downloaded.

To repeat: The browser screen will appear blank. It won't start downloading any images if downloading a javascript URL referenced in the head gets stuck.

Here are two perfectly good examples from this morning's routine hunt for news:

Wired.com is waiting on some external resource to load (forever!) before anything is rendered.
Wired.com is guilty

getharvest.com depends on Rackspace's CDN before anything is displayed
getharvest.com is guilty

Here's what getharvest.com does in their HTML:


<!DOCTYPE html>
<html lang="en">

  <head>
    <script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script>
    <script type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>
  ...

Why it gets stuck on connecting to c761485.r85.cf2.rackcdn.com I just don't know. But it does. The Internet is like that oftentimes. You simply can't connect to otherwise perfectly configured web servers.

Update-whilst-writing-this-text! As I was writing this text I gave getharvest.com a second chance, thinking that most likely the squirrels in my internet tubes would be back up and running to rackcdn.com, but then this happened!

So, what's the right thing to do? Simple: don't rely on external resources. For example, you can move the Javascript script tag to the very bottom of the HTML page. That way the browser will render as much as it possibly can whilst waiting for the Javascript resource to get unstuck. Or, almost equivalently, you can keep the script tag in the <head> but put an async attribute on it like this:


<script async type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>

Another thing you can do is not use an external resource URL (aka a third-party domain). Instead of using cdn.superfast.com/file.js you use /file.js. Sure, that fancy CDN might be faster at serving up stuff than your server, but looking up the CDN's domain costs one more DNS lookup, which we know can be very expensive for that first-time impression.

I know I'm probably guilty of this on some of my (now) older projects. For example, if you open aroundtheworldgame.com it won't render anything until it has managed to connect to maps.googleapis.com and dn4avfivo8r6q.cloudfront.net, but that's more of an app than a website.

By the way... I wrote some basic code to play around with how this actually works. I decided to put this up in case you want to experiment with it too: https://github.com/peterbe/slowpage

HTML whitespace "compression" - don't bother!

March 11, 2013
4 comments Web development

This morning I came across this site on Hacker News. It's a cute site with some basic tips on how to make your sites faster.

It's very much a for-beginners document as all the tips are quite basic. For example it doesn't even mention the use of CDNs.

One tip in particular stood out to me: "it can be useful to minify your HTML with automated tools."
And it links to the htmlcompressor project. Ignore this advice.

What matters 10 times more is Gzip compression. This is usually very easy to set up with Nginx or Apache. It's not something you do in your web framework, and if you don't have a web framework you don't need to manually Gzip HTML files on the filesystem.

For example, the home page here on my blog is, at the time of writing, 66,770 bytes. Hefty, sure, but with all excess whitespace removed it shrinks to 59,356 bytes. That really doesn't matter once you Gzip, though:

Gzipped from original version: 18,470 bytes
Gzipped from whitespace trimmed version: 18,086 bytes

The gain is about 2%, which is definitely not worth the hassle of adding a whitespace compressor.
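
If you want to reproduce the comparison yourself, something like this is all it takes (original.html and trimmed.html being stand-ins for the two versions of the page; zlib's output is close enough to what Gzip on the web server produces for a size comparison like this):

import zlib

original = open('original.html', 'rb').read()
trimmed = open('trimmed.html', 'rb').read()

# compare raw sizes with compressed sizes (level 9, roughly what the web server's gzip does)
print len(original), len(zlib.compress(original, 9))
print len(trimmed), len(zlib.compress(trimmed, 9))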