Panasonic Lumix from 2008 or a iPhone 5S from 2014

September 26, 2015
5 comments Photos

Rummaging through an old box I found an old digital camera I bought in 2008, the Panasonic Lumix DMC-LX3. It was hot stuff when it came out and I loved it. So much lighter and smaller than my previous Nikon DSLR behemoth.

But how does this 7 year old camera compare to my iPhone 5S?? Without any scientific rigor I went to the park and took one picture with each "camera" (the iPhone is not really a camera, it just has a (good) camera).

Note! The thumbnails shown below are heavily optimized for web use. You have to click to see the original.

Here are the pictures taken with the Lumix:

Lumix 1

Lumix 2

Lumix 3


Lumix 4

Lumix 5



And here are the pictures taken with the iPhone 5S:

iPhone 5S 1

iPhone 5S 2

iPhone 5S 3


iPhone 5S 4

iPhone 5S 5



To compare, the best thing you can do is to open one of each so to say in separate tabs, or download, and zoom in and stare it down.

The total pixel area across all 5 images is about the same. The iPhone 5S pictures are slightly smaller in terms of dimension. The Lumix pictures are all 3,648x2,736 pixels. The iPhone 5S pictures are 3,264x2,448 pixels.

The 5 Lumix pictures weigh 19.1Mb and the iPhone 5S pictures weigh 11.6Mb.

Observations

  • The color on the Braille sign, the 4th one, is significantly different. I can't tell or remember which one is "right".
  • The last one, of the wall clock in my garage, was taken with flash on both. Interestingly, with the iPhone 5S, the picture was actually better without flash.
  • The iPhone 5S pictures adds a yellowish glow which feels artificial but looks quite nice and warm.

In conclusion

I don't know which is better. The Lumix weighs more and is bigger volume than the iPhone and it doesn't have a web browser, GPS or WiFi. So if the pictures are about the same, the iPhone wins.

What do you think? If we ignore the practical aspect of carrying the Lumix, which pictures do you prefer?

ElasticSearch, snowball analyzer and stop words

September 25, 2015
1 comment Python

Disclaimer: I'm an ElasticSearch noob. Go easy on me

I have an application that uses ElasticSearch's more_like_this query to find related content. It basically works like this:

>>> index(index, doc_type, {'id': 1, 'title': 'Your cool title is here'})
>>> index(index, doc_type, {'id': 2, 'title': 'About is a cool headline'})
>>> index(index, doc_type, {'id': 3, 'title': 'Titles are your big thing'})

Then you can pick one ID (1, 2 or 3) and find related ones.
We can tell by looking at these three silly examples, the 1 and 2 have the words "is" and "cool" in common. 1 and 3 have "title" (stemming taken into account) and "your" in common. However, is there much value in connected these documents on the words "is" and "your"? I think not. Those are stop words. E.g. words like "the", "this", "from", "she" etc. Basically words that are commonly used as "glue" between more unique and specific words.

Anyway, if you index something in ElasticSearch as a text field you get, by default, the "standard" analyzer to analyze the incoming stuff to be indexed. The standard analyzer just splits the words on whitespace. A more compelling analyzer is the Snowball analyzer (original here) which supports intelligent stemming (turning "wife" ~= "wives") and stop words.

The problem is that the snowball analyzer has a very different set of stop words. We did some digging and thought this was the list it bases its English stop words on. But this was wrong. Note that that list has words like "your" and "about" listed there.

The way to find out how your analyzer treats a string and turns it into token is to the the _analyze tool. For example:

curl -XGET 'localhost:9200/{myindexname}/_analyze?analyzer=snowball' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 5,
      "token": "about",
      "type": "<ALPHANUM>",
      "start_offset": 0,
      "position": 1
    },
    {
      "end_offset": 10,
      "token": "your",
      "type": "<ALPHANUM>",
      "start_offset": 6,
      "position": 2
    },
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

So what you can see is that it finds the tokens "about", "your", "special" and "word". But it stop word ignored "is", "a" and "the". Hmm... I'm not happy with that. I don't think "about" and "your" are particularly helpful words.

So, how do you define your own stop words and override the one in the Snowball analyzer? Well, let me show you.

In code, I use pyelasticsearch so the index creation is done in Python.


STOPWORDS = (
    "a able about across after all almost also am among an and "
    "any are as at be because been but by can cannot could dear "
    "did do does either else ever every for from get got had has "
    "have he her hers him his how however i if in into is it its "
    "just least let like likely may me might most must my "
    "neither no nor not of off often on only or other our own "
    "rather said say says she should since so some than that the "
    "their them then there these they this tis to too twas us "
    "wants was we were what when where which while who whom why "
    "will with would yet you your".split()
)

def create():
    es = get_connection()
    index = get_index()
    es.create_index(index, settings={
        'settings': {
            'analysis': {
                'analyzer': {
                    'extended_snowball_analyzer': {
                        'type': 'snowball',
                        'stopwords': STOPWORDS,
                    },
                },
            },
        },
        'mappings': {
            doc_type: {
                'properties': {
                    'title': {
                        'type': 'string',
                        'analyzer': 'extended_snowball_analyzer',
                    },
                }
            }
        }
    })

With that in place, now delete your index and re-create it. Now you can use the _analyze tool again to see how it analyzes text on this particular field. But note, to do this we need to know the name of the index we used. (so replace {myindexname} in the URL):

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?field=title' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

Cool! Now we see that it considers "about" and "your" as stop words. Much better. This is handy too because you might have certain words that are globally not very common but within your application it's very repeated and not very useful.

Thank you willkg and Erik Rose for your support in tracking this down!

django-semanticui-form

September 14, 2015
2 comments Python, Django

I'm working on a (side)project in Django that uses the awesome Semantic UI CSS framework. This project has some Django forms that are rendered on the server and so I can't let Django render the form HTML or else the CSS framework can't do its magic.

The project is called django-semanticui-form and it's a fork from django-bootstrap-form.

It doesn't come with the Semantic UI CSS files at all. That's up to you. Semantic UI is available as a big fat bundle (i.e. one big .css file) but generally you just pick the components you want/need. To use it in your Django templates simply, create a django.forms.Form instance and render it like this:


{% load semanticui %}

<form>
  {{ myform | semanticui }}
</form>

The project is very quickly put together. The elements I intend to render seem to work but you might find that certain input elements don't work as nicely. However, if you want to help on the project, it's really easy to write tests and run tests. And Travis and automatic PyPI deployment is all set up so pull requests should be easy.

peepin - a great companion to peep

September 10, 2015
0 comments Python

I actually wrote peepin several months ago but forgot to blog about it.
It's a great library that accompanies peep which is a wrapper on top of pip. Actually, it's for pip install. When you normally do pip install -r requirements.txt the only check it does is on the version number, assuming your requirements.txt has lines in it like Django==1.8.4. With peep it does a checksum comparison of the wheel, tarball or zip file. It basically means that the installer will get EXACTLY the same package files as was used by the developer who decides to add it to requirements.txt.

If you're using pip and want strong reliability and much higher security, I strongly recommend you consider switching to peep.

Anyway, what peepin is, is a executable use to modify your requirements.txt automatically for you. It can do two things. At least one.

1) Automatically figure out what the right checksums should be.
2) It can figure out what is the latest version on PyPI.

For example:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form $)$ peepin --verbose django-bootstrap-form
* Latest version for 3.2
https://pypi.python.org/pypi/django-bootstrap-form/3.2
* Found URL https://pypi.python.org/packages/source/d/django-bootstrap-form/django-bootstrap-form-3.2.tar.gz#md5=1e95b05a12362fe17e91b962c41d139e
*   Re-using /var/folders/1x/2hf5hbs902q54g3bgby5bzt40000gn/T/django-bootstrap-form-3.2.tar.gz
*   Hash AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
* Editing requirements.txt

And once that's done...:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form *$)$ git diff
diff --git a/requirements.txt b/requirements.txt
index a6600f1..5f1374c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -83,8 +83,8 @@ BeautifulSoup==3.2.1
 django_compressor==1.4
 # sha256: F3KVsUQkAMks22fo4Y-f9ZRvtEL4WBO50IN4I3IuoI0
 django-cronjobs==0.2.3
-# sha256: 2G3HpwzvCTy3dc1YE7H4XQH6ZN8M3gWpkVFR28OOsNE
-django-bootstrap-form==3.1
+# sha256: AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
+django-bootstrap-form==3.2
 # sha256: jiOPwzhIDdvXgwiOhFgqN6dfB8mSdTNzMsmjmbIBkfI
 regex==2014.12.24
 # sha256: ZY2auoUzi-jB0VMsn7WAezgdxxZuRp_w9i_KpCQNnrg
 

If you want to you can open up and inspect the downloaded package and check that no hacker has meddled with the package. Or, if you don't have time to do that, at least use the package locally and run your tests etc. If you now feel comfortable with the installed package you can be 100% certain that will be installed on your server once the code goes into production.

Be careful with using dict() to create a copy

September 9, 2015
9 comments Python

Everyone who's done Python for a while soon learns that dicts are mutable. I.e. that they can change.

One way of "forking" a dictionary into two different ones is to create a new dictionary object with dict(). E.g:


>>> first = {'key': 'value'}
>>> second = dict(first)
>>> second['key'] = 'other'
>>> first
{'key': 'value'}
>>> second
{'key': 'other'}

See, you can change the value of a key without affecting the dictionary it came from.

But, if one of the values is also mutable, beware!


>>> first = {'key': ['value']}
>>> second = dict(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value', 'second value']}
>>> second
{'key': ['value', 'second value']}

This is where you need to use the built in copy.deepcopy.


>>> import copy
>>> first = {'key': ['value']}
>>> second = copy.deepcopy(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value']}
>>> second
{'key': ['value', 'second value']}

Yay! Hope it helps someone avoid some possibly confusing bugs some day.

UPDATE

As ëRiC reminded me, there are actually three ways to make a "shallow copy" of a dictionary:

1) some_copy = dict(some_dict)

2) some_copy = some_dict.copy()

3) some_copy = copy.copy(some_dict) # after importing 'copy'

Examples of mozjpeg savings

September 1, 2015
5 comments Web development, Mozilla

I'm currently working on a Django library that uses mozjpeg to optimize thumbnails that are generated from stored images. I first wanted to get a feel for how good mozjpeg really is.

In my ~/Downloads directory I have all sorts of "junk" from all sorts of saves and experiments. It'll work as a good testbed of relatively random JPEG images of all sorts of sizes and qualities. Without further ado, here's the results:

FILENAME                                          OPTIMIZE   ORIGINAL     SAVING  PERCENT
-----------------------------------------------------------------------------------------
180697_1836563311933_3364808_n.jpg                  45.2Kb     50.4Kb      5.1Kb    10.2%
2014-03-20 17.35.39.jpg                           2040.1Kb   2207.8Kb    167.7Kb     7.6%
2015-03-04 21.18.16.jpg                           1521.5Kb   1629.2Kb    107.7Kb     6.6%
2015-03-04 21.19.16.jpg                           1602.4Kb   1720.0Kb    117.6Kb     6.8%
2015-03-04 21.23.16.jpg                           1181.7Kb   1272.1Kb     90.4Kb     7.1%
2015-03-05 06.03.00.jpg                           1426.7Kb   1557.7Kb    131.0Kb     8.4%
20150626_200629_001.jpg                           1566.4Kb   1717.3Kb    151.0Kb     8.8%
20150626_200631.jpg                               2157.6Kb   2319.6Kb    162.0Kb     7.0%
Boba_Fett_by_RobD4E.jpg                             96.2Kb    104.3Kb      8.1Kb     7.8%
Horse_Play.jpg                                     170.4Kb    185.2Kb     14.9Kb     8.0%
Image (107).jpg                                    344.9Kb    390.6Kb     45.7Kb    11.7%
Misc Candle Holder NECA FOTR Balrog Dec2002.jpg     37.1Kb     37.7Kb      0.6Kb     1.5%
Mozilla_Lightbeam.jpg                               55.1Kb     79.7Kb     24.6Kb    30.8%
Photo on 12-17-14 at 5.55 PM.jpg                   168.5Kb    187.7Kb     19.2Kb    10.2%
dev.jpg                                             17.5Kb     30.8Kb     13.3Kb    43.2%
dev2.jpg                                            41.1Kb     54.3Kb     13.3Kb    24.4%
dev3.jpg                                            35.3Kb     49.0Kb     13.7Kb    28.0%
dev4.jpg                                            42.0Kb     56.0Kb     14.0Kb    25.0%
dev5.jpg                                            24.6Kb     37.9Kb     13.2Kb    35.0%
dev6.jpg                                            28.9Kb     42.8Kb     13.9Kb    32.4%
hr_0570_220_135__0570220135006.jpg                3124.3Kb   3467.8Kb    343.5Kb     9.9%
hr_0570_220_158__0570220158006.jpg                3010.0Kb   3319.1Kb    309.1Kb     9.3%
hr_0570_220_175__0570220175006.jpg                2245.5Kb   2442.6Kb    197.0Kb     8.1%
hr_0570_227_599__0570227599006.jpg                2561.7Kb   2809.8Kb    248.1Kb     8.8%
hr_0596_622_701__0596622701006.jpg                3238.8Kb   3453.6Kb    214.7Kb     6.2%
hr_0596_623_849__0596623849006.jpg                2902.9Kb   3102.1Kb    199.3Kb     6.4%
hr_0622_219_873__0622219873006.jpg                 985.3Kb   1066.9Kb     81.7Kb     7.7%
logo.jpg                                            43.5Kb     51.2Kb      7.7Kb    15.1%
mvm-header.jpg                                       8.5Kb     12.4Kb      3.9Kb    31.6%
mvm-postcard-picture.jpg                            72.2Kb     73.4Kb      1.3Kb     1.7%
overhang_pixels.jpg                               3014.3Kb   3370.8Kb    356.4Kb    10.6%
peterbe copy.jpg                                     4.2Kb     10.4Kb      6.2Kb    59.7%
peterbe.jpg                                         36.7Kb     44.3Kb      7.5Kb    17.0%
pjt-mcguinty-2.jpg                                  96.8Kb    101.6Kb      4.8Kb     4.8%
sl1.jpg                                             28.7Kb     35.4Kb      6.7Kb    18.9%

That's an median of 9.3% (average of 15.3%) savings.

It's not very fast though. Some of the large files take more than a second. In total it took 23.7 seconds to create all of those optimized files. Do what you want with that fact, bear in mind that these are hopefully "once in a lifetime" operations (depending on the ephemerality of your thumbnail storage). Mind you, the really large JPEGs skew that since the median is 72.1 milliseconds and average is 527.0 milliseconds. Also, when I look through the numbers I find that the large JPGs take the longest but had the least benefit in terms of byte savings.

UPDATE

Chris Adams, in the comment below, inspired me to compare my trials with jpegoptim and jpegrescan. So, I took my script that generated a directory of 45 JPEGs and changed it to use jpegoptim and jpegrescan.

The mozjpeg total size of that output directory is 34.1Mb and it took a total of 23.3 seconds (median 76.4 milliseconds).

The jpegoptim & jpegrescan total size of that output directory is 35.6Mb and it took a total of 4.6 seconds (median 32.1 milliseconds).

In other words, roughly speaking mozjpeg is 4.2% more space effective and 58% slower than jpegoptim & jpegrescan.

Crash-stats just became a whole lot faster

August 25, 2015
0 comments Web development, Mozilla

tl;dr Crash-stats is Mozilla's crash reporter dashboard. Simply fixing the static assets made the site 25% faster.

Before http://www.webpagetest.org/result/150820_X5_V5T/

After http://www.webpagetest.org/result/150824_7F_1C3Q/

(The "First Byte Time" is still terrible but that's for another discussion. We're working on a re-write of the underlying data model for that particular report.)

  • Note how the SpeedIndex dropped from 2823 to 2098 which basically means, you can see stuff sooner.

  • The Load Time used to be 5.7 seconds on average. Now it takes 3.5 seconds.

  • It used to weigh 717 KB to load the whole thing. Now it weighs 326 KB.

The only thing we changed was a long overdue correction of static asset headers and Gzip compression. Now, files with unique URLs (e.g. /static/CACHE/css/23a811f100bc.css) have maximum aggressive cache headers. And now all .js, .css and text/html is Gzipped.

Was it easy to do? Hell no!
Does it matter? Hell yeah! We don't have a lot of users or traffic on these reports but the people who use them do this for a living and making the site feel snappier for them would make their lives more productive.

How to test if gzip is working on your site or app

August 20, 2015
0 comments Linux, Web development

Suppose that you've enabled gzip delivery of your site and its static assets. How do you test it?

One obvious way is to load the site with the developer tools in your browser and look at the headers there. Like this for example:
Is it gzip'ed? Yes!

Another more hard code and geek-power way is to simply use curl.

It goes without saying, the ideal way to set up Nginx is to make it optional. Don't upload a gzipped file to your server and force gzip down on every client. Instead, let something like Nginx handle it on-the-fly (don't worry, it's ultrafast).

So to see if gzip is working, take your URL and run these two commands:

$ curl -v --compressed http://www.example.com/page.cat > /dev/null

And look for this line:

< Content-Encoding: gzip

Also you should look for this line:

Content-Length: 2403

(number will obviously vary)

Now run the same curl command but without the --compressed. E.g.

$ curl -v http://www.example.com/page.cat > /dev/null

Now there won't be a Content-Encoding header in the response. It defaults to plain text.
Also, now look for the Content-Length and amuse yourself in the profit that this number is likely to be much larger than before.

Here's a realistic example:

With --compressed

$ curl -v --compressed https://crash-stats.allizom.org/home/products/Firefox > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 52.26.241.244...
* Connected to crash-stats.allizom.org (52.26.241.244) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: crash-stats.allizom.org
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host: crash-stats.allizom.org
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Content-Encoding: gzip
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:00 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=yieBMvzCn4fO4lmMQbjuq0Cibl9s7oxG; expires=Thu, 20-Aug-2015 20:38:00 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 2403
< Connection: keep-alive
<
{ [data not shown]
100  2403  100  2403    0     0   4734      0 --:--:-- --:--:-- --:--:--  4730
* Connection #0 to host crash-stats.allizom.org left intact

Without

$ curl -v  https://crash-stats.allizom.org/home/products/Firefox > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 54.213.30.86...
* Connected to crash-stats.allizom.org (54.213.30.86) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: crash-stats.allizom.org
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host: crash-stats.allizom.org
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:05 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=evG8kmoXjHv5aeyFIQHxNcnGahdxwIOy; expires=Thu, 20-Aug-2015 20:38:05 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 12299
< Connection: keep-alive
<
{ [data not shown]
100 12299  100 12299    0     0  24314      0 --:--:-- --:--:-- --:--:-- 24306
* Connection #0 to host crash-stats.allizom.org left intact

Introducing optisorl

August 18, 2015
0 comments Python

optisorl is a Python package for sorl-thumbnail which is a kick-ass Python package for Django. sorl-thumbnail is pretty popular and used by a lot of people who have images they want to display as thumbnails.

A problem you find is that oftentimes the PNG thumbnails aren't as optimized as they can be. A great tool for having a second optimization pass on an PNG file is pngquant. You basically, run it like this:

$ ls -l bugzilla.png
-rw-r--r--@ 1 peterbe  staff  12188 Dec 12  2014 bugzilla.png
$ pngquant bugzilla.png
:~/Downloads$ ls -l bugzilla-fs8.png
-rw-r--r--@ 1 peterbe  staff  6630 Aug 18 13:15 bugzilla-fs8.png

That's a 140x140 pixel PNG that became 5,558 bytes smaller (46% saving).

Anyway, this is where optisorl comes in. It's an extension to sorl-thumbnail that is able to execute pngquant on the PNG right after the thumbnail file has been created. It does so by calling out a sub-process command to pngquant. See the code here which is all the magic there is to it really.

The reason I built this was to reduce the images on Air Mozilla. At the time I did the measurement, the PNGs total weight on the home page was 129KB and after running them all through optisorl the total weight was only 65KB.

To install, it just pip install it like so:

$ pip install optisorl

And you need to install pngquant like brew install pngquant or apt-get install pngquant.

Then, to activate it you need to set this Django setting:


THUMBNAIL_BACKEND = 'optisorl.backend.OptimizingThumbnailBackend'

If you decide to put the pngquant executable somewhere not on the PATH you can add to your settings.py file something like this:


PNGQUANT_LOCATION = '/path/to/bin/pngquant'

There's a bunch of features it doesn't have but we can work together on that. For example, there are certain PNG images that you might want to display as thumbnails but due to something about the image, e.g. its use of Alpha channels, you might want to explicitly disable optimizations.

News sites suck in terms of performance

August 8, 2015
5 comments Web development

These days, it's almost a painful experience reading newspaper news online.

It's just generally a pain to read these news sites. Because they are soooooo sloooooooow. Your whole browser is stuttering and coughing blood and every click feels like a battle.

The culprit is generally that these sites are so full of crap. Like click trackers, ads, analytics trackers, videos, lots of images and heavy Javascript.

Here's a slight comparison (in no particular order) of various popular news websites (using a DSL connection):

Site Load Time Requests Bytes in SpeedIndex
BBC News homepage 14.4s 163 1,316 KB 6935
Los Angeles Times 35.4s 264 1,909 KB 13530
The New York Times 28.6s 330 4,154 KB 17948
New York Post 40.7s 197 6,828 KB 13824
USA Today 19.6s 368 2,985 KB 3027
The Washington Times 81.7s 547 12,629 KB 18104
The Verge 18.9s 152 2,107 KB 7850
The Huffington Post 22.3s 213 3,873 KB 4344
CNN 45.9s 272 5,988 KB 12579

Wow! That's an average of...

  • 34.2 seconds per page
  • 278 requests per page
  • 4,643 KB per page

That is just too much. I appreciate that these news companies that make these sites need to make a living and most of the explanation of the slow-down is the ads. Almost always, if you click to open one of those pages, it's just a matter of sitting and waiting. It loading in a background tab takes so much resources you feel it in other tabs. And by the time you go to view and you start scrolling your computer starts to cough blood and crying for mercy.

But what about the user? Imagine if you simply pull out of some of the analytics trackers and opt for simple images for the ads and just simplify the documents down to the minimum? Google Search does this and they seem to do OK in terms of ad revenue.

An interesting counter-example; this page on nytimes.com is a long article and when you load it on WebPageTest it yields a result of 163 requests. But if you use your browser dev tools and keep an eye on all network traffic as you slowly scroll down the length of the article all the way down to the footer, you notice it downloads an additional 331 requests. It starts with about 2KB of data but by the time you've downloaded to the end and scrolled past all ads and links it's amassed about 5.5KB of data. I think that's fair. At least the inital feeling when you arrive on the page isn't that of disgust.

Another realization I've found whilst working on this summary is that oftentimes, sites that are REALLY really slow and horrible to use don't necessarily have super many external resources from different domains but they just have far far too much Javascript. This random page on Washington Times for example has 209 Javascript files that together weighs 9.4KB (roughly 8 times the amount of image data). Not only does that need to be downloaded. It also needs to be parsed and I bet a zillion event handlers and DOM manipulators kick in and basically make scrolling a crying game even on a fast desktop computer.

And then it hit me!

We've all been victims of websites annoyingly try to lure you to install their native app when trying to visit their website on a smartphone. The reason they're doing that is because they've been given a second chance to build something without eleventeen iframes and 4MB of Javascript so the native app feels a million better because they've started over and gotten things right.

My parting advice to newspapers who can do something about their website; Radically refactor your code. Question EVERYTHING and delete, remove and clean up things that doesn't add value. You might lose some stats and you might not be able to show as much ads but if your site loads faster you'll get a lot more visitors and a lot more potential to earn bigger $$$ on fewer more targetted ads.