Mozilla Symbol Server (aka. Tecken) load testing

September 6, 2017
0 comments Python, Web development, Django, Mozilla

(Thanks to Miles Crabil, not only for being an awesome Ops person but also for reviewing this blog post!)

My project over the summer, here at Mozilla, has been a project called Mozilla Symbol Server. It's a web service that uploads C++ symbol files, downloads C++ symbol files and symbolicates C++ crash stacktraces. It went into production last week which was fun but there's still lots of work to do on adding beyond-parity features and more optimizations.

What Is Mozilla Symbol Server?

The code name for this project is Tecken and it's written in Python (Django, Gunicorn) and uses PostgreSQL, Redis and Celery. The frontend is entirely static and developed (almost) as a separate project within the repo. The frontend is written in React (using create-react-app and react-router). Everything runs as Docker containers. And if you ask me more details about how it's configured/deployed I'm afraid I have to defer to the awesome Mozilla CloudOps team.

One of the challenges I faced developing Tecken is that symbol downloads need to be fast to handle high volumes of traffic. Today I did some load testing on our stage deployment and managed to start 14 concurrent clients that bombarded our staging server with realistic HTTPS GET queries based on log files. It's actually 7 + 1 + 4 + 2 concurrent clients. 7 of them from a m3.2xlarge EC2 node (8 vCPUs), 1 from a m3.large EC2 node (1 vCPU), 2 from two separate NYC-based DigitalOcean personal servers and 2 clients here from my laptop on my home broadband. Basically, each load-test script process got its own CPU.
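
The load-test script itself isn't included here, but the gist of it is something like this minimal sketch using the requests library (the urls-from-logs.txt filename is made up for illustration):

# Minimal sketch of a load-test client: replay GET requests for URL paths
# extracted from access logs against the staging server.
import requests

BASE_URL = 'https://symbols.stage.mozaws.net'


def bombard(log_file):
    session = requests.Session()
    for line in open(log_file):
        path = line.strip()
        if not path:
            continue
        response = session.get(BASE_URL + path)
        print(response.status_code, path)


if __name__ == '__main__':
    bombard('urls-from-logs.txt')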

Total req/s
It's hard to know how much more each client could have pushed if it hadn't been slowed down. Either way, the server managed to sustain about 330 requests per second. Our production baseline goal is to be able to handle at least 40 requests per second.

After running for a while the caches started getting warm, but about 1-5% of requests still have to make a boto3 roundtrip to an S3 bucket located on the other side of America, in Oregon. There is also a ~5% penalty in that some requests trigger a write to a central Redis ElastiCache server. That's cheaper than the boto3 S3 call but still a hefty latency cost to pay.
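
To be clear, this is not Tecken's actual code, but the general shape of that kind of lookup is roughly this: check the cache first and only pay for the boto3 call on a miss (the bucket name and key format below are made up):

# Rough sketch: cache-backed S3 existence check. The expensive part is the
# head_object() roundtrip to S3; the cache.set() is the (cheaper) Redis write.
import boto3
from botocore.exceptions import ClientError
from django.core.cache import cache

s3 = boto3.client('s3')


def symbol_exists(key, bucket='example-symbols-bucket'):
    cache_key = 'exists:' + key
    cached = cache.get(cache_key)
    if cached is not None:
        return cached
    try:
        s3.head_object(Bucket=bucket, Key=key)
        exists = True
    except ClientError:
        exists = False
    cache.set(cache_key, exists, 60 * 60)
    return exists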

The ELB in our staging environment spreads the load between 2 c4.large (2 vCPUs, 3.75GB RAM) EC2 web heads, each running preloaded Gunicorn workers between Nginx and Django. Each web head has its own local memcached server so the Gunicorn workers can share memory, but only within that web head.

Is this a lot?

How long is a rope? Hard to tell. Tecken's performance is certainly more than enough, and the sheer fact that it only went into production last week tells me we can probably find a lot of low-hanging-fruit optimizations on the deployment side over time.

One way of answering that is to compare it with our lightest endpoint; one that involves absolutely no external resources. It's just pure Python in the form of ELB → Nginx → Gunicorn → Django. If I run hey from the same server I did the load testing from, I get a topline of 1,300 requests per second.

$ hey -n 10000 -c 10 https://symbols.stage.mozaws.net/__lbheartbeat__
Summary:
  Total:    7.6604 secs
  Slowest:  0.0610 secs
  Fastest:  0.0018 secs
  Average:  0.0075 secs
  Requests/sec: 1305.4199
...

That basically means that all the extra "stuff" (memcache key prep, memcache key queries and possibly other high-latency network requests) the Django view needs to do takes up roughly 3x the time of the absolute minimal Django request-response rendering.
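
To put that "roughly 3x" into numbers, here's the back-of-the-envelope arithmetic based on the requests-per-second figures above:

# ~330 req/s for the full view vs ~1,305 req/s for the bare __lbheartbeat__ view
print(1000 / 1305)               # ~0.77 ms per request, minimal Django round trip
print(1000 / 330)                # ~3.0 ms per request, full download view
print(1000 / 330 - 1000 / 1305)  # ~2.3 ms of extra "stuff", i.e. roughly 3x the minimal time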

Also, if I use the same technique to bombard a single URL, but one that actually exercises most of the code path yet definitely doesn't require any slow ElastiCache writes or boto3 S3 reads, I get about 800 requests per second:

$ hey -n 10000 -c 10 https://symbols.stage.mozaws.net/advapi32.pdb/5EFB9BF42CC64024AB64802E467394642/advapi32.sy
Summary:
  Total:    12.4160 secs
  Slowest:  0.0651 secs
  Fastest:  0.0024 secs
  Average:  0.0122 secs
  Requests/sec: 805.4150
  Total data:   300000 bytes
  Size/request: 30 bytes
...

Lesson learned

Max CPU Used
It's a recurring reminder that performance is almost all about latency. If it's not RAM or disk, it's networking. See the "Max CPU Used" graph, which shows that user, system and stolen CPU ("CPU spent waiting for the hypervisor to service another virtual CPU") never sum to more than 50%.

A neat trick to zip a git repo with a version number

September 1, 2017
4 comments Linux, Web development

I have this WebExtension addon. It's not very important. Just a web extension that does some hacks to GitHub pages when I open them in Firefox. The web extension is a folder with a manifest.json, icons/icon-48.png, tricks.js, README.md etc. To upload it to addons.mozilla.org I first have to turn the whole thing into a .zip file.

So I discovered a neat way to make that zip file. It looks like this:

#!/bin/bash

DESTINATION=build-`cat manifest.json | jq -r .version`.zip
git archive --format=zip master > $DESTINATION

echo "Created..."
ls -lh $DESTINATION

You run it and it creates a build-1.0.zip file containing all the files that are checked into the git repo. So it discards my local "junk" such as backup files or other things that are mentioned in .gitignore (and .git/info/exclude).

I bet someone's going to laugh and say "Duhh! Of course!" but I didn't know you could do that so easily. Hopefully posting this will help someone trying to do something similar.

Note: this depends on jq, which is an amazing little program.

Ultrafast loading of CSS

September 1, 2017
3 comments Web development, JavaScript

tl;dr; The ideal web performance, with regards to CSS, is to inline the minimal CSS and lazy load the rest after load.

Two key things to understand/appreciate:

  1. The fastest performing web page is one that isn't blocked on rendering.

  2. You use some CSS framework kitchen sink because you're not a CSS guru.

How to deal with this?

Things like HTTP2 and CDNs and preload are nice because they make the network lookup for your main.88c468ef.css file as fast as possible. But what's even faster is to include the CSS in the HTML that the server responds with in the first place. Why? Because when the browser downloads your HTML (e.g. GET /) and parses the document, it sees that <link rel="stylesheet" href="/main.88c468ef.css"> and decides not to render any DOM to screen until that CSS file has been downloaded and parsed. It does this because it doesn't want to paint the DOM (as it would look without CSS) and then repaint the DOM again, this time with the CSS rules applied.

Point number 2 basically boils down to the likely fact that your app depends on somecssframework.min.css like Bootstrap, Bulma or Foundation. They're large blobs of CSS for styling all sorts of types of HTML (e.g. cards, tables, navbar menus etc.). These CSS frameworks are super useful because they make your app look pretty. But they're usually big. Really big.

Popular CSS frameworks:

Framework            Size   Gzipped
bootstrap.min.css    122K   18K
foundation.min.css   115K   16K
semantic.min.css     553K   93K
bulma.min.css        141K   18K

Actually the size difference isn't hugely important. What's important is that it's yet another thing that needs to be downloaded before the page can start to render. If the URL is in the user's cache, great. Even better, if it's cached by a service worker. However if you care about loading performance (judging by the fact that you're still reading), you know that a large majority of your visitors only come to your site sometimes (according to Google Analytics, 92.7% of my visitors are "new visitors"). Perhaps from a Google search. Or perhaps they visit sometimes but rarely enough that by the time they return their browser cache will have "moved on" and reset (to save disk space) what was previously cached.

CSS is a render blocking resource

With and without render blocking CSS
See Ilya Grigorik's primer on Render Blocking CSS.

It's also easy to demonstrate. Check out this Webpagetest Visual Comparison that compares two pages that are both styled with bootstrap.min.css except one of them uses a piece of JavaScript at the bottom of the page that enables the stylesheet after the page has loaded.

So if it's blocking, what to do about it? Well, make it non-blocking. But how?

Solution 1

The simplest solution is to move any <link rel="stylesheet" href="bootstrap.min.css"> out of the <head> and put it just before the </body> tag. Here's an example of that.

It's valid HTML5 and seems to work just fine in Safari on iOS. The only problem is that pesky "Flash of Unstyled Content" (aka. "FOUC") effect, where the user is very briefly presented with the page without any styling, and then the whole page re-renders once the stylesheets have loaded. Chrome and iOS Safari actually still block the rendering, so it doesn't behave like JavaScript, where putting it late in the DOM defers execution. In other words, not really a good solution at all.

You can see in this Webpagetest that the "Start render" happens after the .css files have been loaded and parsed.

Solution 2

With JavaScript you can put in code that's definitely going to be executed after the rendering starts and, also, after the first rendering is finished (i.e. "DOM Content Loaded").

This technique is best done with loadCSS, which works really well if you tune it. In particular, the rel="preload" feature is getting more and more established. It used to only work in Chrome and Opera but will soon work in Firefox and iOS Safari too. Note that loadCSS contains a polyfill for rel="preload".

The basic idea is that you load a piece of JavaScript late which, as soon as it can, puts the <link rel="stylesheet" href="bootstrap.min.css"> into the DOM. You still have the Flash of Unstyled Content effect to confront, and that's annoying.

Here's an example implementation. It uses the scripts and techniques laid out by filamentgroup's loadCSS.

It works, and the rel="preload" is a bonus for Chrome and Opera users because once the JavaScript "kicks in" the network loading is quite possibly already done. As seen in this Webpagetest using Chrome, the .css files start downloading before the lazyloadcss.js file has even started downloading.

It's not as hot in Firefox because the downloading of the .css files is delayed until after lazyloadcss.js has loaded and executed.

Solution 3

Just inline all the CSS. Instead of <link rel="stylesheet" href="bootstrap.min.css"> you just make it inline. Like:


<style type="text/css">
/*!
 * Bootstrap v4.0.0-beta (https://getbootstrap.com)
 * Copyright 2011-2017 The Bootstrap Authors
 * Copyright 2011-2017 Twitter, Inc.
 * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
 */@media print{*,::after,::before{text-shadow:none!important;box-shadow:none!important}......
</style>

All 123KB of it. Why not?! It has to be downloaded sooner or later anyway, might as well nip it in the bud straight away. The Flash of Unstyled Content problem goes away. So does the problem of having to load JavaScript tricks to make the CSS loading non-blocking.

The obvious and immediate caveat is that now the whole HTML document is huge! In this example page the whole HTML document is 127KB (20KB gzipped), whereas the regular one is 4.1KB (1.4KB gzipped). And if your visitors, if you're so lucky, click on any other internal link, that's another 127KB that has to be downloaded again.

The biggest caveat is that downloading a large HTML document is bad because no other resources (images, for example) can be downloaded in parallel whilst the browser is working on rendering the page with what it has downloaded so far. If you compare this Webpagetest with the regular traditional one, you can see that it takes almost 354ms to download the HTML with all CSS inlined, compared to 262ms when the CSS was linked. That's roughly 100ms wasted during which the browser could have started downloading other resources, like images.

Solution 4

Solution 3 was kinda good because it avoided the Flash of Unstyled Content and it avoided all extra resource loading. However, we can do better.

Instead of inlining all the CSS, how about we extract exactly the CSS we need from bootstrap.min.css and inline just that? Then, after the page has loaded, we download the rest of bootstrap.min.css so that all the other selectors and stuff it needs are ready for when the page changes and morphs, driven by interactive JavaScript, which can and will happen after the initial load.

But how do you know exactly which CSS you need for that initial load? Really, you don't. You have two options:

  1. Manually inspect what DOM elements you have in your initial HTML and slowly pluck the matching rules out of the Bootstrap CSS file.

  2. Automate the inspection of what DOM elements you have in your initial HTML.

Before we dig deeper into how to automate the inspection, let's look at what it'd look like: This page and when Webpagetested. What's cool here is that the DOM is ready in 265ms (it was 262ms when there was no linked CSS).

Notice that there's no Flash of Unstyled Content. No external dependencies. It's basically an inline <style> block with exactly the selectors that are needed and nothing more. The HTML is larger, at 13KB (3.3KB gzipped), but remember it was 4.1KB when we started and the solution where we inlined everything was 127KB.

The immediate problem with this is that we're missing some nice CSS for things that haven't been needed yet. For example, there might be some JavaScript that changes the DOM based on something the user does with the page, like clicking on something that adds more elements to the DOM. Or, equally likely, after the DOM has loaded, an XHR query is made to download some data and display it in a way that needs CSS selectors that weren't included in the minimal set.

By the way, this very blog post builds on this solution. If you're on your desktop browser you can view source and see that there are only inline style blocks.
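
As a rough illustration of option 2 from earlier, a first pass at "automating the inspection" could be as simple as collecting which tag names, classes and ids the initial HTML actually uses, and then filtering the framework's selectors against that. A minimal Python sketch using BeautifulSoup (the real, proper solution is the minimalcss project mentioned in the update below):

# Rough sketch of 'automate the inspection': collect the tag names, class
# names and ids that the initial HTML actually uses. Filtering the framework's
# CSS selectors against these sets is the (harder) next step, left out here.
from bs4 import BeautifulSoup


def used_selectors(html):
    soup = BeautifulSoup(html, 'html.parser')
    tags, classes, ids = set(), set(), set()
    for element in soup.find_all(True):
        tags.add(element.name)
        classes.update(element.get('class', []))
        if element.get('id'):
            ids.add(element['id'])
    return tags, classes, ids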

Solution 5

This builds on Solution 4. The HTML contains the minimal CSS needed for that first render and as soon as possible we additionally download the whole bootstrap.min.css so that it's available if/when the DOM mutates and needs the full CSS not in the minimal CSS.

Basically, let's take Solution 2 (JavaScript lazy loads in the CSS) + Solution 4 (the minimal CSS inlined). Here is one such solution.

And there we have it! The ideal solution. The only thing remaining is to verify that it actually makes a difference.

The Webpagetest Final Showdown

We have 5 solutions. Each one different from the next. Let's compare them against each other.

Here it is in its full glory

Visual comparison on WebPagetest.org
(image if you can't open the Webpagetest page right now)

What we notice:

  1. The regular do-nothing solution is 50% slower than the best solution: 3.2 seconds versus 2.2 seconds.
  2. Putting the <link rel="stylesheet" ...> tags at the bottom of the document doesn't work in Chrome and doesn't do anything good.
  3. Lazy loading the CSS with JavaScript (with no initial CSS) displays content very early but the repaint means it takes unnecessarily longer to load the whole thing.
  4. The ideal solution (Solution 5) loads as fast, visually, as Solution 4 but has the advantage that all CSS is there, eventually.
  5. Inlining all CSS (Solution 3) is only 23% slower than the ideal solution (Solution 5). But it's much easier to implement, so seriously consider it if your tooling is limited.

Conclusion

One humbling thing to notice is that the difference isn't actually that huge. In this particular example we managed to go from 3.2 seconds to 2.2 seconds (on a 3G connection). The example playground used in this experiment is very far from a real site. Most likely, a real site is a lot more complex and full of many more potential bottlenecks that slow things down. For example, instead of obsessing over the CSS payload, perhaps you can make a bigger impact by simply dropping some excessive JavaScript plugins that aren't necessarily needed. Or you can focus on your 2.5MB total of big images.

However, a key ingredient of web performance is to use the loading time in the best possible way. If you get the CSS out of the render-blocking path, your users' browsers can spend more time, sooner, on other resources such as images and XHR.

UPDATE March 2018

A lot of this work of figuring out the minimal CSS from a DOM has now been put into a rapidly maturing and well-tested Node.js project called minimalcss.

Fastest way to match a filename's extension in Python

August 31, 2017
4 comments Python

tl;dr; By a slim margin, the fastest way to check whether a filename matches one of a list of extensions is filename.endswith(extensions) (with the extensions as a tuple)

This turned out to be premature optimization. The context is that I want to check if a filename matches one of the file extensions in a list of 6.

The list being ['.sym', '.dl_', '.ex_', '.pd_', '.dbg.gz', '.tar.bz2']. Meaning, it should return True for foo.sym or foo.dbg.gz. But it should return False for bar.exe or bar.gz.

I put together a little benchmark, ran it a bunch of times and looked at the results. Here are the functions I wrote:


import re

extensions = ['.sym', '.dl_', '.ex_', '.pd_', '.dbg.gz', '.tar.bz2']
extensions_tuple = tuple(extensions)


def f1(filename):
    for each in extensions:
        if filename.endswith(each):
            return True
    return False


def f2(filename):
    return filename.endswith(extensions_tuple)


regex = re.compile(r'({})$'.format(
    '|'.join(re.escape(x) for x in extensions)
))


def f3(filename):
    return bool(regex.findall(filename))


def f4(filename):
    return bool(regex.search(filename))

The results are boring. But I guess that's a result too:

FUNCTION             MEDIAN               MEAN
f1 9543 times        0.0110ms             0.0116ms
f2 9523 times        0.0031ms             0.0034ms
f3 9560 times        0.0041ms             0.0045ms
f4 9509 times        0.0041ms             0.0043ms

For a list of ~40,000 realistic filenames (with the result being True 75% of the time), I ran each function 10 times. So, on average, it took 0.0116ms to run f1 10 times here on my laptop with Python 3.6.

More premature optimization

Upon looking into the data and thinking about how this will be used: if I reorder the list of extensions so the most common one is first, the second most common second, etc., then the performance improves a bit for f1 but slows down slightly for f3 and f4.
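
Purely hypothetically, that reordering could be derived from a sample of real filenames, something like:

# Hypothetical helper: order the extensions by how often they actually match
# in a sample of realistic filenames, most common first.
from collections import Counter


def reorder_extensions(extensions, sample_filenames):
    counts = Counter()
    for filename in sample_filenames:
        for ext in extensions:
            if filename.endswith(ext):
                counts[ext] += 1
                break
    return sorted(extensions, key=lambda ext: counts[ext], reverse=True)


extensions = ['.sym', '.dl_', '.ex_', '.pd_', '.dbg.gz', '.tar.bz2']
print(reorder_extensions(extensions, ['foo.sym', 'bar.sym', 'baz.dbg.gz']))
# ['.sym', '.dbg.gz', '.dl_', '.ex_', '.pd_', '.tar.bz2']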

Conclusion

That .endswith(some_tuple) is neat and it's hair-splittingly faster. But really, this turned out to not make a huge difference in the grand scheme of things. On average it takes less than 0.001ms to do one filename match.

React lifecycle hooks must-have

August 13, 2017
1 comment Web development, JavaScript, React

I don't know who made this flowchart originally, but whoever you are: Thank you!

At this point in my React learning I think I've memorized much of this, but it's taken me a lot of time and repeatedly having to dig up the documentation. (Also, not to mention the number of times I've typo'ed componentWillReciveProps and componentWillRecevieProps etc.)

Remember this: you don't need to know all of these by heart to be good at React. In fact, there are several of these that I almost never use.

React lifecycle hooks flowchart

UPDATE

The above link is dead. Use this blog post instead.

UPDATE April 2018

Here's an even better one from @dan_abramov:

React life-cycle hooks

Fastest *local* cache backend possible for Django

August 4, 2017
11 comments Python, Web development, Django

I did another couple of benchmarks of different cache backends in Django. This is an extension/update on Fastest cache backend possible for Django published a couple of months ago. This benchmarking isn't as elaborate as the last one. Fewer tests and fewer variables.

I have another app where I use a lot of caching. This web application will run its cache server on the same virtual machine. So no separation of cache server and web head(s). Just one Django server talking to localhost:11211 (memcached's default port) and localhost:6379 (Redis's default port).

Also in this benchmark, the keys were slightly smaller. To simulate my application's "realistic needs" I made the benchmark land on roughly 80% cache hits and 20% cache misses. The cache keys were 1 to 3 characters long and the cache values were lists of strings, always 30 items long (e.g. len(['abc', 'def', 'cba', ... , 'cab']) == 30).

Also, in this benchmark I was too lazy to test all different parsers, serializers and compressors that django-redis supports. I only test python-memcached==1.58 versus django-redis==4.8.0 versus django-redis==4.8.0 && msgpack-python==0.4.8.
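
For reference, the three configurations are roughly the following CACHES settings (the exact OPTIONS are my guess at a sane setup, not copied verbatim from the benchmark):

# memcache
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

# redis
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/0',
    }
}

# redis_msgpack
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/0',
        'OPTIONS': {
            'SERIALIZER': 'django_redis.serializers.msgpack.MSGPackSerializer',
        },
    }
}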

The results are quite "boring". There's basically not enough difference to matter.

Config          Average   Median   Compared to fastest
memcache        4.51s     3.90s    100%
redis           5.41s     4.61s    84.7%
redis_msgpack   5.16s     4.40s    88.8%

UPDATE

As Hal pointed out in the comments, when you know the web server and the memcached server are on the same computer you should use UNIX sockets. They're "obviously" faster since they avoid the TCP overhead, at the cost of not working over a network.

Because running memcached on a socket on OSX is a hassle I only have one benchmark. Note! This basically compares good old django.core.cache.backends.memcached.MemcachedCache with two different locations.

Config                     Average   Median   Compared to fastest
127.0.0.1:11211            3.33s     3.34s    81.3%
unix:/tmp/memcached.sock   2.66s     2.71s    100%
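
The only thing that differs between those two rows is the LOCATION string (plus starting memcached with something like memcached -s /tmp/memcached.sock). A minimal sketch:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # 'LOCATION': '127.0.0.1:11211',           # over TCP
        'LOCATION': 'unix:/tmp/memcached.sock',     # over a UNIX socket
    }
}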

But there's more! Another option is to use pylibmc, which is a Python client written in C. By the way, the Python I use for these microbenchmarks is Python 3.5.

Unfortunately I'm too lazy/too busy to do a matrix comparison of pylibmc on TCP versus UNIX socket. Here are the comparison results of using python-memcached versus pylibmc:

Client             Average   Median   Compared to fastest
python-memcached   3.52s     3.52s    62.9%
pylibmc            2.31s     2.22s    100%
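
Switching to pylibmc is mostly a matter of changing the backend, since Django ships with a pylibmc-based cache backend. Roughly:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': '127.0.0.1:11211',
    }
}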

UPDATE 2

Seems it's just my luck that someone else has already done the matrix comparison of python-memcached vs pylibmc on TCP vs UNIX socket:

https://plot.ly/~jensens/36.embed

Find static files defined in django-pipeline but not found

July 25, 2017
0 comments Python, Django

If you're reading this you're probably familiar with how, in django-pipeline, you define bundles of static files to be combined and served. If you're not familiar with django-pipeline it's unlikely this'll be of much help.

The Challenge (aka. the pitfall)

So you specify bundles in your settings.py with something like this:


PIPELINE = {
    'STYLESHEETS': {
        'colors': {
            'source_filenames': (
              'css/core.css',
              'css/colors/*.css',
              'css/layers.css'
            ),
            'output_filename': 'css/colors.css',
            'extra_context': {
                'media': 'screen,projection',
            },
        },
    },
    'JAVASCRIPT': {
        'stats': {
            'source_filenames': (
              'js/jquery.js',
              'js/d3.js',
              'js/collections/*.js',
              'js/aplication.js',
            ),
            'output_filename': 'js/stats.js',
        }
    }
}

You do a bit more configuration and now, when you run ./manage.py collectstatic --noinput, Django and django-pipeline will gather up all static files from all installed Django apps, then post-process them, doing things like concatenating them into one file, minification etc.

The problem is, if you look at the example snippet above, there's a typo. Instead of js/application.js it's accidentally js/aplication.js. Oh noes!!

What's sad is that nobody will notice (running ./manage.py collectstatic will exit with a 0). At least not unless you do some careful manual reviewing. Perhaps you will notice later, when you've pushed the site to prod, that the output file js/stats.js actually doesn't contain the code from js/application.js.

Or, you can automate it!

A Solution (aka. the hack)

I started this work this morning because the error actually happened to us. Thankfully not in production but our staging server produced a rendered HTML page with <link href="/static/css/report.min.cd784b4a5e2d.css" rel="stylesheet" type="text/css" /> which was an actual file but it was 0 bytes.

It wasn't that hard to figure out what the problem was because of the context of recent changes but it would have been nice to catch this during continuous integration.

So what we did was add an extra class to settings.STATICFILES_FINDERS called myproject.base.finders.LeftoverPipelineFinder. So now it looks like this:


# in settings.py

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'pipeline.finders.PipelineFinder',
    'myproject.finders.LeftoverPipelineFinder',  # the new hotness!
)

And here's the class implementation:


from pipeline.finders import PipelineFinder

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured


class LeftoverPipelineFinder(PipelineFinder):
    """This finder is expected to come AFTER 
    django.contrib.staticfiles.finders.FileSystemFinder and 
    django.contrib.staticfiles.finders.AppDirectoriesFinder in 
    settings.STATICFILES_FINDERS.
    If a path is looked for here it means Django is trying to find a file
    that none of the regular staticfiles finders could find.
    """
    def find(self, path, all=False):
        # Before we raise an error, try to find out where,
        # in the bundles, this was defined. This will make it easier to correct
        # the mistake.
        for config_name in 'STYLESHEETS', 'JAVASCRIPT':
            config = settings.PIPELINE[config_name]
            for key in config:
                if path in config[key]['source_filenames']:
                    raise ImproperlyConfigured(
                        'Static file {!r} can not be found anywhere. Defined in '
                        "PIPELINE[{!r}][{!r}]['source_filenames']".format(
                            path,
                            config_name,
                            key,
                        )
                    )
        # If the file can't be found AND it's not in bundles, there's
        # got to be something else really wrong.
        raise NotImplementedError(path)

Now, if you have a typo or something in your bundles, you'll get a nice error about it as soon as you try to run collectstatic. For example:

▶ ./manage.py collectstatic --noinput
Post-processed 'css/search.min.css' as 'css/search.min.css'
Post-processed 'css/base.min.css' as 'css/base.min.css'
Post-processed 'css/base-dynamic.min.css' as 'css/base-dynamic.min.css'
Post-processed 'js/google-analytics.min.js' as 'js/google-analytics.min.js'
Traceback (most recent call last):
...
django.core.exceptions.ImproperlyConfigured: Static file 'js/aplication.js' can not be found anywhere. Defined in PIPELINE['JAVASCRIPT']['stats']['source_filenames']

Final Thoughts

This was a morning hack. I'm still not entirely sure if this is the best approach, but there was no better one and the result is pretty good.

We run ./manage.py collectstatic --noinput in our continuous integration just before it runs ./manage.py test. So if you make a Pull Request that has a typo in bundles.py it will get caught.

Unfortunately, it won't find missing files if you use foo*.js or something like that. django-pipeline uses glob.glob to convert expressions like that into a list of actual files, and that depends on the filesystem, and all of that happens before the django.contrib.staticfiles.finders.find function is called.
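
One idea (untested beyond a quick sketch) would be a separate check that lists all known static files via the staticfiles finders and warns about glob patterns in PIPELINE that match nothing:

import fnmatch

from django.conf import settings
from django.contrib.staticfiles.finders import get_finders


def check_pipeline_globs():
    # Collect every static file path the regular finders know about.
    all_files = set()
    for finder in get_finders():
        for path, _storage in finder.list(ignore_patterns=[]):
            all_files.add(path)
    # Warn about any glob pattern that doesn't match a single known file.
    for config_name in 'STYLESHEETS', 'JAVASCRIPT':
        for key, bundle in settings.PIPELINE[config_name].items():
            for pattern in bundle['source_filenames']:
                if '*' in pattern and not fnmatch.filter(all_files, pattern):
                    print('Warning: {!r} in PIPELINE[{!r}][{!r}] matches no files'.format(
                        pattern, config_name, key
                    ))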

If you have any better suggestions to solve this, please let me know.

Why I'm ditching AdBuff on Songsear.ch

July 20, 2017
0 comments Web development

I'm a performance nerd and if something isn't as fast as it can be it hurts my soul.
I have this side project called SongSear.ch. It's a lyrics search engine with over 2 million songs.

To try to make a buck to pay for the hosting costs, I put in ads. The only ad network I could find that does NOT use document.write is AdBuff. But their technical implementation, pardon my French, sucks! It's redirects upon redirects in an iframe over HTTP. Granted, it loads async, but it still drags down performance for people on low-end devices and mobile networks.

Today I decided to take the AdBuff ads off. Let's see if it made a difference performance-wise:

Before

Load waterfall WITH ads

After

Load waterfall WITHOUT ads

Basically, 2.7 seconds (on LTE) instead of 14.3 seconds. And 211KB of data instead of 1MB.

So how much money do I stand to lose by ditching these ads? Well, I've earned a grand total of $10.82 for 1,214,072 impressions. That's what I spend on hosting this project every 6 days. Clearly this isn't working out.

UPDATE

Adversal.com emailed me that they rolled out some new features, including "Blazing fast ad code". Sure, let's try it then.
Elements of it might be fast, but nothing is fast when it needs ~80 extra requests.

Adversal downloads a LOT of stuff

Best YouTube Chefs of 2017

July 14, 2017
5 comments Misc. links

tl;dr Highly subjective rundown of YouTube channels cooking inspiring stuff. According to me.

I am, by no means, an expert at cooking or media. I'm a web developer. But I do watch lots of cooking videos on YouTube. So it makes me an expert based on what I see. ...subjectively.

This list is based on the gut feeling of imagining opening YouTube (usually on my Roku) and noticing that all the chefs I follow have released a new video. The order in which I would watch them is the order in which I like their material.

My General "Criteria"

This is what I base "good" on:

  1. Fun and enjoyable to watch
  2. Accessible to cook (doesn't have to be simple but has to be something I can see myself cooking and eating)
  3. Ability to actually do ...later (ideally a link in the video description to the transcript plus ingredient list)
  4. Mouthwatering to look at and imagine to smell
  5. If I make this, will my family love me more?

Number 1: J. Kenji López-Alt and SeriousEats

J. Kenji López-Alt

Before I start my praise; a piece of constructive criticism: It's totally confusing what's Kenji's personal stuff and what's SeriousEats. I still haven't grokked the interconnection but usually it doesn't matter because I mentally lump them together as one.

My wife calls him "my boyfriend" and I don't deny that I adore him. He's everything I want to be as a chef; down-to-earth, professional, varied, scientific, and loves the simple recipes just as much as any busy parent with a family to feed.

The videos are short and focussed, and the on-screen extra facts and figures make it easier to follow along. He also does short little videos about techniques, such as how to slice onions.

He's also the author of the book The Food Lab: Better Home Cooking Through Science which I sometimes read before going to bed. There's comfort in knowing that he's done empirical research, like a chemist, and not just basing his statements on years of experience.

Number 2: Food Wishes aka. Chef John

Food Wishes

Just like Kenji, Chef John is an American chef who records relatively simple recipes but spruces them up to be quite fancy and impressive. His menu is varied and feels like a mix of American and French cuisine in an accessible format. All his videos (I think) are self-recorded and you never see more than his hands. They're one video per dish and stick to the point, which makes them easy to follow along with. Also, he talks over the video with little tips like "If you don't have self-raising flour at home you can just use ...jada jada".

All of his recipes are mirrored on his blog which I often have opened on my iPhone right on the kitchen counter, trying to replicate what I've watched.

An extra bonus is that he's got a quirky style and sense of humor that my wife also likes so she usually joins in watching his videos.

Number 3: America's Test Kitchen

America's Test Kitchen

I have to admit that I still haven't really understood what this corporation is. There's Cook's Country, which spans "America's Test Kitchen", "Cook's Illustrated" and "cook's science". I used to watch their show on PBS, led by Christopher Kimball, and although it was good, it was a bit too time-consuming for the impatient me.

Either way, the main America's Test Kitchen YouTube channel is quite focussed on kitchen equipment and cooking techniques. They're wonderful. So many times I've concluded that my (somekitchengadget) isn't great, then I check their videos and then I go straight to Amazon and buy it. Last item was the di Oro Living - Large Silicone Spatula.

Their kitchen equipment testing is rigorous. When they talk about all the hardcore testing they've done to pots, pans, cutting boards, knives, bowls etc. you get that comforting sense that if you do take their advice and buy the thing, it's really going to be the best there is. It's the same with some of their more "dry" videos about empirical testing of cooking techniques. For example, "The Secrets of Cooking Rice". You watch it once, try to remember what you learned, and then you can walk around knowing this is the way to do something. QED.

Number 4: Gordon Ramsay

Gordon Ramsay

Beyond swear words and putdowns, if there's one thing you can learn from Gordon it's his passion for food. Like a naturally genius salesman he can think up the most wonderfully positive words for the ingredients, the smells, the tastes and the food's aesthetics. The passion is contagious!

I never actually watched any of the shows that got him famous. It's "too TV" for my taste. Too much aggression and better-than-you. But beyond that, his videos are impressive. It's also nice in that he makes you think you can cook all that fancy stuff from fancy restaurants you can't afford to go to.

Some of the things he cooks might be a bit more advanced for my skillset (and time!) but there are lots of nuggets of "panache" that you can tuck away in the back of your brain.

By the way Gordon, I have not signed up for your online cooking courses because, basically, this blog post. There's just too much good content from other chefs' channels. Sorry.

Number 5: Jamie Oliver's Food Tube

Jamie Oliver

I've always been a big Jamie Oliver fan. His Jamie's Food Revolution (called "Ministry of Food" in the UK) is probably the book I've used the most in the last couple of years. He's got that impressive and contagious passion for food and ingredients. He's funny and accessible, and his cooking is quite advanced and grand. Some recipes have that extra bit of extravagance that a home cook sometimes needs to spruce things up.

Food Tube is his YouTube channel and it contains lots of videos by other chefs. They almost all have one thing in common; they're as nuts as Jamie. I get the feeling that Jamie recruits people into his channel on "Can you be more hyper than me?".

In all fairness some videos are almost too hyper. You get a bit flustered by the high pace and flamboyance and you struggle to take notes. But when it gets a bit too intense I just watch it with the practical part of my brain switched off, just to get inspired.

Number 6: The Dumpling Sisters

Dumpling Sisters

I don't really get whether they are part of Jamie Oliver's Food Tube or not. Some videos have the name in them, some don't. Doesn't matter much. These ladies are wonderful. It's probably one of the less accessible channels for me because most of their cooking is Asian (Chinese style) and although it's mouthwatering and something I could easily devour with a pair of chopsticks, many things become hard to cook due to the ingredients. I live in a suburban community in South Carolina and we don't have one of those good Asian supermarkets around here. I guess I could order some of the things online but it's not so easy.

Having said that, their "Perfect Special Fried Rice" is a favorite of mine. I've probably cooked it, in variations, almost 10 times now. Always a hit!

Even though I, admittedly, am failing at my Chinese cooking, I do like this channel to keep me inspired and dreaming about going back to China to learn more about its vast cuisine. Gotta have something that reminds me of all the delicious, weird and wonderful things I've had there.

Number 7: Cupcake Jemma

Cupcake Jemma

I'll be honest; I've never attempted any of the stuff she bakes. First of all, I almost never bake, and if I do it's either Swedish-style cakes or loaves of bread. But there's just something about Jemma. She's so charming and "cute". I know cupcakes are serious business to a lot of people but I actually don't really like cupcakes. However, watching her videos does make you think "some day, I'll attempt that".

If it's one thing I've learned from watching her videos, it's that a lot of baking requires vanilla extract. She uses almost as much vanilla extract as Jamie Oliver uses olive oil.

Also, as mentioned before, someone having passion for what they're cooking is inspiring and contagious. Her videos are so cheerful and colorful it's almost like a free injection of happiness.

Number 8: Kitchen Conundrums with Thomas Joseph

Thomas Joseph

This isn't a channel. The link above is a playlist. I think his videos are part of Everyday Food which is something I haven't explored yet.

His videos are nice because he breaks things down into small, easy-to-understand parts. He's also good at showing how to do certain actions. Like, which pot to use when, how to pour it, etc. It's quite advanced stuff (but not all of it!), but I did manage to make French Macaroons from his video. They didn't have that perfect shape you get in a nice bakery but they actually turned out pretty well.

My only criticism is that his kitchen is too neat. It makes mine look like a battlefield in comparison.

Rounding Up

Come to think of it, these are the only YouTube cooking channels I follow. Occasionally YouTube recommends other awesome stuff (based on its recommendation engine) that I watch but don't necessarily subscribe to. The ordering above is kinda silly (apart from Kenji being my number 1 at the moment). They're all good!

I'm not a great chef. I don't have that "magic" of being able to smell my way into which seasonings to add mid-way. Nor do I immediately just know what to do with all sorts of veggies I see in the supermarket. But I try. I have more cookbooks now than I have years to live and, clearly, I watch a lot of inspiring cooking videos. Very little of what I learn seems to stick in my memory, but YouTube's search engine and my watch history has proven to be my way of memorizing recipes and techniques.

Please please please, share your thoughts and tips about other channels I should check out. Not that these channels aren't keeping me busy but I'm always curious to add more.

How to do performance micro benchmarks in Python

June 24, 2017
7 comments Python

Suppose that you have a function and you wonder, "Can I make this faster?" Well, you might already have thought that and you might already have a theory. Or two. Or three. Your theory might be sound and likely to be right, but before you go anywhere with it you need to benchmark it first. Here are some tips and scaffolding for doing Python function benchmark comparisons.

Tenets

  1. Internally, Python will warm up and it's likely that your function depends on other things such as databases or IO. So it's important that you don't test function1 first and then function2 immediately after because function2 might benefit from a warm up painfully paid for by function1. So mix up the order of them or cycle through them enough that they all pay for or gain from warm ups.

  2. Look at the median first. The mean (aka. average) is often tainted by spikes, and these slow-down spikes can be caused by your local Spotify client deciding to reindex itself or some such. Sometimes those spikes matter. For example, garbage collection is inevitable and will have an effect that matters.

  3. Run your functions many times. So many times that the whole benchmark takes a while. Like tens of seconds or more. Also, if you run it significantly long it's likely that all candidates get punished by the same environmental effects such as garbage collection or the CPU being reassigned to something else intensive on your computer.

  4. Try to take your benchmark into different, and possibly more realistic environments. For example, don't rely on reading a file like /Users/peterbe/only/on/my/macbook when, likely, the end destination for your code is an Ubuntu server in AWS. Write your code so that it's easy to copy and paste around, like into a vi/jed editor in an ssh session somewhere.

  5. Sanity check each function before benchmarking them. No need for pytest or anything fancy but just make sure that you test them in some basic way. But the assertion testing is likely to add to the total execution time so don't do it when running the functions.

  6. Avoid "prints" inside the time measured code. A print() is I/O and an "external resource" that can become very unfair to compare CPU bound performance.

  7. Don't fear testing many different functions. If you have multiple ideas of doing a function differently, it's cheap to pile them on. But be careful how you "report" because if there are many different ways of doing something you might accidentally compare different fruit without noticing.

  8. Make sure your functions take at least one parameter. I'm no Python core developer or C hacker but I know there are "murks" within the compiler and interpreter that might do what a regular memoizer would do. Also, the performance difference can be reversed on tiny inputs compared to really large ones.

  9. Be humble with the fact that 0.01 milliseconds difference when doing 10,000 iterations is probably not worth writing a more complex and harder-to-debug function.

The Boilerplate

Let's demonstrate with an example:


# The functions to compare
import math


def f1(degrees):
    return math.cos(degrees)


def f2(degrees):
    e = 2.718281828459045
    return (
        (e**(degrees * 1j) + e**-(degrees * 1j)) / 2
    ).real


# Assertions
assert f1(100) == f2(100) == 0.862318872287684
assert f1(1) == f2(1) == 0.5403023058681398


# Reporting
import time
import random
import statistics

functions = f1, f2
times = {f.__name__: [] for f in functions}

for i in range(100000):  # adjust accordingly so whole thing takes a few sec
    func = random.choice(functions)
    t0 = time.time()
    func(i)
    t1 = time.time()
    times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Let's break that down a bit.

  • The first area (# The functions to compare) is all up to you. This silly example tries to peg Python's builtin math.cos against your own arithmetic expression.

  • The second area (# Assertions) is where you do some basic sanity checks/tests. This comes in handy for making sure the functions still do the same thing as you keep modifying them more and more to try to squeeze out some extra juice.

  • The last area (# Reporting) is the boilerplate-y area. You obviously have to change the line functions = f1, f2 to include all the named functions you have in the first area. And the number of iterations totally depends on how long the functions take to run. Here it's 100,000 times, which is kinda ridiculous, but I just needed a dead simple function to demonstrate.

  • Note that each measurement is in milliseconds.

You run that and get something like this:

FUNCTION: f1 Used 49990 times
    MEDIAN 0.0
    MEAN   0.00045161219591330375
    STDEV  0.0011268475946446341
FUNCTION: f2 Used 50010 times
    MEDIAN 0.00095367431640625
    MEAN   0.0009188626294516487
    STDEV  0.000642871632138125

More Examples

The example above already broke one of the tenets in that these functions were simply too fast. Doing rather basic mathematics is just too fast to compare with such a trivial benchmark. Here are some other examples:

Remove duplicates from list without losing order


# The functions to compare


def f1(seq):
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked


def f2(seq):
    checked = []
    seen = set()
    for e in seq:
        if e not in seen:
            checked.append(e)
            seen.add(e)
    return checked


def f3(seq):
    checked = []
    [checked.append(i) for i in seq if not checked.count(i)]
    return checked


def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]


def f5(seq):
    def generator():
        seen = set()
        for x in seq:
            if x not in seen:
                seen.add(x)
                yield x

    return list(generator())


# Assertion
import random

def _random_seq(length):
    seq = []
    for _ in range(length):
        seq.append(random.choice(
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        ))
    return seq


L = list('abca')
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L) == list('abc')
L = _random_seq(10)
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L)

# Reporting
import time
import statistics

functions = f1, f2, f3, f4, f5
times = {f.__name__: [] for f in functions}

for i in range(3000):
    seq = _random_seq(i)
    for _ in range(len(functions)):
        func = random.choice(functions)
        t0 = time.time()
        func(seq)
        t1 = time.time()
        times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Results:

FUNCTION: f1 Used 3029 times
    MEDIAN 0.6871223449707031
    MEAN   0.6917867380307822
    STDEV  0.42611748137761174
FUNCTION: f2 Used 2912 times
    MEDIAN 0.054955482482910156
    MEAN   0.05610262627130026
    STDEV  0.03000829926668248
FUNCTION: f3 Used 2985 times
    MEDIAN 1.4472007751464844
    MEAN   1.4371055654145566
    STDEV  0.888658217522005
FUNCTION: f4 Used 2965 times
    MEDIAN 0.051975250244140625
    MEAN   0.05343245816673035
    STDEV  0.02957275548477728
FUNCTION: f5 Used 3109 times
    MEDIAN 0.05507469177246094
    MEAN   0.05678296204202234
    STDEV  0.031521596461048934

Winner:


def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

Fastest way to count the number of lines in a file


# The functions to compare
import codecs
import subprocess


def f1(filename):
    count = 0
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        for line in f:
            count += 1
    return count


def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        return len(f.read().splitlines())


def f3(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])


# Assertion
filename = 'big.csv'
assert f1(filename) == f2(filename) == f3(filename) == 9999


# Reporting
import time
import statistics
import random

functions = f1, f2, f3
times = {f.__name__: [] for f in functions}

filenames = 'dummy.py', 'hacker_news_data.txt', 'yarn.lock', 'big.csv'
for _ in range(200):
    for fn in filenames:
        for func in functions:
            t0 = time.time()
            func(fn)
            t1 = time.time()
            times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Results:

FUNCTION: f1 Used 800 times
    MEDIAN 5.852460861206055
    MEAN   25.403797328472137
    STDEV  37.09347378640582
FUNCTION: f2 Used 800 times
    MEDIAN 0.45299530029296875
    MEAN   2.4077045917510986
    STDEV  3.717931526478758
FUNCTION: f3 Used 800 times
    MEDIAN 2.8804540634155273
    MEAN   3.4988239407539368
    STDEV  1.3336427480808102

Winner:


def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        return len(f.read().splitlines())

Conclusion

No conclusion really. Just wanted to point out that this is just a hint of a decent start when doing performance benchmarking of functions.

There is also the timeit built-in, which "provides a simple way to time small bits of Python code", but it has the disadvantage that your functions are not allowed to be as complex. Also, it's harder to generate multiple different fixtures to feed your functions without that fixture generation affecting the times.
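
For comparison, a minimal timeit version of the math.cos() example from earlier could look like this (assuming f1 and f2 are defined in the same module):

import timeit

setup = 'from __main__ import f1, f2'
print(timeit.timeit('f1(1.0)', setup=setup, number=100000))
print(timeit.timeit('f2(1.0)', setup=setup, number=100000))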

There are a lot of things this boilerplate could improve, such as sorting by winner, showing percentage comparisons against the fastest, ASCII graphs, memory allocation differences, etc. That's up to you.