Unzip benchmark on AWS EC2 c3.large vs c4.large

November 29, 2017
18 comments Python, Linux, Mozilla, Go

Datadog monitoring of time to dump and extract zip files in staging server
This web app I'm working on gets a blob of bytes from an HTTP POST. The blob is a zip file, anywhere from 100MB to 1,100MB in size. What my app currently does is take this byte buffer and use Python's built-in zipfile to extract all its content to a temporary directory. A second function then loops over the files within this extracted tree and processes each file in multiple threads with concurrent.futures.ThreadPoolExecutor. Here's the core function itself:


@metrics.timer_decorator('upload_dump_and_extract')
def dump_and_extract(root_dir, file_buffer):
    zf = zipfile.ZipFile(file_buffer)
    zf.extractall(root_dir)
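
That second function isn't shown in the post; as a rough sketch (the os.walk traversal and the process_file callable are my assumptions, not the app's actual code), it might look like this:


import concurrent.futures
import os


def process_extracted_tree(root_dir, process_file):
    # Collect every file path in the extracted tree
    paths = []
    for base, _dirs, files in os.walk(root_dir):
        for name in files:
            paths.append(os.path.join(base, name))

    # Process each path in a thread pool; .result() re-raises any exception
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(process_file, path) for path in paths]
        for future in concurrent.futures.as_completed(futures):
            future.result()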

So far so good.

Speed Speed Speed

I quickly noticed that this amounts to quite a lot of time spent doing the unzip and the writing to disk. What to do?

At first I thought I'd shell out to good old unzip. E.g. unzip -d /tmp/tempdirextract /tmp/input.zip but that has two flaws:

1) I'd first have to dump the blob of bytes to disk and take on the overhead of shelling out (i.e. Python subprocess)
2) It's actually not faster. I did some experimenting and got the same results as Alex Martelli in this Stack Overflow post

Compute EC2 instance types
What about disk speed? Yeah, this is likely to be a part of the total time. The servers that run the symbols.mozilla.org service run on AWS EC2 c4.large, which only has EBS (Elastic Block Store). However, AWS EC2 c3.large looks interesting since it uses SSD disks. That's probably a lot faster. Right?

Note! For context, the kind of .zip files I'm dealing with contain many small files and often 1-2 really large ones.

Benchmarking the EC2s

I created two EC2 nodes to experiment on: one c3.large and one c4.large. Both running Ubuntu 16.04.

Next, I have this little benchmarking script which loops over a directory full of .zip files between 200MB and 600MB in size. Roughly 10 of them. It then loads each one, one at a time, into memory and calls dump_and_extract.
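
The script itself isn't included here, but the core loop is roughly like this sketch, using the dump_and_extract from above (the timing and output formatting are my assumptions about what fastest-dumper.py does; the real script also takes a -t option for the target directory and counts files and directories created):


import io
import os
import tempfile
import time


def benchmark(zips_dir):
    # Loop over all the .zip files, load each into memory, extract, and time it
    for name in sorted(os.listdir(zips_dir)):
        if not name.endswith('.zip'):
            continue
        path = os.path.join(zips_dir, name)
        size = os.path.getsize(path)
        with open(path, 'rb') as f:
            file_buffer = io.BytesIO(f.read())
        with tempfile.TemporaryDirectory() as root_dir:
            t0 = time.time()
            dump_and_extract(root_dir, file_buffer)
            t1 = time.time()
        seconds = t1 - t0
        print('{:.1f}MB/s  {:.1f}MB  {:.3f}s'.format(
            size / seconds / 1024 / 1024,
            size / 1024 / 1024,
            seconds,
        ))

Let's run it on each EC2 instance: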

On c4.large

c4.large$ python3 fastest-dumper.py /tmp/massive-symbol-zips
138.2MB/s            291.1MB              2.107s
146.8MB/s            314.5MB              2.142s
144.8MB/s            288.2MB              1.990s
84.5MB/s             532.4MB              6.302s
146.6MB/s            314.2MB              2.144s
136.5MB/s            270.7MB              1.984s
85.9MB/s             518.9MB              6.041s
145.2MB/s            306.8MB              2.113s
127.8MB/s            138.7MB              1.085s
107.3MB/s            454.8MB              4.239s
141.6MB/s            251.2MB              1.774s


Average speed: 127.7MB/s
Median speed:  138.2MB/s

Average files created:       165
Average directories created: 129

On c3.large

c3.large$ python3 fastest-dumper.py -t /mnt/extracthere /tmp/massive-symbol-zips
105.4MB/s            290.9MB              2.761s
98.1MB/s             518.5MB              5.287s
108.1MB/s            251.2MB              2.324s
112.5MB/s            294.3MB              2.615s
113.7MB/s            314.5MB              2.767s
106.3MB/s            291.5MB              2.742s
104.8MB/s            291.1MB              2.778s
114.6MB/s            248.3MB              2.166s
114.2MB/s            248.2MB              2.173s
105.6MB/s            298.1MB              2.823s
106.2MB/s            297.6MB              2.801s
98.6MB/s             521.4MB              5.289s


Average speed: 107.3MB/s
Median speed:  106.3MB/s

Average files created:       165
Average directories created: 127

What the heck!? The SSD based instance is 23% slower!

I ran it a bunch of times and the average and median numbers are steady. c4.large is faster than c3.large at unzipping large blobs to disk. So much for that SSD!

Something Weird Is Going On

It's highly likely that the unzipping work is CPU bound and that most of those, say, 5 seconds are spent unzipping and only a small margin is the time it takes to write to disk.

If the unzipping CPU work is the dominant "time consumer", why is there a difference at all?!

Or is the difference between c3 and c4 down to "compute power", with the disk writes immaterial?

For the record, this test clearly demonstrates that the locally mounted SSD drive is about 6 times faster than EBS:

c3.large$ dd if=/dev/zero of=/tmp/1gbtest bs=16k count=65536
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 16.093 s, 66.7 MB/s
c3.large$ sudo dd if=/dev/zero of=/mnt/1gbtest bs=16k count=65536
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.62728 s, 409 MB/s

Let's try again. But instead of using c4.large and c3.large, let's use the beefier c4.4xlarge and c3.4xlarge. Both have 16 vCPUs.

c4.4xlarge

c4.4xlarge$ python3 fastest-dumper.py /tmp/massive-symbol-zips
130.6MB/s            553.6MB              4.238s
149.2MB/s            297.0MB              1.991s
129.1MB/s            529.8MB              4.103s
116.8MB/s            407.1MB              3.486s
147.3MB/s            306.1MB              2.077s
151.9MB/s            248.2MB              1.634s
140.8MB/s            292.3MB              2.076s
146.8MB/s            288.0MB              1.961s
142.2MB/s            321.0MB              2.257s


Average speed: 139.4MB/s
Median speed:  142.2MB/s

Average files created:       148
Average directories created: 117

c3.4xlarge

c3.4xlarge$ python3 fastest-dumper.py -t /mnt/extracthere /tmp/massive-symbol-zips
95.1MB/s             502.4MB              5.285s
104.1MB/s            303.5MB              2.916s
115.5MB/s            313.9MB              2.718s
105.5MB/s            517.4MB              4.904s
114.1MB/s            288.1MB              2.526s
103.3MB/s            555.9MB              5.383s
114.0MB/s            288.0MB              2.526s
109.2MB/s            251.2MB              2.300s
108.0MB/s            291.0MB              2.693s


Average speed: 107.6MB/s
Median speed:  108.0MB/s

Average files created:       150
Average directories created: 119

What's going on!? The time it takes to unzip and write to disk is, on average, the same for c3.large as c3.4xlarge!

Is Go Any Faster?

I need a break. As mentioned above, the unzip command line program is not any better than doing it in Python. But Go is faster right? Right?

Full disclosure: I'm not a Go programmer. I can use it to build stuff, but my experience level is quite shallow.

Here's the Go version. The critical function that does the unzipping and extraction to disk is this:


func DumpAndExtract(dest string, buffer []byte, name string) {
    size := int64(len(buffer))
    zipReader, err := zip.NewReader(bytes.NewReader(buffer), size)
    if err != nil {
        log.Fatal(err)
    }
    for _, f := range zipReader.File {
        rc, err := f.Open()
        if err != nil {
            log.Fatal(err)
        }
        defer rc.Close()
        fpath := filepath.Join(dest, f.Name)
        if f.FileInfo().IsDir() {
            os.MkdirAll(fpath, os.ModePerm)
        } else {
            // Make File
            var fdir string
            if lastIndex := strings.LastIndex(fpath, string(os.PathSeparator)); lastIndex > -1 {
                fdir = fpath[:lastIndex]
            }
            err = os.MkdirAll(fdir, os.ModePerm)
            if err != nil {
                log.Fatal(err)
            }
            f, err := os.OpenFile(
                fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            _, err = io.Copy(f, rc)
            if err != nil {
                log.Fatal(err)
            }
        }
    }
}

And the measurement is done like this:


size := int64(len(content))
t0 := time.Now()
DumpAndExtract(tmpdir, content, filename)
t1 := time.Now()
speed := float64(size) / t1.Sub(t0).Seconds()

It's not as sophisticated (it's only able to use /tmp) but let's just run it and see how it compares to Python:

c4.4xlarge$ mkdir ~/GO
c4.4xlarge$ export GOPATH=~/GO
c4.4xlarge$ go get github.com/pyk/byten
c4.4xlarge$ go build unzips.go
c4.4xlarge$ ./unzips /tmp/massive-symbol-zips
56MB/s         407MB          7.27804954
74MB/s         321MB          4.311504933
75MB/s         288MB          3.856798853
75MB/s         292MB          3.90972474
81MB/s         248MB          3.052652168
58MB/s         530MB          9.065985117
59MB/s         554MB          9.35237202
75MB/s         297MB          3.943132388
74MB/s         306MB          4.147176578

Average speed:    70MB/s
Median speed:     81MB/s

So... Go is, on average, 40% slower than Python in this scenario. Did not expect that.

In Conclusion

No conclusion. Only confusion.

I thought this would be a lot clearer and more obvious. Yeah, I know it's crazy to measure two things at the same time (unzip and disk write) but the whole thing started with a very realistic problem that I'm trying to solve. The ultimate question was: would we benefit from moving the web servers from AWS EC2 c4.large to c3.large? I think the answer is no.

UPDATE (Nov 30, 2017)

Here's a horrible hack that causes the extraction to always go to /dev/null:


import shutil
import zipfile


class DevNullZipFile(zipfile.ZipFile):
    def _extract_member(self, member, targetpath, pwd):
        # member.is_dir() only works in Python 3.6
        if member.filename[-1] == '/':
            return targetpath
        dest = '/dev/null'
        with self.open(member, pwd=pwd) as source, open(dest, "wb") as target:
            shutil.copyfileobj(source, target)
        return targetpath


def dump_and_extract(root_dir, file_buffer, klass):
    zf = klass(file_buffer)
    zf.extractall(root_dir)

And here's the outcome of running that:

c4.4xlarge$ python3 fastest-dumper.py --dev-null /tmp/massive-symbol-zips
170.1MB/s            297.0MB              1.746s
168.6MB/s            306.1MB              1.815s
147.1MB/s            553.6MB              3.765s
132.1MB/s            407.1MB              3.083s
145.6MB/s            529.8MB              3.639s
175.4MB/s            248.2MB              1.415s
163.3MB/s            321.0MB              1.965s
162.1MB/s            292.3MB              1.803s
168.5MB/s            288.0MB              1.709s


Average speed: 159.2MB/s
Median speed:  163.3MB/s

Average files created:       0
Average directories created: 0

I ran it a few times to make sure the numbers are stable. They are. This is on the c4.4xlarge.

So, the improvement of writing to /dev/null instead of the EBS-backed /tmp is 15%. Kinda goes to show how much of the total time is spent reading the ZipInfo file objects.

For the record, the same comparison on the c3.4xlarge was 30% improvement when using /dev/null.

Also for the record, if I replace that line shutil.copyfileobj(source, target) above with pass, the average speed goes from 159.2MB/s to 112.8GB/s but that's not a real value of any kind.

UPDATE (Nov 30, 2017)

Here's the same benchmark using c5.4xlarge instead. So, still EBS but...
"3.0 GHz Intel Xeon Platinum processors with new Intel Advanced Vector Extension 512 (AVX-512) instruction set"

Let's run it on this supposedly faster CPU:

c5.4xlarge$ python3 fastest-dumper.py /tmp/massive-symbol-zips
165.6MB/s            314.6MB              1.900s
163.3MB/s            287.7MB              1.762s
155.2MB/s            278.6MB              1.795s
140.9MB/s            513.2MB              3.643s
137.4MB/s            556.9MB              4.052s
134.6MB/s            531.0MB              3.946s
165.7MB/s            314.2MB              1.897s
158.1MB/s            301.5MB              1.907s
151.6MB/s            253.8MB              1.674s
146.9MB/s            502.7MB              3.422s
163.7MB/s            288.0MB              1.759s


Average speed: 153.0MB/s
Median speed:  155.2MB/s

Average files created:       150
Average directories created: 119

So that is, on average, 10% faster than c4.4xlarge.

Is it 10% more expensive? For a 1-year reserved instance, it's $0.796 versus $0.68 respectively. I.e. 15% more expensive. In other words, in this context it's 15% more $$$ for 10% more processing power.

UPDATE (Jan 24, 2018)

I can hardly believe it!

Thank you Oliver, who discovered (see comment below) a glaring mistake in my last conclusion. For reserved instances (which is what we use on our Mozilla production servers) the c5.4xlarge is actually cheaper than c4.4xlarge. What?!

In my previous update I compared c4.4xlarge and c5.4xlarge and concluded that c5.4xlarge is 10% faster but 15% more expensive. That actually made sense. Fancier servers, more $$$. But it's not like that in the real world. See for yourself:

c4.4xlarge pricing (screenshot)

c5.4xlarge pricing (screenshot)

How to create-react-app with Docker

November 17, 2017
31 comments Linux, Web development, JavaScript, React, Docker

Why would you want to use Docker to do React app work? Isn't Docker for server-side stuff like Python and Golang etc? No, all the benefits of Docker apply to JavaScript client-side work too.

So there are three main things you want to do with create-react-app: run the dev server, run tests, and create build artifacts. Let's look at all three, but using Docker.

Create-react-app first

If you haven't already, install create-react-app globally:

▶ yarn global add create-react-app

And, once installed, create a new project:

▶ create-react-app docker-create-react-app
...lots of output...

▶ cd docker-create-react-app
▶ ls
README.md    node_modules package.json public       src          yarn.lock

We won't need the node_modules here in the project directory. Instead, when building the image we're going to let node_modules stay inside the image. So you can go ahead and... rm -fr node_modules.

Create the Dockerfile

Let's just dive in. This Dockerfile is the minimum:

FROM node:8

ADD yarn.lock /yarn.lock
ADD package.json /package.json

ENV NODE_PATH=/node_modules
ENV PATH=$PATH:/node_modules/.bin
RUN yarn

WORKDIR /app
ADD . /app

EXPOSE 3000
EXPOSE 35729

ENTRYPOINT ["/bin/bash", "/app/run.sh"]
CMD ["start"]

A couple of things to notice here.
First of all we're basing this on the official Node v8 image on Docker Hub. That gives you Node and Yarn by default.

Note how the NODE_PATH environment variable puts the node_modules in the root of the container. That's so that it doesn't get added in "here" (i.e. the current working directory). If you didn't do this, the node_modules directory would be part of the mounted volume, which not only slows down Docker (since there are so many files) but also isn't necessary since you don't need to see those files.

Note how the ENTRYPOINT points to run.sh. That's a file we need to create too, alongside the Dockerfile.

#!/usr/bin/env bash
set -eo pipefail

case $1 in
  start)
    # The '| cat' is to trick Node into thinking this is a non-TTY terminal
    # then react-scripts won't clear the console.
    yarn start | cat
    ;;
  build)
    yarn build
    ;;
  test)
    yarn test $@
    ;;
  *)
    exec "$@"
    ;;
esac

Lastly, as a point of convenience, note that the default CMD is "start". That's so that when you simply run the container the default thing it does is to run yarn start.

Build container

Now let's build it:

▶ docker image build -t react:app .

The -t react:app is up to you. It doesn't matter so much what it is unless you're going to upload your image to a registry. Then you probably want the repository name to be something unique.

Let's check that the build is there:

▶ docker image ls react:app
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
react               app                 3ee5c7596f57        13 minutes ago      996MB

996MB! The base Node image is about ~700MB and the node_modules directory (for a clean new create-react-app) is ~160MB (at the time of writing). What the remaining difference is, I'm not sure. But it's empty calories and easy to lose. When you blow away the built image (docker image rm react:app) your hard drive gets all that back and no actual code is lost.

Before we run it, let's go inside and see what was created:

▶ docker container run -it react:app bash
root@996e708a30c4:/app# ls
Dockerfile  README.md  package.json  public  run.sh  src  yarn.lock
root@996e708a30c4:/app# du -sh /node_modules/
148M    /node_modules/
root@996e708a30c4:/app# sw-precache
Total precache size is about 355 kB for 14 resources.
service-worker.js has been generated with the service worker contents.

The last command (sw-precache) was just to show that executables in /node_modules/.bin are indeed on the $PATH and can be run.

Run container

Now to run it:

▶ docker container run -it -p 3000:3000 react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.

Default app running

Pretty good. Open http://localhost:3000 in your browser and you should see the default create-react-app app.

Next step: Warm reloading

create-react-app does not support hot reloading of components. But it does support web page reloading. As soon as a local file is changed, it sends a signal to the browser (using WebSockets) to tell it to... document.location.reload().

To make this work, we need to do two things:
1) Mount the current working directory into the Docker container
2) Expose the WebSocket port

The WebSocket thing is set up by exposing port 35729 to the host (-p 35729:35729).

Below is an example running this with a volume mount and both ports exposed.

▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app react:app
yarn run v1.3.2
$ react-scripts start
Starting the development server...

Compiled successfully!

You can now view docker-create-react-app in the browser.

  Local:            http://localhost:3000/
  On Your Network:  http://172.17.0.2:3000/

Note that the development build is not optimized.
To create a production build, use yarn build.

Compiling...
Compiled successfully!
Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.

Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'

In the above example output: first I make a harmless save in the src/App.js file just to see that the dev server notices and that my browser reloads when I do that. That's where it says

Compiling...
Compiled successfully!

Secondly, I make an edit that triggers a warning. That's where it says:

Compiling...
Compiled with warnings.

./src/App.js
  Line 7:  'neverused' is assigned a value but never used  no-unused-vars

Search for the keywords to learn more about each warning.
To ignore, add // eslint-disable-next-line to the line before.

And lastly I make an edit by messing with the import line:

Compiling...
Failed to compile.

./src/App.js
Module not found: Can't resolve './Apps.css' in '/app/src'

This is great! Isn't create-react-app wonderful?

Build build :)

There are many things you can do with the code you're building. Let's pretend that the intention is to build a single-page-app and then take the static assets (including the index.html) and upload them to a public CDN or something. To do that we need to generate the build directory.

The trick here is to run this with a volume mount so that when it creates /app/build (from the perspective of the container), that directory effectively becomes visible on the host.

▶ docker container run -it -v $(pwd):/app react:app build
yarn run v1.3.2
$ react-scripts build
Creating an optimized production build...
Compiled successfully.

File sizes after gzip:

  35.59 KB  build/static/js/main.591fd843.js
  299 B     build/static/css/main.c17080f1.css

The project was built assuming it is hosted at the server root.
To override this, specify the homepage in your package.json.
For example, add this to build it for GitHub Pages:

  "homepage" : "http://myname.github.io/myapp",

The build folder is ready to be deployed.
You may serve it with a static server:

  yarn global add serve
  serve -s build

Done in 5.95s.

Now, on the host:

▶ tree build
build
├── asset-manifest.json
├── favicon.ico
├── index.html
├── manifest.json
├── service-worker.js
└── static
    ├── css
    │   ├── main.c17080f1.css
    │   └── main.c17080f1.css.map
    ├── js
    │   ├── main.591fd843.js
    │   └── main.591fd843.js.map
    └── media
        └── logo.5d5d9eef.svg

4 directories, 10 files

The contents of that directory you can now upload to a CDN or some public Nginx server that points to this as the root directory.

Running tests

This one is so easy and obvious now.

▶ docker container run -it -v $(pwd):/app react:app test

Note that we're setting up a volume mount here again. Since the test runner is interactive and sits and waits for file changes so it can re-run tests immediately, it's important to do the mount now.

All regular jest options work too. For example:

▶ docker container run -it -v $(pwd):/app react:app test --coverage
▶ docker container run -it -v $(pwd):/app react:app test --help

Debugging the node_modules

First of all, when I say "debugging the node_modules", in this context, I'm referring to messing with node_modules whilst running tests or running the dev server.

One way to debug the node_modules used is to enter a bash shell and literally mess with the files inside it. First, start the dev server (or start the test runner) and give the container a name:

▶ docker container run -it -p 3000:3000 -p 35729:35729 -v $(pwd):/app --name mydebugging react:app

Now, in a separate terminal start bash in the container:

▶ docker exec -it mydebugging bash

Once you're in you can install an editor and start editing files:

root@2bf8c877f788:/app# apt-get update && apt-get install jed
root@2bf8c877f788:/app# jed /node_modules/react/index.js

As soon as you make changes to any of the files, the dev server should notice and reload.

When you stop the container all your changes will be reset. So if you had to sprinkle the node_modules with console.log('WHAT THE HECK!') all of those disappear when the container is stopped.

NodeJS shell

This'll come as no surprise by now. You basically run bash and you're there:

▶ docker container run -it -v $(pwd):/app react:app bash
root@2a21e8206a1f:/app# node
> [] + 1
'1'

Conclusion

When I look back at all the commands above, I can definitely see how it's pretty intimidating and daunting. So many things to remember, and it's got that nasty feeling where you feel like you're controlling your development environment through unwieldy levers rather than your own hands.

But think of the fundamental advantages too! It's all encapsulated now. What you're working on will be based on the exact same version of everything as your teammate, your dev server and your production server are using.

Pros:

  • All packaged up and all team members get the exact same versions of everything, including Node and Yarn.
  • The node_modules directory gets out of your hair.
  • Perhaps some React code is just a small part of a large project. E.g. the frontend is React, the backend is Django. Then with some docker-compose magic you can have it all running with one command without needing to run the frontend in a separate terminal.

Cons:

  • Lack of color output in terminal.
  • The initial (or infrequent) wait for building the docker image is brutal on a slow network.
  • Lots of commands to remember. For example, how do you start a shell again?

In my (Mozilla Services) work, I actually use docker-compose for all the projects I work on. And I have a Makefile to help me remember all the various docker-compose commands (thanks Jannis & Will!). One definitely neat thing you can do with docker-compose is start multiple containers. Then you can, with one command, start a Django server and the create-react-app dev server. Perhaps a blog post for another day.

Yet another Docker 'A ha!' moment

November 5, 2017
2 comments macOS, Docker

tl;dr; To build once and run Docker containers with different files use a volume mount. If that's not an option, like in CircleCI, avoid volume mount and rely on container build every time.

What the heck is a volume mount anyway?

Laugh all you like, but after almost a year of using Docker I'm still learning the basics. Apparently. This, now, feels laughable but there's a small chance someone else stumbles like I did and they might appreciate this.

If you have a volume mounted for a service in your docker-compose.yml it will basically take whatever you mount and lay that on top of what was in the Docker container. Doing a volume mount into the same working directory as your container is totally common. When you do that, the files on the host (the files/directories mounted) get used on each run. If you don't do that, you're stuck with the files, from your host, as they were the last time you built the image.

Consider...:

# Dockerfile
FROM python:3.6-slim
LABEL maintainer="mail@peterbe.com"
COPY . /app
WORKDIR /app
CMD ["python", "run.py"]

and...:


#!/usr/bin/env python
if __name__ == '__main__':
    print("hello!")

Let's build it:

$ docker image build -t test:latest .
Sending build context to Docker daemon   5.12kB
Step 1/5 : FROM python:3.6-slim
 ---> 0f1dc0ba8e7b
Step 2/5 : LABEL maintainer "mail@peterbe.com"
 ---> Using cache
 ---> 70cf25f7396c
Step 3/5 : COPY . /app
 ---> 2e95935cbd52
Step 4/5 : WORKDIR /app
 ---> bc5be932c905
Removing intermediate container a66e27ecaab3
Step 5/5 : CMD python run.py
 ---> Running in d0cf9c546fee
 ---> ad930ce66a45
Removing intermediate container d0cf9c546fee
Successfully built ad930ce66a45
Successfully tagged test:latest

And run it:

$ docker container run test:latest
hello!

So basically my little run.py got copied into the container by the Dockerfile. Let's change the file:

$ sed -i.bak s/hello/allo/g run.py
$ python run.py
allo!

But it won't run like that if we run the container again:

$ docker container run test:latest
hello!

So, the container still runs the Python file from back when the image was built. Two options:

1) Rebuild, or
2) Volume mount in the host directory

This is it! This is your choice.

Rebuild might take time. So, let's mount the current directory from the host:

$ docker container run -v `pwd`:/app test:latest
allo!

So yay! Now it runs the container with the latest file from my host directory.

The dark side of volume mounts

So, if it's more convenient to "refresh the files in the container" with a volume mount instead of container rebuild, why not always do it for everything?

For one thing, there might be files built inside the container that cease to be visible if you override that workspace with your own volume mount.

The other crucial thing I learned the hard way (it seems so obvious now!) is that there isn't always a host directory to mount. In particular, in tecken we use a base ubuntu image, and in the run parts of the CircleCI configuration we were using docker-compose run ... with directives (in the docker-compose.yml file) that use volume mounts. So, the rather cryptic effect was that the files mounted into the container were not the files checked out from the git branch.

The resolution in this case, was to be explicit when running Docker commands in CircleCI to only do build followed by run without a volume mount. In particular, to us it meant changing from docker-compose run frontend lint to docker-compose run frontend-ci lint. Basically, it's a separate directive in the docker-compose.yml file that is exclusive to CI.

In conclusion

I feel dumb for not seeing this clearly before.

The mistake that triggered me was that when I ran docker-compose run test test (the first test is the docker-compose directive, the second test is the name of the script sent to CMD) it didn't change the outputs when I edited the files in my editor. Adding a volume mount to that directive solved it for me locally on my laptop but didn't work in CircleCI for reasons (I can't remember how it errored).

So now we have this:

# In docker-compose.yml

  frontend:
    build:
      context: .
      dockerfile: Dockerfile.frontend
    environment:
      - NODE_ENV=development
    ports:
      - "3000:3000"
      - "35729:35729"
    volumes:
      - $PWD/frontend:/app
    command: start

  # Same as 'frontend' but no volumes or command
  frontend-ci:
    build:
      context: .
      dockerfile: Dockerfile.frontend

How to use django-cache-memoize

November 3, 2017
0 comments Python, Django

Last week I released django-cache-memoize, a library for Django developers to more conveniently use caching in function calls. This is a quick blog post to demonstrate that with an example.

The verbose traditional way to do it

Suppose you have a view function that takes a request and returns an HttpResponse. Within, it does some expensive calculation that you know could be cached. Something like this:

No caching


def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    related_posts = BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)

So far so good. Perhaps you know that the lookup of related posts is slowish and can be cached for at least one hour. So you add this:

Caching


from django.core.cache import cache

def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    cache_key = 'related_posts:{}'.format(post.id)
    related_posts = cache.get(cache_key)
    if related_posts is None:  # was not cached
        related_posts = BlogPost.objects.exclude(
            id=post.id
        ).filter(
            # BlogPost.keywords is an ArrayField
            keywords__overlap=post.keywords
        ).order_by('-publish_date')
        cache.set(cache_key, related_posts, 60 * 60)

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)

Great progress. But now you want that cache to immediately reset as soon as the blog post changes.


@login_required
def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()
            cache_key = 'related_posts:{}'.format(post.id)
            cache.delete(cache_key)
            return redirect(reverse('blog_post', args=(post.id,)))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

Awesome. Now the cache is cleared as soon as the BlogPost is updated.

Problem: you have repeated the code generating the cache key in two places.

Use django-cache-memoize

First extract out the getting of related posts into its own function and then decorate it.


from cache_memoize import cache_memoize

@cache_memoize(60 * 60)
def get_related_posts(id):
    # Fetch the post from its id so the memoized cache key is just the id
    post = BlogPost.objects.get(id=id)
    return BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')


def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    related_posts = get_related_posts(post.id)

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context) 

Now, to do the cache invalidation you need to call that function get_related_posts one more time:


def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()

            # NOTE!
            get_related_posts.invalidate(post.id)

            return redirect(reverse('blog_post', args=(post.id,)))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

Now you're not repeating the code that constructs the cache key.

Getting fancy; hot cache

The above pattern, with or without django-cache-memoize, clears the cache when the blog post changes and then you basically wait until the next time the blog post is rendered, at which point the cache gets populated again.

A more "aggressive" pattern is to "heat the cache up" right after we've cleared it. A simple change is to call get_related_posts() again and let it cache. But to make sure it gets a fresh set of results we pass in the extra _refresh=True argument.


def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()

            # NOTE!
            # Refresh the cache here and now
            get_related_posts(post.id, _refresh=True)

            return redirect(reverse('blog_post', args=(post.id,)))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

What was the point of that?

The above example doesn't do a great job demonstrating how convenient it can be to use django-cache-memoize compared to "doing it manually". If your code base is peppered with lots of little blocks where you construct a cache key, check the cache, fall back on re-generation and write to cache again; then it can really add up to take away all of that mess and just use a decorator on anything that can be memoized.

Probably the biggest benefit of moving the cacheable functionality into its own function and decorating it is that all the hassle of creating safe and unique cache keys lives in one place. You won't be violating the Don't Repeat Yourself principle. This becomes especially important once the cache keys that need to be constructed get complex and need care.

Ultimately if you're able, your code will be free of various cache.set and cache.get code and yet a bunch of cacheable stuff gets cached nicely.

Why not use a regular @memoize or @functools.lru_cache?

The major difference between something like https://pypi.python.org/pypi/memoize/ and django-cache-memoize is that django-cache-memoize uses django.core.cache.cache which is a global state store (most likely backed by Redis or Memcached). If you use one of the other memoization solutions they'll be in-memory. Meaning, if your production code runs Gunicorn or uWSGI with, say, 8 workers then you'll have 8 copies of the same cache store. So if you're trying to protect an expensive function with @functools.lru_cache it will, worst case, be a cache miss 8 times on 8 different requests.
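
For contrast, here's roughly what the purely in-process variant looks like (do_expensive_query is a made-up placeholder, not part of the package):


import functools


@functools.lru_cache(maxsize=None)
def expensive_lookup(post_id):
    # Cached in this process's memory only; with 8 gunicorn/uWSGI workers
    # there are 8 independent caches, so the same post_id can miss 8 times.
    return do_expensive_query(post_id)  # hypothetical slow function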

To enable Tracking Protection for performance

November 1, 2017
0 comments Web development, Mozilla

In Firefox it's really easy to switch Tracking Protection on and off. I usually have it off in my main browser because, as a web developer, it often gives me an insight into what others would see and that's often helpful.

Tracking Protection as a means for a performance boost has been mentioned many times before, not just as a way to avoid leaving digital footprints but also as a way to browse faster.

An Example of Performance

I just wanted to show such an example. In both examples I load a blog post on seriouseats.com which has real content and lots of images (11 non-ad images, totalling ~600KB).

First, without Tracking Protection

Without Tracking Protection

Next, with Tracking Protection

With Tracking Protection

This is the Network waterfall view of all the requests needed to make up the page. The numbers at the very bottom are the interesting ones.

Without Tracking Protection

  • 555 requests
  • 8.61MB transferred
  • Total loading time 59 seconds

With Tracking Protection

  • 48 requests
  • 1.87MB transferred
  • Total loading time 3.3 seconds

It should be pointed out that the first version, without Tracking Protection, actually reaches document complete at about 20 seconds. That means the page loading icon in the browser toolbar stops spinning. That's because internally it starts triggering more downloads (for ads) with JavaScript that executes when the page load event fires. So you can already start reading the main content, and all content-related images are ready at this point, but it's still ~30 seconds of excessive browser and network activity just to download the ads.

Enable Tracking Protection

Enabling Tracking Protection in Firefox is really easy. It's not an addon or anything else that needs to be installed. Just click Preferences, then click the Privacy & Security tab, scroll down a little and look for Tracking Protection. There choose the Always option. That's it.

Discussion

Tracking Protection is a big and involved topic. It digs into the realm of privacy and the right to your digital footprint. It also digs into the realm of making it harder (or easier, depending on how you phrase it) for content creators to generate revenue to be able to keep creating content.

The takeaway is that it can mean many things but for people who just want to browse the web much much faster it can be just about performance.

django-cache-memoize

October 27, 2017
3 comments Python, Django

Released a new package today: django-cache-memoize

It's actually quite simple; a Python memoize function that uses Django's cache, plus the added trick that you can invalidate the cache by doing the same function call with the same parameters if you just add .invalidate to your function.
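
A minimal usage sketch (the function here is made up; the decorator and .invalidate usage match the examples in the how-to post above):


from cache_memoize import cache_memoize


@cache_memoize(60)
def expensive(argument):
    ...  # imagine something slow here


expensive(123)             # first call; computes and caches
expensive(123)             # same arguments; served from the Django cache
expensive.invalidate(123)  # same arguments again; deletes that cache entry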

The history of it is from my recent Mozilla work on Symbols.

I originally published the snippet in a blog post, and today I extracted it out into its own project with tests, docs, CI and a setup.py.

I'm still amazed how long it takes to make a package with all the "fluff" around it. A lot of the bits in here (like setup.py and pytest.ini etc) are copied from other nicely maintained Python packages. For example, I straight up copied the tox.ini from Jannis Leidel's python-dockerflow. The actual code writing (including tests!) is far outweighed by the package sit-ups. But I "complain with a pinch of salt" because a lot of the time was spent writing documentation, and that's probably just as important as the code.

Concurrent Gzip in Python

October 13, 2017
11 comments Python, Linux, Docker

Suppose you have a bunch of files you need to Gzip in Python; what's the optimal way to do that? In serial, to avoid contention around the GIL? In multiprocessing, to spread the load across CPU cores? Or with threads?

I needed to know this for symbols.mozilla.org since it does a lot of Gzip'ing. On symbols.mozilla.org, clients upload a zip file full of files. A lot of them are plain text, and when uploaded to S3 it's best to store them gzipped. Basically it does this:


import gzip
import os
from io import BytesIO


def upload_sym_file(s3_client, payload, bucket_name, key_name):
    file_buffer = BytesIO()
    with gzip.GzipFile(fileobj=file_buffer, mode='w') as f:
        f.write(payload)
    file_buffer.seek(0, os.SEEK_END)
    size = file_buffer.tell()
    file_buffer.seek(0)
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key_name,
        Body=file_buffer
    )
    print(f"Uploaded {size}")

Another important thing to consider before jumping into the benchmark is the context of this application; the bundles of files I need to gzip are often many but smallish. The average size of the files that need to be gzip'ed is ~300KB. And each bundle is between 5 and 25 files.

The Benchmark

For the sake of the benchmark here, all it does is figure out the size of each gzipped buffer and report that as a list.
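
The _get_size helper isn't shown in the post; something along these lines (my assumption, mirroring upload_sym_file minus the S3 upload) is enough to exercise the gzip work:


import gzip
import os
from io import BytesIO


def _get_size(payload):
    # Gzip the payload in memory and return the compressed size in bytes
    file_buffer = BytesIO()
    with gzip.GzipFile(fileobj=file_buffer, mode='w') as f:
        f.write(payload)
    file_buffer.seek(0, os.SEEK_END)
    return file_buffer.tell()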

f1 - Basic serial


def f1(payloads):
    sizes = []
    for payload in payloads:
        sizes.append(_get_size(payload))
    return sizes

f2 - Using multiprocessing.Pool


def f2(payloads):  # multiprocessing
    sizes = []
    with multiprocessing.Pool() as p:
        sizes = p.map(_get_size, payloads)
    return sizes

f3 - Using concurrent.futures.ThreadPoolExecutor


def f3(payloads):  # concurrent.futures.ThreadPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes

f4 - Using concurrent.futures.ProcessPoolExecutor


def f4(payloads):  # concurrent.futures.ProcessPoolExecutor
    sizes = []
    futures = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for payload in payloads:
            futures.append(
                executor.submit(
                    _get_size,
                    payload
                )
            )
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes

Note that when using asynchronous methods like this, the order of items returned is not the same as the order they were submitted in. An easy remedy, if you need the results back in order, is to not use a list but a dictionary. Then you can map each key (or index if you like) to a value.
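
For example, a variation of f3 that returns the sizes in submission order might look like this:


import concurrent.futures


def f3_ordered(payloads):
    sizes = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Map each future back to the index of the payload it was given
        future_to_index = {
            executor.submit(_get_size, payload): index
            for index, payload in enumerate(payloads)
        }
        for future in concurrent.futures.as_completed(future_to_index):
            sizes[future_to_index[future]] = future.result()
    # Reassemble a list in the original submission order
    return [sizes[index] for index in range(len(payloads))]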

The Results

I ran this on three different .zip files of different sizes. To get some sanity into the benchmark I made it print out how many bytes it has to process and how many bytes the gzipped output ends up being.

# files 66
Total bytes to gzip 140.69MB
Total bytes gzipped 14.96MB
Total bytes shaved off by gzip 125.73MB

# files 103
Total bytes to gzip 331.57MB
Total bytes gzipped 66.90MB
Total bytes shaved off by gzip 264.67MB

# files 26
Total bytes to gzip 86.91MB
Total bytes gzipped 8.28MB
Total bytes shaved off by gzip 78.63MB

Sorry for being aesthetically handicapped when it comes to using Google Docs but here goes...


This demonstrates the median times it takes each function to complete, each of the three different files.

In all three files I tested, clearly doing it serially (f1) is the worst. Presumably because my laptop has more than one CPU core and the others are not being used. Another pertinent thing to notice is that when the work is really big (the middle 4 bars), the difference between doing things serially and concurrently isn't as big.

That second zip file contained a single file that was 80MB. The largest in the other two files were 18MB and 22MB.


This is the mean across all medians grouped by function and each compared to the slowest.

I call this the "bestest graph". It's a combination across all different sizes and basically concludes which one is the best, which clearly is function f3 (the one using concurrent.futures.ThreadPoolExecutor).

CPU Usage

This is probably the best way to explain how the CPU is used; I ran each function repeatedly, then opened gtop and took a screenshot of the list of processes sorted by CPU percentage.

f1 - Serially

f1
No distractions but it takes 100% of one CPU to work.

f2 - multiprocessing.Pool

f2
My laptop has 8 CPU cores, but I don't know why I see 9 Python processes here.
I don't know why each CPU isn't at 100% but I guess there's some administrative overhead to starting processes from Python.

f3 - concurrent.futures.ThreadPoolExecutor

f3
One process, with roughly 5 x 8 = 40 threads GIL-swapping back and forth, but all in all it manages to keep itself very busy since threads are lightweight and cheap to share data between.

f4 - concurrent.futures.ProcessPoolExecutor

f4
This is actually kinda like multiprocessing.Pool but with a different (arguably easier) API.

Conclusion

By a small margin, concurrent.futures.ThreadPoolExecutor won. That's despite not being able to use all CPU cores. This, pseudo-scientifically, suggests that the low overhead of starting threads (remember, the average number of files in each .zip is ~65) is worth more than being able to use all CPUs.

Discussion

There's an interesting twist to this! At least for my use case...

In the application I'm working on, there's actually a lot more that needs to be done than just gzip'ing some blobs of files. For each file I need to do a HEAD query to AWS S3 and a PUT query to AWS S3 too. So what I actually need to do is create an instance of client = botocore.client.S3 that I use to call client.list_objects_v2 and client.put_object.

When you create an instance of botocore.client.S3, botocore automatically instantiates itself with credentials from os.environ['AWS_ACCESS_KEY_ID'] etc. (or reads them from a ~/.aws file). Once created, if you ask it to do many different network operations, internally it relies on urllib3.poolmanager.PoolManager, which maintains a pool of 10 HTTP connections that get reused.

So when you run the serial version you can re-use the client instance for every file you process, but you can only use one HTTP connection in the pool. With concurrent.futures.ThreadPoolExecutor it can not only re-use the same instance of botocore.client.S3, it can also cycle through all the HTTP connections in the pool.

The process-based alternatives like multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor cannot re-use the botocore.client.S3 instance since it's not pickle'able. And each worker has to create a new HTTP connection for every single file.

So, the conclusion of the above rambling is that concurrent.futures.ThreadPoolExecutor is really awesome! Not only did it perform excellently in the Gzip benchmark, it has the added bonus that it can share instance objects and HTTP connections.
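
To make the client-sharing point concrete, here's a minimal sketch (using boto3 rather than botocore directly; the upload_all helper and payloads dict are made up for illustration) of one client shared across a thread pool:


import concurrent.futures

import boto3


def upload_all(payloads, bucket_name):
    # One shared client; its internal connection pool gets cycled by the threads
    s3_client = boto3.client('s3')

    def upload_one(key_name, body):
        s3_client.put_object(Bucket=bucket_name, Key=key_name, Body=body)

    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [
            executor.submit(upload_one, key_name, body)
            for key_name, body in payloads.items()
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any upload error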

Simple or fancy UPSERT in PostgreSQL with Django

October 11, 2017
7 comments Python, Web development, Django, PostgreSQL

As of PostgreSQL 9.5 we have UPSERT support. Technically, it's ON CONFLICT, but it's basically a way to execute an UPDATE statement in case the INSERT triggers a conflict on some column value. By the way, here's a great blog post that demonstrates how to use ON CONFLICT.

In this Django app I have a model that has a field called hash which has a unique=True index on it. What I want to do is either insert a row or, if the hash is already in there, increment the count and update the modified_at timestamp instead.

The Code(s)

Here's the basic version in "pure Django ORM":


from django.db.models import F
from django.utils import timezone


if MissingSymbol.objects.filter(hash=hash_).exists():
    MissingSymbol.objects.filter(hash=hash_).update(
        count=F('count') + 1,
        modified_at=timezone.now()
    )
else:
    MissingSymbol.objects.create(
        hash=hash_,
        symbol=symbol,
        debugid=debugid,
        filename=filename,
        code_file=code_file or None,
        code_id=code_id or None,
    )

Here's that same code rewritten in "pure SQL":


from django.db import connection


with connection.cursor() as cursor:
    cursor.execute("""
        INSERT INTO download_missingsymbol (
            hash, symbol, debugid, filename, code_file, code_id,
            count, created_at, modified_at
        ) VALUES (
            %s, %s, %s, %s, %s, %s,
            1, CLOCK_TIMESTAMP(), CLOCK_TIMESTAMP()
          )
        ON CONFLICT (hash)
        DO UPDATE SET
            count = download_missingsymbol.count + 1,
            modified_at = CLOCK_TIMESTAMP()
        WHERE download_missingsymbol.hash = %s
        """, [
            hash_, symbol, debugid, filename,
            code_file or None, code_id or None,
            hash_
        ]
    )

Both work.

Note the use of CLOCK_TIMESTAMP() instead of NOW(). Since Django wraps all writes in transactions, if you use NOW() it will be evaluated to the same value for the whole transaction, thus making unit testing really hard.

But which is fastest?

The Results

First of all, this is hard to test locally because my Postgres is running locally in Docker, so the network latency is minimal. Against a remote Postgres, where every round trip costs more, having to do two separate executions (as the ORM version does) would be even more expensive.

I ran a simple benchmark where each iteration randomly picked one of the two code blocks (above) with a 50% chance.
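
The benchmark script isn't included in the post; roughly, the idea is something like this sketch (symbols_to_upsert, upsert_with_orm and upsert_with_sql are made-up names wrapping the test data and the two code blocks above, not the actual code):


import random
import statistics
import time

timings = {'ORM': [], 'SQL': []}
for symbol in symbols_to_upsert:  # hypothetical iterable of test rows
    # Pick one of the two implementations with a 50% chance
    if random.random() < 0.5:
        label, upsert = 'ORM', upsert_with_orm
    else:
        label, upsert = 'SQL', upsert_with_sql
    t0 = time.time()
    upsert(symbol)
    t1 = time.time()
    timings[label].append((t1 - t0) * 1000)  # milliseconds

for label, values in timings.items():
    print(label, round(statistics.mean(values), 2), round(statistics.median(values), 2))

The results are: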

METHOD     MEAN       MEDIAN
SQL        6.99ms     6.61ms
ORM        10.28ms    9.86ms

So doing it with a block of raw SQL instead is 1.5 times faster. But this difference would surely grow when the network latency is higher.

Discussion

There's an alternative and that's to use django-postgres-extra but I'm personally hesitant. The above little raw SQL hack is the only thing I need and adding more libraries makes far-future maintenance harder.

Beyond the time optimization of being able to send only 1 SQL instruction to PostgreSQL, the biggest benefit is avoiding concurrency race conditions. From the documentation:

"ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known as UPSERT — "UPDATE or INSERT"."

I'm going to keep this little hack. It's not beautiful but it works and saves time and gives me more comfort around race conditions.

"No space left on device" on OSX Docker

October 3, 2017
13 comments Web development, macOS, Docker

UPDATE 2020

As Greg Brown pointed out, the new way is:

docker container prune
docker image prune

Original blog post...


 

If you run out of disk space in your Docker containers on OSX, this is probably the best thing to run:

docker rm $(docker ps -q -f 'status=exited')
docker rmi $(docker images -q -f "dangling=true")

The Problem

This isn't the first time it's happened, so I'm blogging about it to not forget. My postgres image in my docker-compose.yml didn't start, and since it's linked, its problem is "hidden". Running it in the foreground instead, you can see what the problem is:

▶ docker-compose run db
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
initdb: could not create directory "/var/lib/postgresql/data/pg_xlog": No space left on device
initdb: removing contents of data directory "/var/lib/postgresql/data"

Docker on OSX

I admit that I have so much to learn about Docker and the learning is slow. Docker is amazing but I think I'm slow to learn because I'm just not that interested as long as it works and I can work on my apps.

It seems that all storage for all Docker containers on OSX lives in one big file, capped at 64GB:

▶ cd ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/

com.docker.docker/Data/com.docker.driver.amd64-linux
▶ ls -lh Docker.qcow2
-rw-r--r--@ 1 peterbe  staff    63G Oct  3 08:51 Docker.qcow2

If you run the above mentioned commands (docker rm ...) this file does not shrink, but space is freed up inside it. Just like how MongoDB (used to) allocate much more disk space than it actually uses.

If you delete that Docker.qcow2 and restart Docker the space problem goes away but then the problem is that you lose all your active containers which is especially annoying if you have useful data in database containers.

cache_memoize - a pretty decent cache decorator for Django

September 11, 2017
4 comments Python, Web development, Django

UPDATE - Oct 27, 2017 This snippet did now become its own PyPI package. See https://pypi.python.org/pypi/django-cache-memoize

This is something that's grown up organically when working on Mozilla Symbol Server. It has served me very well and perhaps it's worth extracting into its own lib.

Usage

Basically, you are probably used to this in Django:


from django.core.cache import cache

def compute_something(user, special=False):
    cache_key = 'meatycomputation:{}:special={}'.format(user.id, special)
    value = cache.get(cache_key)
    if value is None:
        value = _call_the_meat(user.id, special)  # some really slow function
        cache.set(cache_key, value, 60 * 5)
    return value

Here's instead how you can do exactly the same with cache_memoize:


from wherever.decorators import cache_memoize

@cache_memoize(60 * 5)
def compute_something(user, special=False):
    return _call_the_meat(user.id, special)  # some really slow function

Cache invalidation

If you ever need to do non-trivial caching you know it's important to be able to invalidate the cache. Usually, to be able to do that, you need to be involved in how the cache key was created.

Consider our two examples above; here's first the common thing to do:


def save_user(user):
    do_something_that_will_need_to_cache_invalidate(user)

    cache_key = 'meatycomputation:{}:special={}'.format(user.id, False)
    cache.delete(cache_key)
    # And when it was special=True
    cache_key = 'meatycomputation:{}:special={}'.format(user.id, True)
    cache.delete(cache_key)

This works but it involves repeating the code that generates the cache key. You could extract that into its own function of course.

Here's how you do it with the cache_memoize decorator:


def save_user(user):
    do_something_that_will_need_to_cache_invalidate(user)

    compute_something.invalidate(user, special=False)
    compute_something.invalidate(user, special=True)    

Other features

There are actually two ways to "invalidate" the cache. Calling the new myoriginalfunction.invalidate(...) function or passing a custom extra keyword argument called _refresh. For example: compute_something(user, _refresh=True).

You can pass callables that get called when the cache works in your favor or when it's a cache miss. For example:


def increment_hits(user, special=None):
    # use your imagination
    metrics.incr(user.email)

def cache_miss(user, special=None):
    print("cache miss on {}".format(user.email))

@cache_memoize(
    60 * 5,
    hit_callable=increment_hits,
    miss_callable=cache_miss,
)
def compute_something(user, special=False):
    return _call_the_meat(user.id, special)  # some really slow function

Sometimes you just want to use the memoizer to make sure something only gets called "once" (or once per time interval). In that case it might be smart to not flood your cache backend with the value of the function output if there is one. For example:


@cache_memoize(60 * 60, store_result=False)  # idempotent guard
def calculate_and_update(user):
    # do something expensive here that is best to only do once per hour
    ...

Internally cache_memoize will basically try to convert every argument and keyword argument to a string with, kinda, str(). That might not always be appropriate because you might know that you have two distinct objects whose __str__ will yield the same result. For that you can use the args_rewrite parameter. For example:


def simplify_special_objects(obj):
    # use your imagination
    return obj.hostname 

@cache_memoize(60 * 5, args_rewrite=simplify_special_objects)
def compute_something(special_obj):
    return _call_the_meat(special_obj.hostname)

In conclusion

I've uploaded the code as a gist.

It's quite possible that there's already a perfectly good lib that does exactly this. If so, thanks for letting me know. If not, perhaps I ought to wrap this up and publish it on PyPI. Again, thanks for letting me know.

UPDATE

I found a bug in the original gist. Updated 2017-10-05.
The bug was that the calling of miss_callable and hit_callable was reversed.