mozjpeg installation and sample

October 10, 2015
3 comments Linux, Web development, Mozilla

I've written about mozjpeg before where I showed what it can do to a sample directory full of different kinds of JPEGs. But let's get more real. Let's actually install it and look at one thumbnail and one big photo.

To install, I used the pre-compiled binaries from this wonderful site. Like this:

# wget http://mozjpeg.codelove.de/bin/mozjpeg_3.1_amd64.deb
# dpkg -i mozjpeg_3.1_amd64.deb
# ls -l /opt/mozjpeg/bin/cjpeg
-rwxr-xr-x 1 root root 50784 Sep  3 19:03 /opt/mozjpeg/bin/cjpeg

I don't know why the binary executable ends up being called cjpeg, but that's fine. Let's put it on $PATH so other users can execute it:

# cd /usr/local/bin
# ln -s /opt/mozjpeg/bin/cjpeg

Now, let's actually use it for something. First we need a realistic lossy thumbnail that we can optimize.

$ wget http://www.peterbe.com/static/cache/eb/f0/ebf08e64e80170dc009e97f6f9681ceb.jpg

This was one of the thumbnails from a previous post called Panasonic Lumix from 2008 or an iPhone 5S from 2014.

Let's optimize!

$ cjpeg -outfile ebf08e64e80170dc009e97f6f9681ceb.moz.jpg -optimize ebf08e64e80170dc009e97f6f9681ceb.jpg
$ ls -l ebf08e64e80170dc009e97f6f9681ceb.*
-rw-rw-r-- 1 django django 11391 Sep 26 17:04 ebf08e64e80170dc009e97f6f9681ceb.jpg
-rw-r--r-- 1 django django  9414 Oct 10 01:40 ebf08e64e80170dc009e97f6f9681ceb.moz.jpg

Yay! It's 17.4% smaller. Saving 1.93Kb.

So what do they look like? See for yourself:

I have to zoom in (⌘-+) 3 times until I can see any difference. The saving isn't massive, but remember, the use case here is a thumbnail.

So, let's do the same with a non-thumbnail. Some huge JPEG.

$ time cjpeg -outfile Lumix-2.moz.jpg -optimize Lumix-2.jpg
real    0m3.285s
user    0m3.122s
sys     0m0.080s
$ ls -l Lumix*
-rw-rw-r-- 1 django django 4880446 Sep 26 17:20 Lumix-2.jpg
-rw-rw-r-- 1 django django 1546978 Oct 10 02:02 Lumix-2.moz.jpg
$ ls -lh Lumix*
-rw-rw-r-- 1 django django 4.7M Sep 26 17:20 Lumix-2.jpg
-rw-rw-r-- 1 django django 1.5M Oct 10 02:02 Lumix-2.moz.jpg

In other words, from 4.7Mb to 1.5Mb. That's a 68.3% saving. And the visual difference?

Again, I have to zoom in 3 times to be able to tell any difference and even when I've done that it's hard to tell which is which.

In conclusion, let's go ahead and use mozjpeg to optimize thumbnails.
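
If you want to do this for a whole directory of thumbnails in one go, a minimal shell sketch could look something like this (keeping the .moz.jpg naming from above and assuming cjpeg is on $PATH; an illustration rather than my actual setup):

$ for f in *.jpg; do cjpeg -outfile "${f%.jpg}.moz.jpg" -optimize "$f"; done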

localStorage is not async, but it's FAST!

October 6, 2015
7 comments Web development, AngularJS, JavaScript

A long time ago I wrote an Angular app that was pleasantly straightforward. It loads all records from the server in one big fat AJAX GET. The data is large, ~550Kb as a string of JSON, but that's OK because it's a fat-client app and it's extremely unlikely to grow to any multiple of this. Yes, it'll some day go up to 1Mb, but even that is fine.

Once ALL records are loaded with AJAX from the server, you can filter the whole set, paginate, etc. It feels really nice and snappy. However, the app is slightly smarter than that. It has a few cool additional features...

  1. Every 10 seconds it does an AJAX query to ask "Have any records been modified since {{insert latest modify date of all known records}}?" and if there's stuff, it updates.

  2. All AJAX queries from the server are cached in the browser's local storage (note, I didn't write localStorage, "local storage" encompasses multiple techniques). The purpose of that is that on the next full load of the app, we can at least display what we had last time whilst we wait for the server to return the latest and greatest via a slowish network request.

  3. Suppose we have a brand new browser with no local storage. Because the default sort order is always known, instead of doing a full AJAX GET of all records, it does a small one first: "Give me the top 20 records ordered by modify date" and once that's in, it does the big full AJAX request for all records. Thus bringing data to the eyes faster.

All of these optimization tricks are accompanied by a flash message at the top that says: <img src="spinner.gif"> Currently using cached data. Loading all remaining records from server...

When I built this I decided to use localForage, which is a convenience wrapper over localStorage AND IndexedDB that does it all asynchronously and with proper promises. And to make it work in AngularJS I used angular-localForage so it would play along with Angular's digest cycle without custom $scope.$apply() stuff. I thought the advantage of this was that being async means the main thread can continue doing important rendering work whilst the browser saves things to "disk" in the background.

Also, I was once told that localStorage, which is inherently blocking, carries the risk that calling it for the first time in a while might force the browser to take a major break while it loads the data from actual disk into the browser's allocated memory. Turns out, that is extremely unlikely to be a problem (more about this in a future blog post). The warm-up of fetching from disk and storing into the browser's memory happens when you start the browser for the very first time. Chrome might be slightly different, but I'm confident this is how things work in Firefox, and it has been that way for many, many months.

What's very important to note is that, by default, localForage will use IndexedDB as the storage backend. That has the advantage of being async to boot, and it supports much larger data blobs.

So I timed, how long does it take for localForage to SET and GET the ~500Kb JSON data? I did that like this, for example:


var t0 = performance.now();
$localForage.getItem('eventmanager')
.then(function(data) {
    var t1 = performance.now();
    console.log('GET took', t1 - t0, 'ms');
    ...

The results are as follows:

Operation    Iterations    Average time
SET          4             341.0ms
GET          4             184.0ms

In all fairness, it doesn't actually matter how long it takes to save because my app actually doesn't depend on waiting for that promise to resolve. But it's an interesting number nevertheless.

So, here's what I did. I decided to drop all of that fancy localForage stuff and go back to basics. All I really need is these two operations:


// set stuff
localStorage.setItem('mykey', JSON.stringify(data))
// get stuff
var data = JSON.parse(localStorage.getItem('mykey') || '{}')
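
For the record, I measured the localStorage version the same way as the localForage one. A minimal sketch of that measurement (assuming the same 'eventmanager' key as above):

var t0 = performance.now();
var data = JSON.parse(localStorage.getItem('eventmanager') || '{}');
var t1 = performance.now();
console.log('GET took', t1 - t0, 'ms');

var t2 = performance.now();
localStorage.setItem('eventmanager', JSON.stringify(data));
var t3 = performance.now();
console.log('SET took', t3 - t2, 'ms');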

So, after refactoring my code, deleting (6.33Kb + 22.3Kb) of extra .js files and putting some performance measurements in:

Operation    Iterations    Average time
SET          4             5.9ms
GET          4             3.3ms

Just WOW!
That is so much faster. Sure, the write operation is now blocking, but it only takes 6 milliseconds. And the reason it takes IndexedDB nearly half a second is probably that there's a lot more hard work for it to sweat CPU over.

Sold? I am :)

django-pipeline + django-jinja

October 4, 2015
2 comments Django

Do you have django-jinja in your Django 1.8 project to help you with your Jinja2 integration, and do you use django-pipeline for your static assets?
If so, you need to tie them together by passing pipeline.templatetags.ext.PipelineExtension "to your Jinja2 environment". But how? Here's how:


# in your settings.py


from django_jinja.builtins import DEFAULT_EXTENSIONS

TEMPLATES = [
    {
        'BACKEND': 'django_jinja.backend.Jinja2',
        'APP_DIRS': True,
        'OPTIONS': {
            'match_extension': '.jinja',
            'context_processors': [
                ...
            ],
            'extensions': DEFAULT_EXTENSIONS + [
                'pipeline.templatetags.ext.PipelineExtension',
            ],
        }
    },
    ...

Now you simply use the {% stylesheet '...' %} or {% javascript '...' %} tags in your .jinja templates, without any of the {% load pipeline %} stuff.
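
For example, a .jinja template might end up looking something like this (the 'base' group name is just a made-up example; use whatever group names you've defined in your django-pipeline settings):

<!doctype html>
<html>
  <head>
    {% stylesheet 'base' %}
  </head>
  <body>
    {% block content %}{% endblock %}
    {% javascript 'base' %}
  </body>
</html>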

It took me a little while to figure that out so I hope it helps someone else googling around for a similar solution.

Using Lovefield as an AJAX proxy maybe

September 30, 2015
1 comment Web development, JavaScript

Lovefield, by Arthur Hsu at Google, is a cool little JavaScript browser abstraction on top of IndexedDB. IndexedDB is this amazingly powerful asynchronous framework for storing data in the browser, tied to the domain you're visiting. Unlike its much "simpler" sibling localStorage, IndexedDB lets you store individual keys in a schema, use indexes for faster retrieval, query asynchronously, and use much more storage than general DOM storage (e.g. localStorage and sessionStorage).

What Lovefield brings is best described by watching this video. But to save you time let me try...:

  • "Lovefield is a relational database for web apps"
  • Adds structured query capability to IndexedDB without having to work with any callbacks
  • It's not plain SQL strings that get executed but instead something similar to an ORM (e.g. SQLAlchemy)
  • Supports indexes for speedier lookups
  • Supports doing joins across different "tables"
  • Works in IE, Chrome, Firefox and Safari (although I don't know about iOS Safari)

Anyway, it sounds really cool and I'm looking forward to using it for something. But before I do, I thought I'd try using it as an "AJAX proxy".

So what's an AJAX proxy, you ask. Well, it can mean many things but what I have in mind is a pattern where a web app's MVC is tied to a local storage and the local storage is tied to AJAX. That means two immediate benefits:

  • The MVC part of the web app is divorced from understanding network APIs.
  • The web app becomes offline capable to boot. Being able to retrieve fresh data from the network (or send!) is a luxury that you automatically get if you have a network connection.

Another subtle benefit is a "corner case" of that offline capability; when you load up the app you can read from local storage much, much faster than from the network, meaning you can display user data on the screen sooner (with the obvious caveat that it might be stale and that the data will change once the network read completes later).

So, to take this idea for a spin (using local storage to remember what was loaded last time), I extended my AJAX or Not playground with a hybrid that uses React to render the data, but renders the data from Lovefield (and from localStorage too). Remember, it's an experiment so it's a bit clunky and perhaps contrived. The objective is to notice how soon after loading the page the data becomes available for your eyes to consume.

Here is the playground test page

You have to load it at least once to fill your IndexedDB with some data from an AJAX request. Then, reload the page and it'll display what it has locally (in IndexedDB extracted with the Lovefield API). Then, after it's loaded, try refreshing the browser repeatedly. With or without a Wifi connection.

Basically, it works. However, perhaps I've chosen the worst possible test bed for playing with Lovefield. Because it is super slow. If you open the web console, you'll see it reports how long it takes to extract the data out of Lovefield. The code looks like this:


...
getPostsFromDB: function() {
  return schemaBuilder.connect().then(function(db) {
    var table = db.getSchema().table('post');
    return db.select().from(table).exec();
  });
},
...
var t0 = performance.now();
this.getPostsFromDB()
.then(function(results) {
  var t1 = performance.now();
  console.log(results.length, 'records extracted');
  console.log((t1 - t0).toFixed(2) + 'ms to extract');
  ...

You can see the source here in full.
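
For context, the schemaBuilder that getPostsFromDB connects to has to be defined somewhere up front. A minimal sketch of what that setup could look like (the database name and columns here are my assumptions for illustration, not necessarily what the playground actually uses):

var schemaBuilder = lf.schema.create('ajaxornot', 1);
schemaBuilder
  .createTable('post')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('title', lf.Type.STRING)
  .addColumn('pub_date', lf.Type.STRING)
  .addPrimaryKey(['id']);

// Populating it from an AJAX response would then be something like:
// var rows = posts.map(function(p) { return table.createRow(p); });
// db.insertOrReplace().into(table).values(rows).exec();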

So out of curiosity, I forked this experiment. I kept almost all the React code but replaced the Lovefield stuff with good old JSON.parse(localStorage.getItem('posts') || '[]'). See code here.
This only takes 1-2 milliseconds. Lovefield repeatedly takes about 400-550 milliseconds on my Firefox version 43.

By the way, load up the localStorage fork and after a first load, try refreshing it over and over and notice how amazingly fast it is. Yay localStorage!

Panasonic Lumix from 2008 or an iPhone 5S from 2014

September 26, 2015
5 comments Photos

Rummaging through an old box I found an old digital camera I bought in 2008, the Panasonic Lumix DMC-LX3. It was hot stuff when it came out and I loved it. So much lighter and smaller than my previous Nikon DSLR behemoth.

But how does this 7-year-old camera compare to my iPhone 5S? Without any scientific rigor, I went to the park and took one picture with each "camera" (the iPhone is not really a camera, it just has a (good) camera).

Note! The thumbnails shown below are heavily optimized for web use. You have to click to see the original.

Here are the pictures taken with the Lumix:

Lumix 1

Lumix 2

Lumix 3


Lumix 4

Lumix 5



And here are the pictures taken with the iPhone 5S:

iPhone 5S 1

iPhone 5S 2

iPhone 5S 3


iPhone 5S 4

iPhone 5S 5



To compare, the best thing you can do is to open one of each, so to speak, in separate tabs, or download them, and zoom in and stare them down.

The total pixel area across all 5 images is about the same. The iPhone 5S pictures are slightly smaller in terms of dimension. The Lumix pictures are all 3,648x2,736 pixels. The iPhone 5S pictures are 3,264x2,448 pixels.

The 5 Lumix pictures weigh 19.1Mb and the iPhone 5S pictures weigh 11.6Mb.

Observations

  • The color on the Braille sign, the 4th one, is significantly different. I can't tell or remember which one is "right".
  • The last one, of the wall clock in my garage, was taken with flash on both. Interestingly, with the iPhone 5S, the picture was actually better without flash.
  • The iPhone 5S pictures add a yellowish glow which feels artificial but looks quite nice and warm.

In conclusion

I don't know which is better. The Lumix weighs more and is bulkier than the iPhone, and it doesn't have a web browser, GPS or WiFi. So if the pictures are about the same, the iPhone wins.

What do you think? If we ignore the practical aspect of carrying the Lumix, which pictures do you prefer?

ElasticSearch, snowball analyzer and stop words

September 25, 2015
1 comment Python

Disclaimer: I'm an ElasticSearch noob. Go easy on me

I have an application that uses ElasticSearch's more_like_this query to find related content. It basically works like this:

>>> index(index, doc_type, {'id': 1, 'title': 'Your cool title is here'})
>>> index(index, doc_type, {'id': 2, 'title': 'About is a cool headline'})
>>> index(index, doc_type, {'id': 3, 'title': 'Titles are your big thing'})

Then you can pick one ID (1, 2 or 3) and find related ones.
We can tell by looking at these three silly examples that 1 and 2 have the words "is" and "cool" in common. 1 and 3 have "title" (stemming taken into account) and "your" in common. However, is there much value in connecting these documents on the words "is" and "your"? I think not. Those are stop words, e.g. words like "the", "this", "from", "she" etc. Basically words that are commonly used as "glue" between more unique and specific words.
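
By the way, the "find related ones" part above is a more_like_this query. As a rough sketch (ElasticSearch 1.x style, with illustrative parameters rather than my real ones), such a query looks something like this:

curl -XGET 'localhost:9200/{myindexname}/_search' -d '{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like_text": "Your cool title is here",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}'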

Anyway, if you index something in ElasticSearch as a text field you get, by default, the "standard" analyzer to analyze the incoming stuff to be indexed. The standard analyzer just splits the words on whitespace. A more compelling analyzer is the Snowball analyzer (original here) which supports intelligent stemming (turning "wife" ~= "wives") and stop words.

The problem is that the snowball analyzer has a very different set of stop words. We did some digging and thought this was the list it bases its English stop words on. But this was wrong. Note that that list has words like "your" and "about" listed there.

The way to find out how your analyzer treats a string and turns it into tokens is to use the _analyze tool. For example:

curl -XGET 'localhost:9200/{myindexname}/_analyze?analyzer=snowball' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 5,
      "token": "about",
      "type": "<ALPHANUM>",
      "start_offset": 0,
      "position": 1
    },
    {
      "end_offset": 10,
      "token": "your",
      "type": "<ALPHANUM>",
      "start_offset": 6,
      "position": 2
    },
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

So what you can see is that it finds the tokens "about", "your", "special" and "word", but it ignored the stop words "is", "a" and "the". Hmm... I'm not happy with that. I don't think "about" and "your" are particularly helpful words.

So, how do you define your own stop words and override the one in the Snowball analyzer? Well, let me show you.

In code, I use pyelasticsearch so the index creation is done in Python.


STOPWORDS = (
    "a able about across after all almost also am among an and "
    "any are as at be because been but by can cannot could dear "
    "did do does either else ever every for from get got had has "
    "have he her hers him his how however i if in into is it its "
    "just least let like likely may me might most must my "
    "neither no nor not of off often on only or other our own "
    "rather said say says she should since so some than that the "
    "their them then there these they this tis to too twas us "
    "wants was we were what when where which while who whom why "
    "will with would yet you your".split()
)

def create():
    es = get_connection()
    index = get_index()
    es.create_index(index, settings={
        'settings': {
            'analysis': {
                'analyzer': {
                    'extended_snowball_analyzer': {
                        'type': 'snowball',
                        'stopwords': STOPWORDS,
                    },
                },
            },
        },
        'mappings': {
            doc_type: {
                'properties': {
                    'title': {
                        'type': 'string',
                        'analyzer': 'extended_snowball_analyzer',
                    },
                }
            }
        }
    })

With that in place, delete your index and re-create it. Now you can use the _analyze tool again to see how it analyzes text on this particular field. But note, to do this we need to know the name of the index we used (so replace {myindexname} in the URL):

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?field=title' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

Cool! Now we see that it considers "about" and "your" to be stop words. Much better. This is handy too because you might have certain words that are globally not very common, but that within your application are repeated a lot and not very useful.

Thank you willkg and Erik Rose for your support in tracking this down!

django-semanticui-form

September 14, 2015
2 comments Python, Django

I'm working on a (side)project in Django that uses the awesome Semantic UI CSS framework. This project has some Django forms that are rendered on the server, and I can't let Django render the form HTML with its default markup or else the CSS framework can't do its magic.

The project is called django-semanticui-form and it's a fork from django-bootstrap-form.

It doesn't come with the Semantic UI CSS files at all. That's up to you. Semantic UI is available as a big fat bundle (i.e. one big .css file) but generally you just pick the components you want/need. To use it in your Django templates, simply create a django.forms.Form instance and render it like this:


{% load semanticui %}

<form>
  {{ myform | semanticui }}
</form>
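
Here, myform is just an ordinary Django form instance passed into the template context. For illustration, something along these lines (the form and its fields are made up):

# forms.py -- a made-up example form
from django import forms

class ContactForm(forms.Form):
    name = forms.CharField(max_length=100)
    email = forms.EmailField()
    message = forms.CharField(widget=forms.Textarea)

# in the view, pass an instance into the context as 'myform', e.g.
# return render(request, 'contact.jinja', {'myform': ContactForm()})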

The project is very quickly put together. The elements I intend to render seem to work, but you might find that certain input elements don't work as nicely. However, if you want to help on the project, it's really easy to write and run tests. And Travis and automatic PyPI deployment are all set up, so pull requests should be easy.

peepin - a great companion to peep

September 10, 2015
0 comments Python

I actually wrote peepin several months ago but forgot to blog about it.
It's a great library that accompanies peep, which is a wrapper on top of pip (or rather, on top of pip install). When you normally do pip install -r requirements.txt, the only check it does is on the version number, assuming your requirements.txt has lines in it like Django==1.8.4. With peep, it also does a checksum comparison of the wheel, tarball or zip file. It basically means that the installer will get EXACTLY the same package files as were used by the developer who decided to add it to requirements.txt.

If you're using pip and want strong reliability and much higher security, I strongly recommend you consider switching to peep.
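
To give a feel for the format, a peep-style requirements.txt entry is just the normal pinned line with a checksum comment above it, something like this (the hash is the one from the diff further down, shown purely as an example):

# sha256: AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
django-bootstrap-form==3.2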

Anyway, peepin is an executable used to modify your requirements.txt automatically for you. It can do two things (or at least one of them at a time):

1) Automatically figure out what the right checksums should be.
2) It can figure out what the latest version on PyPI is.

For example:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form $)$ peepin --verbose django-bootstrap-form
* Latest version for 3.2
https://pypi.python.org/pypi/django-bootstrap-form/3.2
* Found URL https://pypi.python.org/packages/source/d/django-bootstrap-form/django-bootstrap-form-3.2.tar.gz#md5=1e95b05a12362fe17e91b962c41d139e
*   Re-using /var/folders/1x/2hf5hbs902q54g3bgby5bzt40000gn/T/django-bootstrap-form-3.2.tar.gz
*   Hash AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
* Editing requirements.txt

And once that's done...:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form *$)$ git diff
diff --git a/requirements.txt b/requirements.txt
index a6600f1..5f1374c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -83,8 +83,8 @@ BeautifulSoup==3.2.1
 django_compressor==1.4
 # sha256: F3KVsUQkAMks22fo4Y-f9ZRvtEL4WBO50IN4I3IuoI0
 django-cronjobs==0.2.3
-# sha256: 2G3HpwzvCTy3dc1YE7H4XQH6ZN8M3gWpkVFR28OOsNE
-django-bootstrap-form==3.1
+# sha256: AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
+django-bootstrap-form==3.2
 # sha256: jiOPwzhIDdvXgwiOhFgqN6dfB8mSdTNzMsmjmbIBkfI
 regex==2014.12.24
 # sha256: ZY2auoUzi-jB0VMsn7WAezgdxxZuRp_w9i_KpCQNnrg
 

If you want to, you can open up and inspect the downloaded package and check that no hacker has meddled with it. Or, if you don't have time to do that, at least use the package locally and run your tests etc. If you now feel comfortable with the installed package, you can be 100% certain that exactly that package will be installed on your server once the code goes into production.

Be careful with using dict() to create a copy

September 9, 2015
9 comments Python

Everyone who's done Python for a while soon learns that dicts are mutable. I.e. that they can change.

One way of "forking" a dictionary into two different ones is to create a new dictionary object with dict(). E.g:


>>> first = {'key': 'value'}
>>> second = dict(first)
>>> second['key'] = 'other'
>>> first
{'key': 'value'}
>>> second
{'key': 'other'}

See, you can change the value of a key without affecting the dictionary it came from.

But, if one of the values is also mutable, beware!


>>> first = {'key': ['value']}
>>> second = dict(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value', 'second value']}
>>> second
{'key': ['value', 'second value']}

This is where you need to use the built-in copy.deepcopy.


>>> import copy
>>> first = {'key': ['value']}
>>> second = copy.deepcopy(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value']}
>>> second
{'key': ['value', 'second value']}

Yay! Hope it helps someone avoid some possibly confusing bugs some day.

UPDATE

As ëRiC reminded me, there are actually three ways to make a "shallow copy" of a dictionary:

1) some_copy = dict(some_dict)

2) some_copy = some_dict.copy()

3) some_copy = copy.copy(some_dict) # after importing 'copy'
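
All three of those behave the same way as the dict() example above, i.e. nested mutable values are still shared between the copies. A quick sanity check:

>>> import copy
>>> first = {'key': ['value']}
>>> a = dict(first)
>>> b = first.copy()
>>> c = copy.copy(first)
>>> first['key'].append('second value')
>>> a['key']
['value', 'second value']
>>> b['key']
['value', 'second value']
>>> c['key']
['value', 'second value']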

Examples of mozjpeg savings

September 1, 2015
5 comments Web development, Mozilla

I'm currently working on a Django library that uses mozjpeg to optimize thumbnails that are generated from stored images. I first wanted to get a feel for how good mozjpeg really is.

In my ~/Downloads directory I have all sorts of "junk" from all sorts of saves and experiments. It'll work as a good testbed of relatively random JPEG images of all sorts of sizes and qualities. Without further ado, here are the results:

FILENAME                                          OPTIMIZE   ORIGINAL     SAVING  PERCENT
-----------------------------------------------------------------------------------------
180697_1836563311933_3364808_n.jpg                  45.2Kb     50.4Kb      5.1Kb    10.2%
2014-03-20 17.35.39.jpg                           2040.1Kb   2207.8Kb    167.7Kb     7.6%
2015-03-04 21.18.16.jpg                           1521.5Kb   1629.2Kb    107.7Kb     6.6%
2015-03-04 21.19.16.jpg                           1602.4Kb   1720.0Kb    117.6Kb     6.8%
2015-03-04 21.23.16.jpg                           1181.7Kb   1272.1Kb     90.4Kb     7.1%
2015-03-05 06.03.00.jpg                           1426.7Kb   1557.7Kb    131.0Kb     8.4%
20150626_200629_001.jpg                           1566.4Kb   1717.3Kb    151.0Kb     8.8%
20150626_200631.jpg                               2157.6Kb   2319.6Kb    162.0Kb     7.0%
Boba_Fett_by_RobD4E.jpg                             96.2Kb    104.3Kb      8.1Kb     7.8%
Horse_Play.jpg                                     170.4Kb    185.2Kb     14.9Kb     8.0%
Image (107).jpg                                    344.9Kb    390.6Kb     45.7Kb    11.7%
Misc Candle Holder NECA FOTR Balrog Dec2002.jpg     37.1Kb     37.7Kb      0.6Kb     1.5%
Mozilla_Lightbeam.jpg                               55.1Kb     79.7Kb     24.6Kb    30.8%
Photo on 12-17-14 at 5.55 PM.jpg                   168.5Kb    187.7Kb     19.2Kb    10.2%
dev.jpg                                             17.5Kb     30.8Kb     13.3Kb    43.2%
dev2.jpg                                            41.1Kb     54.3Kb     13.3Kb    24.4%
dev3.jpg                                            35.3Kb     49.0Kb     13.7Kb    28.0%
dev4.jpg                                            42.0Kb     56.0Kb     14.0Kb    25.0%
dev5.jpg                                            24.6Kb     37.9Kb     13.2Kb    35.0%
dev6.jpg                                            28.9Kb     42.8Kb     13.9Kb    32.4%
hr_0570_220_135__0570220135006.jpg                3124.3Kb   3467.8Kb    343.5Kb     9.9%
hr_0570_220_158__0570220158006.jpg                3010.0Kb   3319.1Kb    309.1Kb     9.3%
hr_0570_220_175__0570220175006.jpg                2245.5Kb   2442.6Kb    197.0Kb     8.1%
hr_0570_227_599__0570227599006.jpg                2561.7Kb   2809.8Kb    248.1Kb     8.8%
hr_0596_622_701__0596622701006.jpg                3238.8Kb   3453.6Kb    214.7Kb     6.2%
hr_0596_623_849__0596623849006.jpg                2902.9Kb   3102.1Kb    199.3Kb     6.4%
hr_0622_219_873__0622219873006.jpg                 985.3Kb   1066.9Kb     81.7Kb     7.7%
logo.jpg                                            43.5Kb     51.2Kb      7.7Kb    15.1%
mvm-header.jpg                                       8.5Kb     12.4Kb      3.9Kb    31.6%
mvm-postcard-picture.jpg                            72.2Kb     73.4Kb      1.3Kb     1.7%
overhang_pixels.jpg                               3014.3Kb   3370.8Kb    356.4Kb    10.6%
peterbe copy.jpg                                     4.2Kb     10.4Kb      6.2Kb    59.7%
peterbe.jpg                                         36.7Kb     44.3Kb      7.5Kb    17.0%
pjt-mcguinty-2.jpg                                  96.8Kb    101.6Kb      4.8Kb     4.8%
sl1.jpg                                             28.7Kb     35.4Kb      6.7Kb    18.9%

That's a median of 9.3% (average of 15.3%) savings.

It's not very fast though. Some of the large files take more than a second. In total it took 23.7 seconds to create all of those optimized files. Do what you want with that fact, but bear in mind that these are hopefully "once in a lifetime" operations (depending on the ephemerality of your thumbnail storage). Mind you, the really large JPEGs skew that, since the median is 72.1 milliseconds and the average is 527.0 milliseconds. Also, when I look through the numbers I find that the large JPEGs take the longest but have the least benefit in terms of byte savings.
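
For what it's worth, the batch run itself was nothing fancy. A minimal sketch of that kind of script (a reconstruction for illustration, assuming cjpeg from mozjpeg is on $PATH; not my original code):

import os
import subprocess
import time

SOURCE = os.path.expanduser('~/Downloads')
DEST = '/tmp/optimized'
if not os.path.isdir(DEST):
    os.makedirs(DEST)

total = 0.0
for filename in sorted(os.listdir(SOURCE)):
    if not filename.lower().endswith(('.jpg', '.jpeg')):
        continue
    source = os.path.join(SOURCE, filename)
    dest = os.path.join(DEST, filename)
    t0 = time.time()
    subprocess.check_call(['cjpeg', '-outfile', dest, '-optimize', source])
    t1 = time.time()
    total += t1 - t0
    saving = os.path.getsize(source) - os.path.getsize(dest)
    print('%-50s saved %6.1fKb in %6.1fms' % (filename, saving / 1024.0, (t1 - t0) * 1000))

print('Total time: %.1f seconds' % total)

That prints one line per file (roughly the columns in the table above) plus the total time at the end.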

UPDATE

Chris Adams, in the comment below, inspired me to compare my trials with jpegoptim and jpegrescan. So, I took my script that generated a directory of 45 JPEGs and changed it to use jpegoptim and jpegrescan.

The mozjpeg total size of that output directory is 34.1Mb and it took a total of 23.3 seconds (median 76.4 milliseconds).

The jpegoptim & jpegrescan total size of that output directory is 35.6Mb and it took a total of 4.6 seconds (median 32.1 milliseconds).

In other words, roughly speaking mozjpeg is 4.2% more space efficient than jpegoptim & jpegrescan, but also noticeably slower: going by the medians, jpegoptim & jpegrescan need about 58% less time per file (32.1ms vs 76.4ms).