So I have a massive chunk of JSON that a Django view is sending to a piece of Angular that displays it nicely on the page. It's big. 674KB, actually. And it's likely going to get bigger in the near future. It's basically a list of dicts. It looks something like this:
>>> pprint(d['events'][0])
{u'archive_time': None,
 u'archive_url': u'/manage/events/archive/1113/',
 u'channels': [u'Main'],
 u'duplicate_url': u'/manage/events/duplicate/1113/',
 u'id': 1113,
 u'is_upcoming': True,
 u'location': u'Cyberspace - Pacific Time',
 u'modified': u'2014-08-06T22:04:11.727733+00:00',
 u'privacy': u'public',
 u'privacy_display': u'Public',
 u'slug': u'bugzilla-development-meeting-20141115',
 u'start_time': u'15 Nov 2014 02:00PM',
 u'start_time_iso': u'2014-11-15T14:00:00-08:00',
 u'status': u'scheduled',
 u'status_display': u'Scheduled',
 u'thumbnail': {u'height': 32,
                u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
                u'width': 32},
 u'title': u'Bugzilla Development Meeting'}
So I thought one hackish simplification would be to convert each of these dicts into a list with a known sort order. Something like this:
>>> event = d['events'][0]
>>> pprint([event[k] for k in sorted(event)])
[None,
 u'/manage/events/archive/1113/',
 [u'Main'],
 u'/manage/events/duplicate/1113/',
 1113,
 True,
 u'Cyberspace - Pacific Time',
 u'2014-08-06T22:04:11.727733+00:00',
 u'public',
 u'Public',
 u'bugzilla-development-meeting-20141115',
 u'15 Nov 2014 02:00PM',
 u'2014-11-15T14:00:00-08:00',
 u'scheduled',
 u'Scheduled',
 {u'height': 32,
  u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
  u'width': 32},
 u'Bugzilla Development Meeting']
So I converted my sample events.json file like that:
$ l -h events*
-rw-r--r-- 1 peterbe wheel 674K Aug 8 14:08 events.json
-rw-r--r-- 1 peterbe wheel 423K Aug 8 15:06 events.optimized.json
Excitingly, the file is now about 250KB smaller because it no longer repeats all those keys in every item.
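For reference, here is a minimal sketch of that conversion in Python. It assumes every dict has exactly the same keys, and the `pack_events` helper and the generated sample data are made up for illustration; they are not the real events.json.

```python
import json


def pack_events(events):
    """Turn a list of dicts into one sorted key list plus one value row per dict."""
    # Assumption: every dict has exactly the same keys as the first one.
    keys = sorted(events[0])
    rows = [[event[key] for key in keys] for event in events]
    return {"keys": keys, "events": rows}


# Hypothetical sample data, shaped like a trimmed-down event.
events = [
    {"id": i, "privacy": "public", "status": "scheduled", "title": "Meeting %d" % i}
    for i in range(100)
]

packed = pack_events(events)

# Compare the serialized sizes of the two representations.
original_size = len(json.dumps({"events": events}))
packed_size = len(json.dumps(packed))
print(original_size, packed_size)
```

The key names are serialized once in `keys` instead of once per event, which is where the saving comes from.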
Now, I'd also send the order of the keys so I could do something like this in the AngularJS code:
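.success(function(response) {
  events = []
  response.events.forEach(function(event) {
    // Rebuild each object by pairing every key with the value at the same index
    var new_event = {}
    response.keys.forEach(function(key, i) {
      new_event[key] = event[i]
    })
    events.push(new_event)
  })
})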
Yuck! Nested loops! It was just getting more and more complicated.
Also, if there are keys that are not present in every element, it means I'd have to replace them with None.
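To make the missing-key problem concrete, here is a rough Python sketch. The `pack_sparse` and `unpack` helpers and the two sample events are hypothetical, written only to show what happens when one dict lacks a key the others have.

```python
def pack_sparse(events):
    """Pack dicts that may not all share the same keys."""
    # Use the union of all keys so every row has the same width.
    keys = sorted(set().union(*events))
    # dict.get() returns None for any key a particular event lacks.
    rows = [[event.get(key) for key in keys] for event in events]
    return keys, rows


def unpack(keys, rows):
    """Rebuild one dict per row by pairing each key with its value."""
    return [dict(zip(keys, row)) for row in rows]


# Hypothetical events: the second one has no 'thumbnail' key at all.
events = [
    {"id": 1, "title": "With thumbnail", "thumbnail": "/media/a.png"},
    {"id": 2, "title": "Without thumbnail"},
]

keys, rows = pack_sparse(events)
restored = unpack(keys, rows)
print(restored[1])
```

After the round trip, the second event carries an explicit `thumbnail` of `None`, so it no longer equals the original dict. The consumer can't tell "key was missing" apart from "key was null".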
At this point I stopped and I could smell the hackish stink of sulfur of the hole I was digging myself into.
Then it occurred to me: gzip is really good at compressing repeated things, and a list of dicts, like any document-store type of data structure, has plenty of repetition because the same keys show up in every item.
So I packed them manually to see what we could get:
$ apack events.json.gz events.json
$ apack events.optimized.json.gz events.optimized.json
And without further ado...
$ l -h events*
-rw-r--r-- 1 peterbe wheel 674K Aug 8 14:08 events.json
-rw-r--r-- 1 peterbe wheel 90K Aug 8 14:20 events.json.gz
-rw-r--r-- 1 peterbe wheel 423K Aug 8 15:06 events.optimized.json
-rw-r--r-- 1 peterbe wheel 81K Aug 8 15:07 events.optimized.json.gz
Basically, all that complicated and slow hoopla for saving roughly 10KB once gzip is involved. No thank you.
Thank you gzip for existing!
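If you want to try the same comparison without shelling out, something like the following Python sketch should do. The event data here is synthetic and far more repetitive than the real file, so the exact numbers will differ from the listing above; the `sizes` helper is made up for this example.

```python
import gzip
import json

# Synthetic events (an assumption, not the real data).
events = [
    {
        "id": i,
        "privacy": "public",
        "privacy_display": "Public",
        "status": "scheduled",
        "status_display": "Scheduled",
        "title": "Meeting number %d" % i,
    }
    for i in range(1000)
]

# The "optimized" form: keys sent once, values as plain lists.
keys = sorted(events[0])
packed = {"keys": keys, "events": [[e[k] for k in keys] for e in events]}


def sizes(payload):
    """Return (raw size, gzipped size) in bytes of the JSON-serialized payload."""
    raw = json.dumps(payload).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


plain_raw, plain_gz = sizes({"events": events})
packed_raw, packed_gz = sizes(packed)
print("plain :", plain_raw, "->", plain_gz)
print("packed:", packed_raw, "->", packed_gz)
```

The thing to look for is that the gzipped plain version comes out smaller than the uncompressed "optimized" one, with no change to the data structure at all.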