Real minimal example of going from setState to MobX

May 4, 2018
0 comments JavaScript, React

This is not meant as a tutorial on MobX but hopefully it can be inspirational for people who have grokked how React's setState works but now feel they need to move the state management in their React app out of the components.

Store.js
To jump right in, here is a changeset that demonstrates how to replace setState with a MobX store:
https://github.com/peterbe/workon/commit/c1846ce782ce7c9da16f44b10c48f0be1337ae41

It's a really simple Todo list application based on create-react-app. Not much to read into at this point.

Here are some caveats to be aware if you look at the diff and wonder...

  • As part of this change, I moved the logic from the App component and created a new sub-component (that App renders) called TodoList. This was not necessary to add MobX.
  • There are a bunch of little unrelated edits in that such as deleting some commented out code.
  • store.items.sort((a, b) => b.id - a.id); doesn't actually work. You're supposed to do store.items.replace(store.items.sort((a, b) => b.id - a.id));.
  • Later I made the Item component also be an observer and not just the TodoList component.
  • The exported store is called store and, in this version, is an instance of the TodoStore class. The intention is to make store be an instance of combined different store classes, with TodoStore being just one of them.

Caveat last but not least... This diff does not much other than adding more library dependencies and fancy "observable arrays" that are hard to introspect with console.log debugging.
However, the intention is to...

  1. Add react-router to the mix so opening the Todo list is just one of many possible views.
  2. Now the Store.js file can be all about data. Data retrieval, storage, manipulation, mutation etc. The other components will be more simple since their only job is to render that's in the store and send events back to the store based on user actions.
  3. Note that the store is also put into window. That means I can open the web console and type store.items[2].text = "Test change" and simply by hitting enter the app re-renders to this change.

gtop is best

May 2, 2018
0 comments Linux, macOS, JavaScript

To me, using top inside a Linux server via SSH is all muscle-memory and it's definitely good enough. On my Macbook when working on some long-running code that is resource intensive the best tool I know of is: gtop

gtop in action
gtop in action

I like it because it has the graphs I want and need. It splits up the work of each CPU which is awesome. That's useful for understanding how well a program is able to leverage more than one CPU process.

And it's really nice to have the list of Processes there to be able to quickly compare which programs are running and how that might affect the use of the CPUs.

Instead of listing alternatives I've tried before, hopefully this Reddit discussion has good links to other alternatives

Fastest Python datetime parser

May 2, 2018
0 comments Python

tl;dr; It's ciso8601.

I have this Python app that I'm working on. It has a cron job that downloads a listing of every single file listed in an S3 bucket. AWS S3 publishes a manifest of .csv.gz files. You download the manifest and for each hashhashash.csv.gz you download those files. Then my program reads these CSV files and it is able to ignore certain rows based on them being beyond the retention period. It basically parses the ISO formatted datetime as a string, compares it with a cutoff datetime.datetime instance and is able to quickly skip or allow it in for full processing.

At the time of writing, it's roughly 160 .csv.gz files weighing a total of about 2GB. In total it's about 50 millions rows of CSV. That means it's 50 million datetime parsings.

I admit, this cron job doesn't have to be super fast and it's OK if it takes an hour since it's just a cron job running on a server in the cloud somewhere. But I would like to know, is there a way to speed up the date parsing because that feels expensive to do in Python 50 million times per day.

Here's the benchmark:


import csv
import datetime
import random
import statistics
import time

import ciso8601


def f1(datestr):
    return datetime.datetime.strptime(datestr, '%Y-%m-%dT%H:%M:%S.%fZ')


def f2(datestr):
    return ciso8601.parse_datetime(datestr)


def f3(datestr):
    return datetime.datetime(
        int(datestr[:4]),
        int(datestr[5:7]),
        int(datestr[8:10]),
        int(datestr[11:13]),
        int(datestr[14:16]),
        int(datestr[17:19]),
    )


# Assertions
assert f1(
    '2017-09-21T12:54:24.000Z'
).strftime('%Y%m%d%H%M') == f2(
    '2017-09-21T12:54:24.000Z'
).strftime('%Y%m%d%H%M') == f3(
    '2017-09-21T12:54:24.000Z'
).strftime('%Y%m%d%H%M') == '201709211254'


functions = f1, f2, f3
times = {f.__name__: [] for f in functions}

with open('046444ae07279c115edfc23ba1cd8a19.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        func = random.choice(functions)
        t0 = time.clock()
        func(row[3])
        t1 = time.clock()
        times[func.__name__].append((t1 - t0) * 1000)


def ms(number):
    return '{:.5f}ms'.format(number)


for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', format(len(numbers), ','), 'times')
    print('\tBEST  ', ms(min(numbers)))
    print('\tMEDIAN', ms(statistics.median(numbers)))
    print('\tMEAN  ', ms(statistics.mean(numbers)))
    print('\tSTDEV ', ms(statistics.stdev(numbers)))

Yeah, it's a bit ugly but it works. Here's the output:

FUNCTION: f1 Used 111,475 times
    BEST   0.01300ms
    MEDIAN 0.01500ms
    MEAN   0.01685ms
    STDEV  0.00706ms
FUNCTION: f2 Used 111,764 times
    BEST   0.00100ms
    MEDIAN 0.00200ms
    MEAN   0.00197ms
    STDEV  0.00167ms
FUNCTION: f3 Used 111,362 times
    BEST   0.00300ms
    MEDIAN 0.00400ms
    MEAN   0.00409ms
    STDEV  0.00225ms

In summary:

  • f1: 0.01300 milliseconds
  • f2: 0.00100 milliseconds
  • f3: 0.00300 milliseconds

Or, if you compare to the slowest (f1):

  • f1: baseline
  • f2: 13 times faster
  • f3: 6 times faster

UPDATE

If you know with confidence that you don't want or need timezone aware datetime instances, you can use csiso8601.parse_datetime_unaware instead.

from the README:

"Please note that it takes more time to parse aware datetimes, especially if they're not in UTC. If you don't care about time zone information, use the parse_datetime_unaware method, which will discard any time zone information and is faster."

In my benchmark the strings I use look like this 2017-09-21T12:54:24.000Z. I added another function to the benmark that uses ciso8601.parse_datetime_unaware and it clocked in at the exact same time as f2.

The impressive first-meaningful-paint improvement of using minimalcss

April 24, 2018
3 comments Web development, JavaScript

tl;dr; The critical CSS solution, using minimalcss, yields a 40% improvement in First Meaningful Paint and 90% improvement in the Time to Start Render.

About a month ago I enabled minimalcss here on my personal blog to properly test it in production. In that blog post I only looked at the difference in file size. This time, I'm testing the impact of it using Webpagetest.org. I picked two blog posts (that don't have images), but both have some syntax highlighting of code CSS. One and Two. Doesn't matter what's in those two pages but they're relatively similar in shape and size.

In three Webpagetest.org experiments I compare, on 4G Chome...

  1. both optimized
  2. first one optimized only
  3. second one optimized only
  4. both not optimized

Note! In the following screenshots from Webpagetest the "Thumbnail interval" is set to 0.5 seconds.

Note2! These pages used HTTP/2 and the CSS stylesheets are loaded from a CDN.

BOTH pages optimized

Both pages optimized

Timings both optimized

They are about 0.3 seconds apart in their first meaningful paint.

This is the baseline comparison. It's not perfectly the same first-meaningful-paint but close enough. The point is what difference it makes later.

Only FIRST page optimized

First page optimized

Timings first page optimized

The optimized page is 0.7 seconds faster.

Only SECOND page optimized

Second page optimized

Timings second page optimized

The optimized page is 1.4 seconds faster.

Bonus! Both pages NOT optimized

Just to check that it all holds up. Here, both pages are compared with out the critical CSS optimization.

Both pages NOT optimized

Timings both NOT optimized

If we roughly average out the first paint on the sample where both were optimized (2.1 seconds), this time it's 2.8 seconds. So the optimization of the critical CSS with minimalcss roughly makes the first paint makes it 40% faster.

Discussion

In Webpagetest.org the "First Meaningful Paint" is just one of many ways of measuring "success". I put that last word in quotation marks because this stuff is not trivial. Just because you manage to show anything to the user doesn't necessarily mean the user is happy if the user can't do what they want to do. And if you front-load a bunch of things with every trick in the book, you might have a lot of load in the background that might affect the scrolling with yank or flashing content.

I never really know which one to live by as a measure of success but the visual comparison timeline from Webpagetest definitely is fruitful and easy to understand. There is also "First Contentful Paint" which shows a slightly bigger difference between the optimized and the not-optimized pages.

For now, I'm going to call this a success. Adding minimalcss was a mixed bag of challenges. The execution of the script, on a server, requires the right amount of tooling and safeguards. After all, it depends on real web browser running inside a web server spawned from a background asynchronous message queue task with retries, logging, and metrics.

For one thing, if you just look at the visual comparisons and focus only on the rendered title the difference is not 40% faster. It's about 100% faster. That difference is explained by the word "meaningful" in First Meaningful Paint. The rest of the content is at the mercy of the banner ad and the remote loaded web fonts. For example, if you instead compare the "Time to Start Render" average timings of the optimized pages was 1.6 seconds vs 3.1 seconds for the not optimized pages. In other words, the improvement of First Meaningul Paint was 40% and the improvement of Time to Start Render was 93%.

Lastly, remember that what minimalcss (and other critical path CSS optimization tools) does is that it copies CSS from the .css files and includes it in the HTML document. That copying means the HTML document weighs 27KB more and still it wins.

Best EXPLAIN ANALYZE benchmark script

April 19, 2018
0 comments Python, PostgreSQL

tl;dr; Use best-explain-analyze.py to benchmark a SQL query in Postgres.

I often benchmark SQL by extracting the relevant SQL string, prefix it with EXPLAIN ANALYZE, putting it into a file (e.g. benchmark.sql) and then running psql mydatabase < benchmark.sql. That spits out something like this:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using main_song_ca949605 on main_song  (cost=0.43..237.62 rows=1 width=4) (actual time=1.586..1.586 rows=0 loops=1)
   Index Cond: (artist_id = 27451)
   Filter: (((name)::text % 'Facing The Abyss'::text) AND (id <> 2856345))
   Rows Removed by Filter: 170
 Planning time: 3.335 ms
 Execution time: 1.701 ms
(6 rows)

Cool. So you study the steps of the query plan and look for "Seq Scan" and various sub-optimal uses of heaps and indices etc. But often, you really want to just look at the Execution time milliseconds number. Especially if you might have to slightly different SQL queries to compare and contrast.

However, as you might have noticed, the number on the Execution time varies between runs. You might think nothing's changed but Postgres might have warmed up some internal caches or your host might be more busy or less busy. To remedy this, you run the EXPLAIN ANALYZE select ... a couple of times to get a feeling for an average. But there's a much better way!

best-explain-analyze.py

Check this out: best-explain-analyze.py

Download it into your ~/bin/ and chmod +x ~/bin/best-explain-analyze.py. I wrote it just this morning so don't judge!

Now, when you run it it runs that thing 10 times (by default) and reports the best Execution time, its mean and its median. Example output:

▶ best-explain-analyze.py songsearch dummy.sql
EXECUTION TIME
    BEST    1.229ms
    MEAN    1.489ms
    MEDIAN  1.409ms
PLANNING TIME
    BEST    1.994ms
    MEAN    4.557ms
    MEDIAN  2.292ms

The "BEST" is an important metric. More important than mean or median.

Raymond Hettinger explains it better than I do. His context is for benchmarking Python code but it's equally applicable:

"Use the min() rather than the average of the timings. That is a recommendation from me, from Tim Peters, and from Guido van Rossum. The fastest time represents the best an algorithm can perform when the caches are loaded and the system isn't busy with other tasks. All the timings are noisy -- the fastest time is the least noisy. It is easy to show that the fastest timings are the most reproducible and therefore the most useful when timing two different implementations."

Efficient many-to-many field lookup in Django REST Framework

April 11, 2018
1 comment Python, Django, PostgreSQL

The basic setup

Suppose you have these models:


from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)


class Blogpost(models.Model):
    title = models.CharField(max_length=100)
    categories = models.ManyToManyField(Category)

Suppose you hook these up Django REST Framework and list all Blogpost items. Something like this:


# urls.py
from rest_framework import routers
from . import views


router = routers.DefaultRouter()
router.register(r'blogposts', views.BlogpostViewSet)

# views.py
from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

What's the problem?

Then, if you execute this list (e.g. curl http://localhost:8000/api/blogposts/) what will happen, on the database, is something like this:


SELECT "app_blogpost"."id", "app_blogpost"."title" FROM "app_blogpost";

SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1025;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 193;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 757;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 853;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1116;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1126;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 964;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 591;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1112;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1034;
...

Obviously, it depends on how you define that serializers.BlogpostSerializer class, but basically, as it loops over the Blogpost, for each and every one, it needs to make a query to the many-to-many table (app_blogpost_categories in this example).

That's not going to be performant. In fact, it might be dangerous on your database if the query of blogposts gets big, like requesting a 100 or 1,000 records. Fetching 1,000 rows from the app_blogpost table might be cheap'ish but doing 1,000 selects with JOIN is never going to be cheap. It adds up horribly.

How you solve it

The trick is to only do 1 query on the many-to-many field's table, 1 query on the app_blogpost table and 1 query on the app_category table.

First you have to override the ViewSet.list method. Then, in there you can do exactly what you need.

Here's the framework for this change:


# views.py
from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    # queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

    def get_queryset(self):
        # Chances are, you're doing something more advanced here 
        # like filtering.
        Blogpost.objects.all().order_by('date')

    def list(self, request, *args, **kwargs):
        response = super().list(request, *args, **kwargs)
        # Where the magic happens!

        return response

Next, we need to make a mapping of all Category.id -1-> Category.name. But we want to make sure we do only on the categories that are involved in the Blogpost records that matter. You could do something like this:


category_names = {}
for category in Category.objects.all():
    category_names[category.id] = category.name

But to avoid doing a lookup of category names for those you never need, use the query set on Blogpost. I.e.


qs = self.get_queryset()
all_categories = Category.objects.filter(
    id__in=Blogpost.categories.through.objects.filter(
        blogpost__in=qs
    ).values('category_id')
)
category_names = {}
for category in all_categories:
    category_names[category.id] = category.name

Now you have a dictionary of all the Category IDs that matter.

Note! The above "optimization" assumes that it's worth it. Meaning, if the number of Category records in your database is huge, and the Blogpost queryset is very filtered, then it's worth only extracting a subset. Alternatively, if you only have like 100 different categories in your database, just do the first variant were you look them up "simplestly" without any fancy joins.

Next, is the mapping of Blogpost.id -N-> Category.name. To do that you need to build up a dictionary (int to list of strings). Like this:


categories_map = defaultdict(list)
for m2m in Blogpost.categories.through.objects.filter(blogpost__in=qs):
    categories_map[m2m.blogpost_id].append(
        category_names[m2m.category_id]
    )

So what we have now is a dictionary whose keys are the IDs in self.get_queryset() and each value is a list of a strings. E.g. ['Category X', 'Category Z'] etc.

Lastly, we need to put these back into the serialized response. This feels a little hackish but it works:


for each in response.data:
    each['categories'] = categories_map.get(each['id'], [])

The whole solution looks something like this:


# views.py
from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    # queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

    def get_queryset(self):
        # Chances are, you're doing something more advanced here 
        # like filtering.
        Blogpost.objects.all().order_by('date')

    def list(self, request, *args, **kwargs):
        response = super().list(request, *args, **kwargs)
        qs = self.get_queryset()
        all_categories = Category.objects.filter(
            id__in=Blogpost.categories.through.objects.filter(
                blogpost__in=qs
            ).values('category_id')
        )
        category_names = {}
        for category in all_categories:
            category_names[category.id] = category.name

        categories_map = defaultdict(list)
        for m2m in Blogpost.categories.through.objects.filter(blogpost__in=qs):
            categories_map[m2m.blogpost_id].append(
                category_names[m2m.category_id]
            )

        for each in response.data:
            each['categories'] = categories_map.get(each['id'], [])

        return response

It's arguably not very pretty but doing 3 tight queries instead of doing as many queries as you have records is much better. O(c) is better than O(n).

Discussion

Perhaps the best solution is to not run into this problem. Like, don't serialize any many-to-many fields.

Or, if you use pagination very conservatively, and only allow like 10 items per page then it won't be so expensive to do one query per every many-to-many field.

hashin 0.12.0 is much much faster

March 20, 2018
0 comments Python

tl;dr; The new 0.12.0 version of hashin is between 6 and 30 times faster.

Version 0.12.0 is exciting because it switches from using https://pypi.python.org/pypi/<package>/json to https://pypi.org/pypi/<package>/json so it's using the new PyPI. I only last week found out about the JSON containing .digest.sha256 as part of the JSON even though apparently it's been there for almost a year!

Prior to version 0.12.0, what hashin used to do is download every tarball and .whl file and run pip on it, in Python, to get the checksum hash. Now, if you use the default sha256, that checksum value is immediately available right there in the JSON, for every file per release. This is especially important for binary packages (lxml for example) where it has to download a lot.

To test this, I cleared my temp directory of any previously downloaded Django-* and lxml-* files and used hashin 0.11.5 to fill a requirements.txt for Django and lxml:

▶ hashin --version
0.11.5
▶ time hashin Django
hashin Django  0.48s user 0.14s system 12% cpu 5.123 total
▶ time hashin lxml
hashin lxml  1.61s user 0.59s system 8% cpu 25.361 total

In other words, 5.1 seconds to get the hashes for Django and 25.4 seconds for lxml.
Now, let's do it with the new 0.12.0

▶ hashin --version
0.12.0
▶ mv requirements.txt 0.11.5-requirements.txt ; touch requirements.txt
▶ time hashin Django
hashin Django  0.34s user 0.06s system 46% cpu 0.860 total
▶ time hashin lxml
hashin lxml  0.35s user 0.06s system 44% cpu 0.909 total

So, instead of 5.1 seconds, now it only takes 0.9 seconds. And instead of 25.4 seconds, now it only takes 0.9 seconds.

Yay!

Note, the old code that downloads and runs pip is still there. It kicks in when you request a digest checksum that is not included in the JSON. For example...:

▶ hashin --version
0.12.0
▶ time hashin --algorithm sha512 lxml
hashin --algorithm sha512 lxml  1.56s user 0.64s system 5% cpu 38.171 total

(The reason this took 38 seconds instead of 25 in the run above is because of the unpredictability of the speed of my home broadband)

Worlds simplest web scraper bot in Python

March 16, 2018
1 comment Python

I just needed a little script to click around a bunch of pages synchronously. It just needed to load the URLs. Not actually do much with the content. Here's what I hacked up:


import random
import requests
from pyquery import PyQuery as pq
from urllib.parse import urljoin


session = requests.Session()
urls = []


def run(url):
    if len(urls) > 100:
        return
    urls.append(url)
    html = session.get(url).decode('utf-8')
    try:
        d = pq(html)
    except ValueError:
        # Possibly weird Unicode errors on OSX due to lxml.
        return
    new_urls = []
    for a in d('a[href]').items():
        uri = a.attr('href')
        if uri.startswith('/') and not uri.startswith('//'):
            new_url = urljoin(url, uri)
            if new_url not in urls:
                new_urls.append(new_url)
    random.shuffle(new_urls)
    [run(x) for x in new_urls]

run('http://localhost:8000/')

If you want to do this when the user is signed in, go to the site in your browser, open the Network tab on your Web Console and copy the value of the Cookie request header.
Change that session.get(url) to something like:


html = session.get(url, headers={'Cookie': 'sessionid=i49q3o66anhvpdaxgldeftsul78bvrpk'}).decode('utf-8')

Now it can spider bot around on your site for a little while as if you're logged in.

Dirty. Simple. Fast.

filterToQueryString - JavaScript function to turn current filter into a query string

March 15, 2018
1 comment Web development, JavaScript, React

tl;dr; this function:


export const filterToQueryString = (filterObj, overrides) => {
  const copy = Object.assign(overrides || {}, filterObj)
  const searchParams = new URLSearchParams()
  Object.entries(copy).forEach(([key, value]) => {
    if (Array.isArray(value) && value.length) {
      value.forEach(v => searchParams.append(key, v))
    } else if (value) {
      searchParams.set(key, value)
    }
  })
  searchParams.sort()
  return searchParams.toString()
}

I have a React project that used to use query-string to serialize and deserialize objects between React state and URL query strings. Yesterday version 6.0.0 came out and now I'm getting this error during yarn run build:

yarn run v1.5.1
$ react-scripts build
Creating an optimized production build...
Failed to compile.

Failed to minify the code from this file: 

    ./node_modules/query-string/index.js:8 

Read more here: http://bit.ly/2tRViJ9

error An unexpected error occurred: "Command failed.
Exit code: 1

Perhaps this is the wake up call to switch to URLSearchParams (documentation here). Yes it is. Let's do it.

My use case is that I store a dictionary of filters in React this.state. The filter object is updated by submitting a form that looks like this:

Fitler form

Since the form inputs might be empty strings my filter dictionary in this.state might look like this:


{
  user: '@mozilla.com', 
  created_at: 'yesterday', 
  size: '>= 1m, <300G', 
  uploaded_at: ''
}

What I want that to become is: created_at=yesterday&size=>%3D+1m%2C+<300G&user=%40mozilla.com
So it's important to be able to skip falsy values (empty strings or possibly empty arrays).

Sometimes there are other key-values that needs to be added that isn't part of what the user chose. So it needs to be easy to squeeze in additional key-values. Here's the function:


export const filterToQueryString = (filterObj, overrides) => {
  const copy = Object.assign(overrides || {}, filterObj)
  const searchParams = new URLSearchParams()
  Object.entries(copy).forEach(([key, value]) => {
    if (Array.isArray(value) && value.length) {
      value.forEach(v => searchParams.append(key, v))
    } else if (value) {
      searchParams.set(key, value)
    }
  })
  searchParams.sort()
  return searchParams.toString()
}

I use it like this:


_fetchUploadsNewCountLoop = () => {
  const qs = filterToQueryString(this.state.filter, {
    created_at: '>' + this.state.latestUpload
  })
  const url = '/api/uploads?' + qs
  ...
  fetch(...)
}

UPDATE - May 2018

In the original blog post (now edited and corrected) I copied the wrong code and didn't discover the subtle mistake until now.
What was wrong as the order of the arguments to Object.assign().

Wrong


const copy = Object.assign(filterObj, overrides || {})

Correct


const copy = Object.assign(overrides || {}, filterObj)

The old version was dangerous because it mutated the filterObj passed in. So if you did something like


const qs = filterToQueryString(this.state.filter, {
  created_at: '>' + this.state.latestUpload
})

it would potentially mutate this.state.filter which isn't desirable.

Now using minimalcss

March 12, 2018
0 comments Python, Web development, JavaScript, Node

tl;dr; minimalcss is much better than mincss to slew out the minimal CSS your page needs to render. More accurate and more powerful features. This site now uses minimalcss in inline the minimum CSS needed to render the page.

I started minimalcss back in August 2017 and its goal was ultimately to replace mincss.

The major difference between minimalcss and mincss isn't that one is Node and one is Python, but that minimalcss is based on a full headless browser to handle all the CSS downloading and the proper rendering of the DOM. The other major difference is that mincss was based in regular expressions to analyze the CSS and minimalcss is based on proper abstract syntax tree ("AST") implemented by csso.

Because minimalcss is AST based, it can do a lot more. Smarter. For example, it's able to analyze the CSS to correctly and confidently figure out if any/which keyframe animations and font-face at-rules are actually needed.
Also, because minimalcss is based on csso, when it minifies the CSS it's able to restructure the CSS in a safe and smart way. I.e. p { color: blue; } h2 { color: blue; } becomes p,h2{color:blue}.

So, now I use minimalcss here on this blog. The pages are rendered in Django and a piece of middleware sniffs all outgoing HTML responses and depending on the right conditions it dumps the HTML as a file on disk as path/in/url/index.html. Then, that newly created file is sent to a background worker in Celery which starts post-processing it. Every index.html file is accompanied with the full absolute URL that it belongs to and that's the URL that gets sent to minimalcss which returns the absolute minimal CSS the page needs to load and lastly, a piece of Python script basically does something like this:

From...


<!-- before -->
<link rel="stylesheet" href="/file.css"/>

To...


<!-- after -->
<noscript><link rel="stylesheet" href="/file.css"/></noscript>
<style> ... /* minimal CSS selectors for rendering from /file.css */ ... </style>

There is also a new JavaScript dependency which is the cssrelpreload.js from the loadCSS project. So all the full (original) CSS is still downloaded and inserted into the CSSOM but it happens much later which ultimately means the page can be rendered and useful much sooner than if we'd have to wait to download and parse all of the .css URLs.

I can go into more details if there's interest and others want to do this too. Because this site is all Python and minimalcss is all Node, the integration is done over HTTP on localhost with minimalcss-server.

The results

Unfortunately, this change was mixed in with other smaller optimizations that makes the comparison unfair. (Hey! my personal blog is just a side-project after all). But I downloaded a file before and after the upgrade and compared:

ls -lh *.html
-rw-r--r--  1 peterbe  wheel    19K Mar  7 13:22 after.html
-rw-r--r--  1 peterbe  wheel    96K Mar  7 13:21 before.html

If I extract out the inline style block from both pages and compare it looks like this:
https://gist.github.com/peterbe/fc2fdddd5721fb35a99dc1a50c2b5311

So, downloading the initial HTML document is now 19KB instead of previous 96KB. And visually there's absolutely no difference.

Granted, in the after.html version, a piece of JavaScript kicks in and downloads /static/css/base.min.91f6fc577a60.css and /static/css/base-dynamic.min.e335b9bfa0b1.css from the CDN. So you have to download these too:

ls -lh *.css.gz
-rw-r--r--  1 peterbe  wheel   5.0K Mar  7 10:24 base-dynamic.min.e335b9bfa0b1.css.gz
-rw-r--r--  1 peterbe  wheel    95K Mar  7 10:24 base.min.91f6fc577a60.css.gz

The reason the difference appears to be huge is because I changed a couple of other things around the same time. Sorry. For example, certain DOM nodes were rendered as HTML but made hidden until some jQuery script made it not hidden anymore. For example, the "dimmer" effect over a comment textarea after you hit the submit button. Now, I've changed the jQuery code to build up the DOM when it needs it rather than relying on it being there (hidden). This means that certain base64 embedded font-faces are no longer needed in the minimal CSS payload.

Why this approach is better

So the old approach was to run mincss on the HTML and inject that as an inline style block and throw away the original (relevant) <link rel="stylesheet" href="..."> tags.
That had the annoying drawback that there was CSS in the stylesheets that I knew was going to be needed by some XHR or JavaScript later. For example, if you post a comment some jQuery code changes the DOM and that new DOM needs these CSS selectors later. So I had to do things like this:


.project a.perm { /* no mincss */
    font-size: 0.7em;
    padding-left: 8px;
}
.project a.perm:link { /* no mincss */
    color: rgb(151,151,151);
}
.project a.perm:hover { /* no mincss */
    color: rgb(51,51,51);
}

This was to inform mincss to leave those untouched even though no DOM node uses them right now. With minimalcss this is no longer needed.

What's next?

Keep working on minimalcss and make it even better.

Also, the scripting I used to modify the HTML file is a hack and should probably be put into the minimalcss project.

Last but not least, every time I put in some effort to web performance optimize my blog pages my Google ranking goes up and I usually see an increase in Google referrals in my Google Analytics because it's pretty obvious that Google loves fast sites. So I'm optimistically waiting for that effect.