Optimize inlined SVG on developer.mozilla.org

April 4, 2019
0 comments Web development, Web Performance

tl;dr We could make the initial HTML document 40% smaller if moved from inline SVG to external, optimized, .svg static assets. But there are lots of caveats unless the SVG can be used as an image.

One of the many goals of MDN Web Docs this year is to make it faster. That makes users happier and as a side-effect, it makes Google happier. And hopefully, being faster will mean Google ranks us higher.

I'm still new to the MDN code base and there are many things we can do. One thing I noticed is that the site uses inline SVG. E.g.


<a href="/en-US/docs/Learn">References &amp; Guides
    <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
      <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
    </svg>
</a>

The site uses HTTP/2 so the argument of reducing the number of requests is not valid. Well, with caveats. Browser support for HTTP/2 is getting really good. Definitely good enough to make it worth betting on.
It used to be that there's a trade-off for making static assets external: you can potentially avoid downloads, at all, by browser caching and the initial HTML document becomes smaller. But all, at the cost of more requests.

There are other, more subtle, differences with SVG. For example, the content of the SVG might depend on dynamic data. There might be others that I'm not aware and I'm quick to admit that I don't know much about use and stuff but this article might be full of those details.

The Experiment

I wrote a script that opens https://developer.mozilla.org/en-US/ and extracts every <svg> tag and puts them on disk. E.g. svg.icon.icon-smile_fbf6292.svg. They have a hash checksum on the content in case two <svg>s are different (but with the same classList). Then it does the following:

  1. Run svgo on each .svg to create a .min.svg.
  2. Run zopfli on each .min.svg to create a .min.svg.gz
  3. Run brotli on each .min.svg to create a .min.svg.br
  4. Sum all inline ones total size, sum the size of all .min.svg, sum the size of all .min.svg.gz, sum the size of all .min.svg.br.

Results

Technique Number Total Bytes
Inline 27 22,142 (21.6KB)
Optimized with svgo 15 14,566 (14.2KB)
Zopfli compressed 15 6,236 (6.1KB)
Brotli compressed 15 5,789 (5.7KB)

Conclusions and Caveats

For every single MDN page, we stand to make the initial HTML document 22KB smaller. Every time. Most web developers I know often Google for something and end up on MDN and do so frequently enough that there's a good chance for a warm browser cache.

But! This 22KB is uncompressed. Since the HTML documents are served gzipped, at a ratio of about 1:4, the total inline SVGs is roughly 5.6KB. At the time of writing the MDN home page is 58,496 bytes decompressed and 14,570 bytes gzipped. So that means that we stand to potentially strip away 40% of the document size!

Second but! There are some non-trivial differences in usage of SVG. You can't simply replace...


<a href="/en-US/docs/Learn">References &amp; Guides
  <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
    <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
  </svg>
</a>

...with...


<a href="/en-US/docs/Learn">References &amp; Guides
  <img src="/static/icon-caret.914d0e4.min.svg">
</a>

(Compare this and this)

You can, instead, use <svg><use xlink:href="/static/icon-caret.914d0e4.min.svg".></svg> but it comes with its own host of challenges and problems (styling and IE support) and you still need this <svg> tag to do the wrapping in the first place which adds bytes.

It's not always worth compressing tiny static assets. And it might be worth experimenting with what the CPU cost is for a low-performance mobile device to decompress the asset versus just eating the extra network download cost of leaving it uncompressed.

HTTP/2 is great in that it allows the browser to download external assets earlier, on the same open connection, as the initial HTML document. But it's not without risks and costs that need to be carefully considered.

Best way to count distinct indexed things in PostgreSQL

March 21, 2019
3 comments Django, PostgreSQL

tl;dr; SELECT COUNT(*) FROM (SELECT DISTINCT my_not_unique_indexed_column FROM my_table) t;

I have a table that looks like this:

songsearch=# \d main_songtexthash
            Table "public.main_songtexthash"
  Column   |           Type           | Collation | Nullable |
-----------+--------------------------+-----------+----------+
 id        | integer                  |           | not null |
 text_hash | character varying(32)    |           | not null |
 created   | timestamp with time zone |           | not null |
 modified  | timestamp with time zone |           | not null |
 song_id   | integer                  |           | not null |
Indexes:
    "main_songtexthash_pkey" PRIMARY KEY, btree (id)
    "main_songtexthash_song_id_key" UNIQUE CONSTRAINT, btree (song_id)
    "main_songtexthash_text_hash_c2771f1f" btree (text_hash)
    "main_songtexthash_text_hash_c2771f1f_like" btree (text_hash varchar_pattern_ops)
Foreign-key constraints:
    ...snip...

And the data looks something like this:

songsearch=# select text_hash, song_id from main_songtexthash limit 10;
            text_hash             | song_id
----------------------------------+---------
 6f98e1945e64353bead9d6ab47a7f176 | 2565031
 0c6662363aa4a340fea5efa24c98db76 |  486091
 a25af539b183cbc338409c7acecc6828 |     212
 5aaf561b38c251e7d863aae61fe1363f | 2141077
 6a221df60f7cbb8a4e604f87c9e3aec0 |  245186
 d2a0b5b3b33cdf5e03a75cfbf4963a6f | 1453382
 95c395dd78679120269518b19187ca80 |  981402
 8ab19b32b3be2d592aa69e4417b732cd |  616848
 8ab19b32b3be2d592aa69e4417b732cd |  243393
 01568f1f57aeb7a97e2544978fc93b4c |     333
(10 rows)

If you look carefully, you'll notice that every song_id has a different text_hash except two of them.
Song IDs 616848 and 243393 both have the same text_hash of value 8ab19b32b3be2d592aa69e4417b732cd.

Also, if you imagine this table only has 10 rows, you could quickly and easily conclude that there are 10 different song_id but 9 different distinct text_hash. However, how do you do this counting if the tables are large??

The Wrong Way

songsearch=# select count(distinct text_hash) from main_songtexthash;
  count
---------
 1825983
(1 row)

And the explanation and cost analysis is:

songsearch=# explain analyze select count(distinct text_hash) from main_songtexthash;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=44942.09..44942.10 rows=1 width=8) (actual time=40029.225..40029.226 rows=1 loops=1)
   ->  Seq Scan on main_songtexthash  (cost=0.00..40233.87 rows=1883287 width=33) (actual time=0.029..193.653 rows=1879521 loops=1)
 Planning Time: 0.059 ms
 Execution Time: 40029.250 ms
(4 rows)

Oh noes! A Sec Scan! Run!

The Right Way

Better explained in this blog post but basically, cutting to the chase, here's how you count on an indexed field:

songsearch=# select count(*) from (select distinct text_hash from main_songtexthash) t;
  count
---------
 1825983
(1 row)

And the explanation and cost analysis is:

songsearch=# explain analyze select count(*) from (select distinct text_hash from main_songtexthash) t;
                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=193871.20..193871.21 rows=1 width=8) (actual time=4894.555..4894.556 rows=1 loops=1)
   ->  Unique  (cost=0.55..172861.54 rows=1680773 width=33) (actual time=0.075..4704.741 rows=1825983 loops=1)
         ->  Index Only Scan using main_songtexthash_text_hash_c2771f1f on main_songtexthash  (cost=0.55..168153.32 rows=1883287 width=33) (actual time=0.074..4132.822 rows=1879521 loops=1)
               Heap Fetches: 1879521
 Planning Time: 0.082 ms
 Execution Time: 4894.581 ms
(6 rows)

Same exact result but ~5s instead of ~40s. I'll take that, thank you very much.

The Django Way

As a bonus: Django is smart. Here's how they do it:


>>> SongTextHash.objects.values('text_hash').distinct().count()
1825983

And, the SQL it generates to make that count looks very familiar:


SELECT COUNT(*) FROM (SELECT DISTINCT "main_songtexthash"."text_hash" AS Col1 FROM "main_songtexthash") subquery

Conclusion

  • Avoid "sequential scans" like the plague if you care about performance (...or not just killing your resources).
  • Trust in Django.

Format numbers with numberWithCommas() or Number.toLocaleString()

March 5, 2019
1 comment Web development, JavaScript

In a highly unscientific survey of exactly 2 French native friends, I asked them what they think about formatting large numbers the "French way" versus doing it the "English way". In particular, if the rest of content/app is English, would it be jarring if the formatting of numbers was French. Both Adrian and Mathieu said they prefer displaying the number the French way even if the app/content is French.

If you have an English browser opening https://codepen.io/peterbe/pen/xBRGoN means it's going to display the two numbers the same way. But if you have a French locale in your browser it'll look like this:

French

Number.toLocaleString() is now universally supported so that's no longer a worry.

For years I was using a function like this:


/* http://stackoverflow.com/a/2901298 */
function numberWithCommas(x) {
  var parts = x.toString().split('.');
  parts[0] = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, ',');
  return parts.join('.');
}

numberWithCommas(100000000);
// "100,000,000"

jsperf
and it works and is reasonably fast too but it's so tempting to not use and instead stand on the shoulder-of-browsers to supply this functionality instead. But consider the alternative:


(100000000).toLocaleString();
// "100,000,000"

The thing that's always worried me is; What will someone's reaction be if the texts are in one locale and the formatting of numbers (and dates, etc!) are in another locale?

All 2 of the people I asked say they don't mind the mixing but admit that it's weird but that ultimately they prefer "their" format.

If you're non-English browser, what do you prefer? If you're a web usability expert, please, too, drop a comment to share what you think.

Django ORM optimization story on selecting the least possible

February 22, 2019
16 comments Web development, Django, Python, PostgreSQL

This an optimization story that should not surprise anyone using the Django ORM. But I thought I'd share because I have numbers now! The origin of this came from a real requirement. For a given parent model, I'd like to extract the value of the name column of all its child models, and the turn all these name strings into 1 MD5 checksum string.

Variants

The first attempted looked like this:


artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

The SQL used to generate this is as follows:


SELECT "main_song"."id", "main_song"."artist_id", "main_song"."name", 
"main_song"."text", "main_song"."language", "main_song"."key_phrases", 
"main_song"."popularity", "main_song"."text_length", "main_song"."metadata", 
"main_song"."created", "main_song"."modified", 
"main_song"."has_lastfm_listeners", "main_song"."has_spotify_popularity" 
FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Clearly, I don't need anything but just the name column, version 2:


artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist).only("name"):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

Now, the SQL used is:


SELECT "main_song"."id", "main_song"."name" 
FROM "main_song" WHERE "main_song"."artist_id" = 22729;

But still, since I don't really need instances of model class Song I can use the .values() method which gives back a list of dictionaries. This is version 3:


names = []
for song in Song.objects.filter(artist=a).values("name"):
    names.append(song["name"])
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

This time Django figures it doesn't even need the primary key value so it looks like this:


SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Last but not least; there is an even faster one. values_list(). This time it doesn't even bother to map the column name to the value in a dictionary. And since I only need 1 column's value, I can set flat=True. Version 4 looks like this:


names = []
for name in Song.objects.filter(artist=a).values_list("name", flat=True):
    names.append(name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

Same SQL gets used this time as in version 3.

The benchmark

Hopefully this little benchmark script speaks for itself:


from songsearch.main.models import *

import hashlib


def f1(a):
    names = []
    for song in Song.objects.filter(artist=a):
        names.append(song.name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f2(a):
    names = []
    for song in Song.objects.filter(artist=a).only("name"):
        names.append(song.name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f3(a):
    names = []
    for song in Song.objects.filter(artist=a).values("name"):
        names.append(song["name"])
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f4(a):
    names = []
    for name in Song.objects.filter(artist=a).values_list("name", flat=True):
        names.append(name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


artist = Artist.objects.get(name="Bad Religion")
print(Song.objects.filter(artist=artist).count())

print(f1(artist) == f2(artist))
print(f2(artist) == f3(artist))
print(f3(artist) == f4(artist))

# Reporting
import time
import random
import statistics

functions = f1, f2, f3, f4
times = {f.__name__: [] for f in functions}

for i in range(500):
    func = random.choice(functions)
    t0 = time.time()
    func(artist)
    t1 = time.time()
    times[func.__name__].append((t1 - t0) * 1000)

for name in sorted(times):
    numbers = times[name]
    print("FUNCTION:", name, "Used", len(numbers), "times")
    print("\tBEST", min(numbers))
    print("\tMEDIAN", statistics.median(numbers))
    print("\tMEAN  ", statistics.mean(numbers))
    print("\tSTDEV ", statistics.stdev(numbers))

I ran this on my PostgreSQL 11.1 on my MacBook Pro with Django 2.1.7. So the database is on localhost.

The results

276
True
True
True
FUNCTION: f1 Used 135 times
    BEST 6.309986114501953
    MEDIAN 7.531881332397461
    MEAN   7.834429211086697
    STDEV  2.03779968066591
FUNCTION: f2 Used 135 times
    BEST 3.039121627807617
    MEDIAN 3.7298202514648438
    MEAN   4.012803678159361
    STDEV  1.8498943539073027
FUNCTION: f3 Used 110 times
    BEST 0.9920597076416016
    MEDIAN 1.4405250549316406
    MEAN   1.5053835782137783
    STDEV  0.3523240470133114
FUNCTION: f4 Used 120 times
    BEST 0.9369850158691406
    MEDIAN 1.3251304626464844
    MEAN   1.4017681280771892
    STDEV  0.3391019435930447

Bar chart

Discussion

I guess the hashlib.md5("".join(names).encode("utf-8")).hexdigest() stuff is a bit "off-topic" but I checked and it's roughly 300 times faster than building up the names list.

It's clearly better to ask less of Python and PostgreSQL to get a better total time. No surprise there. What was interesting was the proportion of these differences. Memorize that and you'll be better equipped if it's worth the hassle of not using the Django ORM in the most basic form.

Also, do take note that this is only relevant in when dealing with many records. The slowest variant (f1) takes, on average, 7 milliseconds.

Summarizing the difference with percentages compared to the fastest variant:

  • f1 - 573% slower
  • f2 - 225% slower
  • f3 - 6% slower
  • f4 - 0% slower

UPDATE Feb 25 2019

James suggested, although a bit "missing the point", that it could be even faster if all the aggregation is pushed into the PostgreSQL server and then the only thing that needs to transfer from PostgreSQL to Python is the final result.

By the way, name column in this particular benchmark, when concatenated into one big string, is ~4KB. So, with variant f5 it only needs to transfer 32 bytes which will/would make a bigger difference if the network latency is higher.

Here's the whole script: https://gist.github.com/peterbe/b2b7ed95d422ab25a65639cb8412e75e

And the results:

276
True
True
True
False
False
FUNCTION: f1 Used 92 times
    BEST 5.928993225097656
    MEDIAN 7.311463356018066
    MEAN   7.594626882801885
    STDEV  2.2027017044658423
FUNCTION: f2 Used 75 times
    BEST 2.878904342651367
    MEDIAN 3.3979415893554688
    MEAN   3.4774907430013022
    STDEV  0.5120246550765524
FUNCTION: f3 Used 88 times
    BEST 0.9310245513916016
    MEDIAN 1.1944770812988281
    MEAN   1.3105544176968662
    STDEV  0.35922655625999383
FUNCTION: f4 Used 71 times
    BEST 0.7879734039306641
    MEDIAN 1.1661052703857422
    MEAN   1.2262606284987758
    STDEV  0.3561764250427344
FUNCTION: f5 Used 90 times
    BEST 0.7929801940917969
    MEDIAN 1.0334253311157227
    MEAN   1.1836051940917969
    STDEV  0.4001442703048186
FUNCTION: f6 Used 84 times
    BEST 0.80108642578125
    MEDIAN 1.1119842529296875
    MEAN   1.2281338373819988
    STDEV  0.37146893005516973

Result: f5 is takes 0.793ms and (the previous "winner") f4 takes 0.788ms.

I'm not entirely sure why f5 isn't faster but I suspect it's because the dataset is too small for it all to matter.

Compare:

songsearch=# explain analyze SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using main_song_ca949605 on main_song  (cost=0.43..229.33 rows=56 width=16) (actual time=0.014..0.208 rows=276 loops=1)
   Index Cond: (artist_id = 22729)
 Planning Time: 0.113 ms
 Execution Time: 0.242 ms
(4 rows)

with...

songsearch=# explain analyze SELECT md5(STRING_AGG("main_song"."name", '')) AS "names_hash" FROM "main_song" WHERE "main_song"."artist_id" = 22729;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=229.47..229.48 rows=1 width=32) (actual time=0.278..0.278 rows=1 loops=1)
   ->  Index Scan using main_song_ca949605 on main_song  (cost=0.43..229.33 rows=56 width=16) (actual time=0.019..0.204 rows=276 loops=1)
         Index Cond: (artist_id = 22729)
 Planning Time: 0.115 ms
 Execution Time: 0.315 ms
(5 rows)

I ran these two SQL statements about 100 times each and recorded their best possible execution times:

1) The plain SELECT - 0.99ms
2) The STRING_AGG - 1.06ms

So that accounts from ~0.1ms difference only! Which kinda matches the results seen above. All in all, I think the dataset is too small to demonstrate this technique. But, considering the chance that the complexity might not be linear with the performance benefit, it's still interesting.

Even though this tangent is a big off-topic, it is often a great idea to push as much work into the database as you can if applicable. Especially if it means you can transfer a lot less data eventually.

Experimenting with Nginx worker_processes

February 14, 2019
0 comments Web development, Nginx, macOS, Linux

I have Nginx 1.15.8 installed with Homebrew on my macOS. By default the /usr/local/etc/nginx/nginx.conf it set to...:

worker_processes  1;

But, from the documentation, it says:

"The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it)." (bold emphasis mine)

What is the ideal number for me? The performance of Nginx on my laptop doesn't really matter. But for my side-projects it's important to have a fast Nginx since it serves static HTML and lots of static assets. However, on my personal servers I have a bunch of other resource hungry stuff going on that I know is more likely to need the resources, like Elasticsearch and uwsgi.

To figure this out, I wrote a benchmark program that requested a small index.html about 10,000 times across 10 concurrent clients with hey.

hey -n 10000 -c 10 http://peterbecom.local/plog/variable_cache_control/awspa

I ran this 10 times between changing the worker_processes in the nginx.conf file. Here's the output:

1 WORKER PROCESSES
BEST  : 13,607.24 reqs/s

2 WORKER PROCESSES
BEST  : 17,422.76 reqs/s

3 WORKER PROCESSES
BEST  : 18,886.60 reqs/s

4 WORKER PROCESSES
BEST  : 19,417.35 reqs/s

5 WORKER PROCESSES
BEST  : 19,094.18 reqs/s

6 WORKER PROCESSES
BEST  : 19,855.32 reqs/s

7 WORKER PROCESSES
BEST  : 19,824.86 reqs/s

8 WORKER PROCESSES
BEST  : 20,118.25 reqs/s

Or, as a graph:

Graph

Now note, this is done here on my MacBook Pro. Not on my Ubuntu DigitalOcean servers. For now, I just want to get a feeling for how these numbers correlate.

Conclusion

The benchmark isn't good enough. The numbers are pretty stable but I'm doing this on my laptop with multiple browsers idling, Slack, and Spotify running. Clearly, the throughput goes up a bit when you allocate more workers but if anything can be learned from this, start with going beyond 1 for a quick fix and from there start poking and more exhaustive benchmarks. And don't forget, if you have time to go deeper on this, to look at the combination of worker_connections and worker_processes.

create-react-app, SCSS, and Bulmaswatch

February 12, 2019
2 comments Web development, React, JavaScript

1. Create a create-react-app first:

create-react-app myapp

2. Enter it and install node-sass and bulmaswatch

cd myapp
yarn add bulma bulmaswatch node-sass

3. Edit the src/index.js to import index.scss instead:


-import "./index.css";
+import "./index.scss";

4. "Rename" the index.css file:

git rm src/index.css 
touch src/index.scss
git add src/index.scss

5. Now edit the src/index.scss to look like this:


@import "node_modules/bulmaswatch/darkly/bulmaswatch";

This assumes your favorite theme was the darkly one. You can obviously change that later.

6. Run the app:

BROWSER=none yarn start

7. Open the browser at http://localhost:3000

CRA start

That's it! However, the create-react-app default look doesn't expose any of the cool stuff that Bulma can style. So let's rewrite our src/App.js by copying the minimal starter HTML from the Bulma documentation. So make the src/App.js component look something like this:


class App extends Component {
  render() {
    return (
      <section className="section">
        <div className="container">
          <h1 className="title">Hello World</h1>
          <p className="subtitle">
            My first website with <strong>Bulma</strong>!
          </p>
        </div>
      </section>
    );
  }
}

Now it'll look like this:

Bulma starter template

Yes, it's not much but it's a great start. Over to you to take this to infinity and beyond!

Not So Secret Sauce

In the rushed instructions above the choice of theme was darkly. But what you need to do next is go to https://jenil.github.io/bulmaswatch/, click around and eventually pick the one you like. Suppose you like spacelab, then you just change that @import ... line to be:


@import "node_modules/bulmaswatch/spacelab/bulmaswatch";

TEMPORARY:

h1 {  
    color: red;
    font-size: 5em;
}

TEST CHANGE 3.

Optimize DOM selector lookups by pre-warming by selectors' parents

February 11, 2019
0 comments Web development, Node, Web Performance, JavaScript

tl;dr; minimalcss 0.8.2 introduces a 20% post-processing optimization by lumping many CSS selectors to their parent CSS selectors as a pre-emptive cache.

In minimalcss the general core of it is that it downloads a DOM tree, as HTML, parses it and parses all the CSS stylesheets associated. These might be from <link ref="stylesheet"> or <style> tags.
Once the CSS stylesheets are turned into an AST it loops over each and every CSS selector and asks a simple question; "Does this CSS selector exist in the DOM?". The equivalent is to open your browser's Web Console and type:

>>> document.querySelectorAll('div.foo span.bar b').length > 0
false

For each of these lookups (which is done with cheerio by the way), minimalcss reduces the CSS, as an AST, and eventually spits the AST back out as a CSS string. The only problem is; it's slow. In the case of view-source:https://semantic-ui.com/ in the CSS it uses, there are 6,784 of them. What to do?

First of all, there isn't a lot you can do. This is the work that needs to be done. But one thing you can do is be smart about which selectors you look at and use a "decision cache" to pre-emptively draw conclusions. So, if this is what you have to check:

  1. #example .alternate.stripe
  2. #example .theming.stripe
  3. #example .solid .column p b
  4. #example .solid .column p

As you process the first one you extract that the parent CSS selector is #example and if that doesn't exist in the DOM, you can efficiently draw conclusion about all preceeding selectors that all start with #example .... Granted, if they call exist you will pay a penalty of doing an extra lookup. But that's the trade-off that this optimization is worth.

Check out the comments where I tested a bloated page that uses Semantic-UI before and after. Instead of doing 3,285 of these document.querySelector(selector) calls, it's now able too come to the exact same conclusion with just 1,563 lookups.

Sadly, the majority of the time spent processing lies in network I/O and other overheads but this work did reduce something that used to take 6.3s (median) too 5.1s (median).

Hooks tip! Avoid infinite recursion in React.useEffect()

February 6, 2019
1 comment React, JavaScript

React 16.8.0 with Hooks was released today. A big deal. Executive summary; components as functions is all the rage now.

What used to be this:


class MyComponent extends React.Component {
  ...

  componentDidMount() {
    ...
  }
  componentDidUpdate() {
    ...
  }

  render() { STUFF }
}

...is now this:


function MyComponent() {
  ...

  React.useEffect(() => {
    ...
  })

  return STUFF
}

Inside the useEffect "side-effect callback" you can actually update state. But if you do, and this is no different that old React.Component.componentDidUpdate, it will re-run the side-effect callback. Here's a simple way to cause an infinite recursion:


// DON'T DO THIS

function MyComponent() {
  const [counter, setCounter] = React.useState(0);

  React.useEffect(() => {
    setCounter(counter + 1);
  })

  return <p>Forever!</p>
}

The trick is to pass a second argument to React.useEffect that is a list of states to exclusively run on.

Here's how to fix the example above:


function MyComponent() {
  const [counter, setCounter] = React.useState(0);
  const [times, setTimes] = React.useState(0);

  React.useEffect(
    () => {
      if (times % 3 === 0) {
        setCounter(counter + 1);
      }
    },
    [times]  // <--- THIS RIGHT HERE IS THE KEY!
  );

  return (
    <div>
      <p>
        Divisible by 3: {counter}
        <br />
        Times: {times}
      </p>
      <button type="button" onClick={e => setTimes(times + 1)}>
        +1
      </button>
    </div>
  );
}

You can see it in this demo.

Note, this isn't just about avoiding infinite recursion. It can also be used to fit your business logic and/or an optimization to avoid executing the effect too often.

Displaying fetch() errors and unwanted responses in React

February 6, 2019
0 comments Web development, React, JavaScript

tl;dr; You can use error instanceof window.Response to distinguish between fetch exceptions and fetch responses.

When you do something like...


const response = await fetch(URL);

...two bad things can happen.

  1. The XHR request fails entirely. I.e. there's not even a response with a HTTP status code.
  2. The response "worked" but the HTTP status code was not to your liking.

Either way, your React app needs to deal with this. Ideally in a not-too-clunky way. So here is one take on this challenge/opportunity which I hope can inspire you to extend it the way you need it to go.

The trick is to "clump" exceptions with responses. Then you can do this:


function ShowServerError({ error }) {
  if (!error) {
    return null;
  }
  return (
    <div className="alert">
      <h3>Server Error</h3>
      {error instanceof window.Response ? (
        <p>
          <b>{error.status}</b> on <b>{error.url}</b>
          <br />
          <small>{error.statusText}</small>
        </p>
      ) : (
        <p>
          <code>{error.toString()}</code>
        </p>
      )}
    </div>
  );
}

The greatest trick the devil ever pulled was to use if (error instanceof window.Reponse) {. Then you know that error thing is the outcome of THIS = await fetch(URL) (or fetch(URL).then(THIS) if you prefer). Another good trick the devil pulled was to be aware that exceptions, when asked to render in React does not naturally call its .toString() so you have to do that yourself with {error.toString()}.

This codesandbox demonstrates it quite well. (Although, at the time of writing, codesandbox will spew warnings related to testing React components in the console log. Ignore that.)

If you can't open that codesandbox, here's the gist of it:


React.useEffect(() => {
  url &&
    (async () => {
      let response;
      try {
        response = await fetch(url);
      } catch (ex) {
        return setServerError(ex);
      }
      if (!response.ok) {
        return setServerError(response);
      }
      // do something here with `await response.json()`
    })(url);
}, [url]);

By the way, another important trick is to be subtle with how you put the try { and } catch(ex) {.


// DON'T DO THIS

try {
  const response = await fetch(url);
  if (!response.ok) {
    setServerError(response);
  }
  // do something here with `await response.json()`
} catch (ex) {
  setServerError(ex);
}

Instead...


// DO THIS

let response;
try {
  response = await fetch(url);
} catch (ex) {
  return setServerError(ex);
}
if (!response.ok) {
  return setServerError(response);
}
// do something here with `await response.json()`

If you don't do that you risk catching other exceptions that aren't exclusively the fetch() call. Also, notice the use of return inside the catch block which will exit the function early leaving you the rest of the code (de-dented 1 level) to deal with the happy-path response object.

Be aware that the test if (!response.ok) is simplistic. It's just a shorthand for checking if the "status in the range 200 to 299, inclusive". Realistically getting a response.status === 400 isn't an "error" really. It might just be a validation error hint from a server, and likely the await response.json() will work and contain useful information. No need to throw up a toast or a flash message that the communication with the server failed.

Conclusion

The details matter. You might want to deal with exceptions entirely differently from successful responses with bad HTTP status codes. It's nevertheless important to appreciate two things:

  1. Handle complete fetch() failures and feed your UI or your retry mechanisms.

  2. You can, in one component distinguish between a "successful" fetch() call and thrown JavaScript exceptions.

Format thousands in Python

February 1, 2019
9 comments Python

tl;dr; Use f"{number:,}" to thousands format an integer to a string.

I keep forgetting and having to look this up every time. Hopefully by blogging about it, this time it'll stick in my memory. And hopefully in yours too :)

Suppose you have a number, like 1234567890 and you want to display it, here's how you do it:


>>> number = 1234567890
>>> f"{number:,}"
'1,234,567,890'

In the past, before Python 3.6, I've been using:


>>> number = 1234567890
>>> format(number, ",")
'1,234,567,890'

All of this and more detail can be found in PEP 378 -- Format Specifier for Thousands Separator. For example, you can do this beast too:


>>> number = 1234567890
>>> f"{number:020,.2f}"
'0,001,234,567,890.00'

which demonstrates (1) how to do zero-padding (of length 20), (2) the thousands comma, (3) round to 2 significant figures. All useful weapons to be able to draw from the top of your head.

UPDATE

Also, incredibly useful is the equivalent of somestring.ljust(10):


>>> mystr = "peter"
>>> f"{mystr:10}"
'peter     '
>>> f"{mystr:>10}"
'     peter'