Display current React version

January 7, 2018
1 comment JavaScript, React

Usually you know what version of React your app is using by opening the package.json, or poking around in node_modules/react/index.js. But perhaps there are many packaging abstractions in between your command line and the server. Especially if you have a continous integration server that builds your static assets and if that CI uses caching. It might get scary.

If you really want to print out what version of React is rendering your app here's one way to do that:


import React from 'react'

class Introspection extends React.Component {
  render() {
    return <div>
      Currently using React {React.version}
    </div>
  }
}

Suppose that you want this display to depend on the app being in dev or prod mode:


import React from 'react'

class Introspection extends React.Component {
  render() {
    return <div>
      {
        process.env.NODE_ENV === 'development' ?
        <p>Currently using React {React.version}</p> : null
      }
    </div>
  }
}

Note that there's no need to import process.

See this CodeSandbox snippet for a live example.

Whatsdeployed facelift

January 5, 2018
0 comments Python, Web development, Mozilla, Docker

tl;dr; Whatsdeployed.io is an impressively simple web app to help web developers and web ops people quickly see what GitHub commits have made it into your Dev, Stage or Prod environment. Today it got a facelift.

The code is now more than 5 years old and has served me well. It's weird to talk too positively about the app because I actually wrote it but because it's so simple in terms of design and effort it feels less personal to talk about it.

Here's what's in the facelift

  • Upgraded to Bootstrap 4.
  • Instead of relying on downloading a heavy Glyphicon web font, just to display a single checkmark, that's now a simple image.
  • Ability to use a GitHub developer personal token to avoid rate limitations on GitHub's API.
  • The first lookup to get all commits is now done via the Flask app to use my auth token to avoid the rate limit.
  • Much better error handling if any of the underlying requests.get() that the Flask app does, fails. Also includes which URL it failed on.
  • Basic validation to prevent submitting the main form without typing anything in.
  • You can hack on it with Docker. Thanks @willkg.
  • Improved the code that extracts Bugzilla bug numbers out of commit messages. Thanks @edmorely.
  • Refreshed screenshots in the README.md
  • A brand new introduction text on the home page for people who end up on the site not knowing what it is.
  • If any XHR errors happen figuring out the "culprits", you now get a pretty error describing this instead of swallowing it all.

Please let me know if there's anything broken or missing.

How to rotate a video on OSX with ffmpeg

January 3, 2018
5 comments Linux, macOS

Every now and then, I take a video with my iPhone and even though I hold the camera in landscape mode, the video gets recorded in portrait mode. Probably because it somehow started in portrait and didn't notice that I rotated the phone.

So I'm stuck with a 90° video. Here's how I rotate it:

ffmpeg -i thatvideo.mov -vf "transpose=2" ~/Desktop/thatvideo.mov

then I check that ~/Desktop/thatvideo.mov looks like it should.

I can't remember where I got this command originally but I've been relying on my bash history for a looong time so it's best to write this down.
The "transpose=2" means 90° counter clockwise. "transpose=1" means 90° clockwise.

What is ffmpeg??

If you're here because you Googled it and you don't know what ffmpeg is, it's a command line program where you can "programmatically" do almost anything to videos such as conversion between formats, put text in, chop and trim videos. To install it, install Homebrew then type:

brew install ffmpeg

Fastest way to uniquify a list in Python >=3.6

December 23, 2017
7 comments Python

This is an update to a old blog post from 2006 called Fastest way to uniquify a list in Python. But this, time for Python 3.6. Why, because Python 3.6 preserves the order when inserting keys to a dictionary. How, because the way dicts are implemented in 3.6, the way it does that is different and as an implementation detail the order gets preserved. Then, in Python 3.7, which isn't released at the time of writing, that order preserving is guaranteed.

Anyway, Raymond Hettinger just shared a neat little way to uniqify a list. I thought I'd update my old post from 2006 to add list(dict.fromkeys('abracadabra')).

Functions

Reminder, there are two ways to uniqify a list. Order preserving and not order preserving. For example, the unique letters in peter is p, e, t, r in their "original order". As opposed to t, e, p, r.


def f1(seq):  # Raymond Hettinger
    hash_ = {}
    [hash_.__setitem__(x, 1) for x in seq]
    return hash_.keys()

def f3(seq):
    # Not order preserving
    keys = {}
    for e in seq:
        keys[e] = 1
    return keys.keys()

def f5(seq, idfun=None):  # Alex Martelli ******* order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen:
            continue
        seen[marker] = 1
        result.append(item)
    return result

def f5b(seq, idfun=None):  # Alex Martelli ******* order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker not in seen:
            seen[marker] = 1
            result.append(item)

    return result

def f7(seq):
    # Not order preserving
    return list(set(seq))

def f8(seq):  # Dave Kirby
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

def f9(seq):
    # Not order preserving, even in Py >=3.6
    return {}.fromkeys(seq).keys()

def f10(seq, idfun=None):  # Andrew Dalke
    # Order preserving
    return list(_f10(seq, idfun))

def _f10(seq, idfun=None):
    seen = set()
    if idfun is None:
        for x in seq:
            if x in seen:
                continue
            seen.add(x)
            yield x
    else:
        for x in seq:
            x = idfun(x)
            if x in seen:
                continue
            seen.add(x)
            yield x

def f11(seq):  # f10 but simpler
    # Order preserving
    return list(_f10(seq))

def f12(seq):
    # Raymond Hettinger
    # https://twitter.com/raymondh/status/944125570534621185
    return list(dict.fromkeys(seq))

Results

FUNCTION        ORDER PRESERVING     MEAN       MEDIAN
f12             yes                  111.0      112.2
f8              yes                  266.3      266.4
f10             yes                  304.0      299.1
f11             yes                  314.3      312.9
f5              yes                  486.8      479.7
f5b             yes                  494.7      498.0
f7              no                   95.8       95.1
f9              no                   100.8      100.9
f3              no                   143.7      142.2
f1              no                   406.4      408.4

Not order preserving

Order preserving

Conclusion

The fastest way to uniqify a list of hashable objects (basically immutable things) is:


list(set(seq))

And the fastest way, if the order is important is:


list(dict.fromkeys(seq))

Now we know.

CSS selector simplifier regular expression in JavaScript

December 20, 2017
0 comments Web development, JavaScript

The Problem

I'm working on a project where it needs to evaluate CSS as a string. Basically, it compares CSS selectors against a DOM to see if the CSS selector is used in the DOM.

But CSS has pseudo classes. A common one a lot of people are familiar with is: a:hover { text-decoration: crazy }. So that :hover part is not relevant when evaluating the CSS selector against the DOM. So you chop off the :hover bit and is left with a which you can then look for in the DOM.

But there are some tricks and make this less trivial. Consider, this from Bootstrap 3


a[href^="#"]:after,
a[href^="javascript:"]:after {
    content: "";
}

In this case we can't simply split on a : character.

Another non-trivial example comes from Semantic UI:


.ui[class*="4:3"].embed {
  padding-bottom: 75%;
}
.ui[class*="16:9"].embed {
  padding-bottom: 56.25%;
}
.ui[class*="21:9"].embed {
  padding-bottom: 42.85714286%;
}

Basically, if you just split the selectors (e.g. a:hover) on the first : and keep everything to the left (e.g. a), with these non-trivial CSS selectors you'd get this:

a[href^="javascript

and

.ui[class*="4

etc. These CSS selectors will fail. Both Firefox and Chrome seem to swallow any errors but cheerio will raise a SyntaxError and not just that but the problem is that the CSS selector is just the wrong one to look for.

The Solution

The solution has to be to split by the : character when it's not between two quotation marks.

This Stackoverflow post helped me with the regex. It was trivial to extend now my final solution looks like this:


/**
 * Reduce a CSS selector to be without any pseudo class parts.
 * For example, from 'a:hover' return 'a'. And from 'input::-moz-focus-inner'
 * to 'input'.
 * Also, more advanced ones like 'a[href^="javascript:"]:after' to
 * 'a[href^="javascript:"]'.
 * The last example works too if the input was 'a[href^='javascript:']:after'
 * instead (using ' instead of ").
 *
 * @param {string} selector
 * @return {string}
 */
const reduceCSSSelector = selector => {
  return selector.split(
    /:(?=([^"'\\]*(\\.|["']([^"'\\]*\\.)*[^"'\\]*['"]))*[^"']*$)/g
  )[0]
}

Extra; About regexes

I've been coding for about 20 years and would like to think I know my way around writing regular expressions in various languages. However, I'm also eager to admit that I often fumble and rely on googling/stackoverflow more than actually understanding what the heck I'm doing. That's why I found this comment so amusing:

Thank you! Didn't think it was possible. I understand 100% of the theory, about 60% of the regex, and I'm down to 0% when it comes to writing it on my own. Oh, well, maybe one of these days. – Azmisov

Msgpack vs JSON (with gzip)

December 19, 2017
17 comments Python, Web development

tl;dr; I see no reason worth switching to Msgpack instead of good old JSON.

I was curious, how much more efficient is Msgpack at packing a bunch of data into a file I can emit from a web service.

In this experiment I take a massive JSON file that is used in a single-page-app I worked on. If I download the file locally as a .json file, the file is 2.1MB.

Converting it to Msgpack:


>>> import json, msgpack
>>> with open('events.json') as f:
...   events=json.load(f)
...
>>> len(events)
3
>>> events.keys()
dict_keys(['max_modified', 'events', 'urls'])
>>> with open('events.msgpack', 'wb') as f:
...   f.write(msgpack.packb(events))
...
1880266

events.json vs events.msgpack
Now, let's compared the two file formats, as seen on disk:

▶ ls -lh events*
-rw-r--r--  1 peterbe  wheel   2.1M Dec 19 10:16 events.json
-rw-r--r--  1 peterbe  wheel   1.8M Dec 19 10:19 events.msgpack

But! How well does it compress?

More common than not your web server can return content encoded in Gzip as content-encoding: gzip. So, let's compare that:

▶ gzip events.json ; gzip events.msgpack
▶ ls -l events*
-rw-r--r--  1 peterbe  wheel  304416 Dec 19 10:16 events.json.gz
-rw-r--r--  1 peterbe  wheel  305905 Dec 19 10:19 events.msgpack.gz

Msgpack vs JSON (with gzip)

Oh my! When you gzip the files the .json file ultimately becomes smaller. By a whopping 0.5%!

What about speed?

First let's open the files a bunch of times and see how long it takes to unpack:


def f1():
    with open('events.json') as f:
        s = f.read()
    t0 = time.time()
    events = json.loads(s)
    t1 = time.time()
    assert len(events['events']) == 4365
    return t1 - t0


def f2():
    with open('events.msgpack', 'rb') as f:
        s = f.read()
    t0 = time.time()
    events = msgpack.unpackb(s, encoding='utf-8')
    t1 = time.time()
    assert len(events['events']) == 4365
    return t1 - t0


def f3():
    with open('events.json') as f:
        s = f.read()
    t0 = time.time()
    events = ujson.loads(s)
    t1 = time.time()
    assert len(events['events']) == 4365
    return t1 - t0

(Note that the timing is around the json.loads() etc without measuring how long it takes to get the files to strings)

json.loads() vs msgpack.unpack() vs. ujson.loads()
Result (using Python 3.6.1): All about the same.

FUNCTION: f1 Used 56 times
    MEDIAN 30.509352684020996
    MEAN   31.09178798539298
    STDEV  3.5620914333233595
FUNCTION: f2 Used 68 times
    MEDIAN 27.882099151611328
    MEAN   28.704492484821994
    STDEV  3.353800228776872
FUNCTION: f3 Used 76 times
    MEDIAN 27.746915817260742
    MEAN   27.920340236864593
    STDEV  2.21554251130519

Same benchmark using PyPy 3.5.3, but skipping the f3() which uses ujson:

FUNCTION: f1 Used 99 times
    MEDIAN 20.905017852783203
    MEAN   22.13949386519615
    STDEV  5.142071370453135
FUNCTION: f2 Used 101 times
    MEDIAN 36.96393966674805
    MEAN   40.54664857316725
    STDEV  17.833577642246738

Dicussion and conclusion

One of the benefits of Msgpack is that it can used for streaming. "Streaming unpacking" as they call it. But, to be honest, I've never used it. That can useful when you have structured data trickling in and you don't want to wait for it all before using the data.

Another cool feature Msgpack has is ability to encode custom types. E.g. datetime.datetime. Like bson can do. With JSON you have to, for datetime objects do string conversions back and forth and the formats are never perfectly predictable so you kinda have to control both ends.

But beyond some feature differences, it seems that JSON compressed just as well as Msgpack when Gzipped. And unlike Msgpack JSON is not binary so it's easy to poke around with any tool. And decompressing JSON is just as fast. Almost. But if you need to squeeze out a couple of extra free milliseconds from your JSON files you can use ujson.

Conclusion; JSON is fine. It's bigger but if you're going to Gzip anyway, it's just as small as Msgpack.

Bonus! BSON

Another binary encoding format that supports custom types is BSON. This one is a pure Python implementation. BSON is used by MongoDB but this bson module is not what PyMongo uses.

Size comparison:

▶ ls -l events*son
-rw-r--r--  1 peterbe  wheel  2315798 Dec 19 11:07 events.bson
-rw-r--r--  1 peterbe  wheel  2171439 Dec 19 10:16 events.json

So it's 7% larger than JSON uncompressed.

▶ ls -l events*son.gz
-rw-r--r--  1 peterbe  wheel  341595 Dec 19 11:07 events.bson.gz
-rw-r--r--  1 peterbe  wheel  304416 Dec 19 10:16 events.json.gz

Meaning it's 12% fatter than JSON when Gzipped.

Doing a quick benchmark with this:


def f4():
    with open('events.bson', 'rb') as f:
        s = f.read()
    t0 = time.time()
    events = bson.loads(s)
    t1 = time.time()
    assert len(events['events']) == 4365
    return t1 - t0

Compared to the original f1() function:

FUNCTION: f1 Used 106 times
    MEDIAN 29.58393096923828
    MEAN   30.289863640407347
    STDEV  3.4766612593557173
FUNCTION: f4 Used 94 times
    MEDIAN 231.00042343139648
    MEAN   231.40889786659403
    STDEV  8.947746458066405

In other words, bson is about 600% slower than json.

This blog post was supposed to be about how well the individual formats size up against each other on disk but it certainly would be interesting to do a speed benchmark comparing Msgpack and JSON (and maybe BSON) where you have a bunch of datetimes or decimal.Decimal objects and see if the difference is favoring the binary formats.

Another win for Tracking Protection in Firefox

December 13, 2017
0 comments Web development, Mozilla

When I read Swedish news I usually open DN.se. At least I used to. Opening it used to feel like an "investment". It takes too long to load and it makes the browser all janky because of the weight of all ad videos and whatnot.

With Tracking Protection, which is built in to recent versions of Firefox, all of those load speed concerns goes away pretty much.

Without Tracking Protection

Without

  • 322 requests
  • 8.78MB transferred
  • 16.12s to finish fully loading

With Tracking Protection

With Tracking Protection

  • 67 requests
  • 1.15MB transferred
  • 8.42s to finish fully loading

I've blogged about this before but I'm just so excited I felt like repeating it. With Tracking Protection now enabled it actually feels fun to open heavy-on-ads news sites again. Perhaps even worth paying for.

How's My WiFi?

December 8, 2017
2 comments macOS, JavaScript, Node

This was one of those late-evening-after-the-kids-are-asleep project. Followed by some next-morning-sober-readme-fixes-and-npmjs-paperwork.

It's a little Node script that will open https://fast.com with puppeteer, and record, using document.querySelector('#speed-value') what my current Internet speed is according to that app. It currently only works on OSX but it should be easy to fix for someone handy on Linux or Windows.

You can either run it just once and get a readout. That's basically as useful as opening fast.com in a new browser tab.
The other way is to run it in a loop howsmywifi --loop and sit and watch as it tries to figure out what your Internet speed is after multiple measurements.

Screenshot

That's it!

The whole point of this was for me to get an understanding of what my Internet speed is and if I'm being screwed by Comcast. The measurements are very erratic and they might sporadically depend on channel noise on the WiFi or just packet crowding when other devices is overcrowding the pipes with heavy downloads such as video chatting or watching movies or whatever.

I've seen 98 Mbps with my iPhone on this network. Not so much today.

And Screenshots!

As a bonus, it will take a screenshot (if you pass the --screenshots flag) of the fast.com page each time it has successfully measured. Not sure what to do with this. If you have ideas, let me know.

Really simple Django view function timer decorator

December 8, 2017
2 comments Python, Django

I use this sometimes to get insight into how long some view functions take. Perhaps you find it useful too:


def view_function_timer(prefix='', writeto=print):

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            try:
                t0 = time.time()
                return func(*args, **kwargs)
            finally:
                t1 = time.time()
                writeto(
                    'View Function',
                    '({})'.format(prefix) if prefix else '',
                    func.__name__,
                    args[1:],
                    'Took',
                    '{:.2f}ms'.format(1000 * (t1 - t0)),
                    args[0].build_absolute_uri(),
                )
        return inner

    return decorator

And to use it:


from wherever import view_function_timer


@view_function_timer()
def homepage(request, thing):
    ...
    return render(request, template, context)

And then it prints something like this:

View Function  homepage ('valueofthing',) Took 23.22ms http://localhost:8000/home/valueofthing

It's useful when you don't want a full-blown solution to measure all view functions with a middleware or something.
It can be useful also to see how a cache decorator might work:


from django.views.decorators.cache import cache_page
from wherever import view_function_timer


@view_function_timer('possibly cached')
@cache_page(60 * 60 * 2)  # two hours cache
@view_function_timer('not cached')
def homepage(request, thing):
    ...
    return render(request, template, context)

That way you can trace that, with tail -f or something, to see how/if the cacheing decorator works.

There are many better solutions that are more robust but might be a bigger investment. For example, I would recommend markus which, if you don't have a statsd server you can configure to logger.info call the timings.

Synonyms with elasticsearch-dsl

December 5, 2017
0 comments Python, Web development, PostgreSQL

The documentation about how to use synonyms in Elasticsearch is good but because it's such an advanced topic, even if you read the documentation carefully, you're still left with lots of questions. Let me show you some things I've learned about how to use synonyms in Python with elasticsearch-dsl.

What's the nature of your documents?

I'm originally from Sweden but moved to London, UK in 1999 and started blogging a few years after. So I wrote most of my English with British English spelling. E.g. "centre" instead of "center". Later I moved to California in the US and slowly started to change my own English over to American English. I kept blogging but now I would prefer to write "center" instead of "centre".

Another example... Certain technical words or namings are tricky. For example, is it "go" or is it "golang"? Is it "React" or is it "ReactJS"? Is it "PostgreSQL" or "Postgres". I never know. Not only is it sometimes hard to know which is right because people use them differently, but also sometimes "brands" like that change over time since inception, the creator might have preferred something but the masses of people call it something else.

So with all that in mind, not only has the nature of my documents (my blog post texts) changed in terminology over the years. My visitors are also coming both from British English and American English. Or, suppose that I knew the perfect way to phrase that relational database that starts with "Postg...". Even if my text is always spelled one particular way, perfectly, my visitors will most likely refer to it as "postgres" sometimes and "postgresql" sometimes.

The simple solution, match all!

Create a custom analyzer

Let's jump straight into the code. People who have used elasticsearch_dsl should be familiar with most of this:


from elasticsearch_dsl import (
    DocType,
    Text,
    Index,
    analyzer,
    Keyword,
    token_filter,
)
from django.conf import settings


index = Index(settings.ES_INDEX)
index.settings(**settings.ES_INDEX_SETTINGS)


synonym_tokenfilter = token_filter(
    'synonym_tokenfilter',
    'synonym',
    synonyms=[
        'reactjs, react',  # <-- important
    ],
)

text_analyzer = analyzer(
    'text_analyzer',
    tokenizer='standard',
    filter=[
        # The ORDER is important here.
        'standard',
        'lowercase',
        'stop',
        synonym_tokenfilter,
        # Note! 'snowball' comes after 'synonym_tokenfilter'
        'snowball',
    ],
    char_filter=['html_strip']
)

class BlogItemDoc(DocType):
    oid = Keyword(required=True)
    title = Text(
        required=True, 
        analyzer=text_analyzer
    )
    text = Text(analyzer=text_analyzer)

index.doc_type(BlogItemDoc)

This code above is copied from the "real code" but a lot of distracting things that aren't important to the point, have been removed.

The magic sauce here is that you create a token_filter and you can call it whatever you want. I called mine synonym_tokenfilter and that's also what the instance variable is called.

Notice the list of synonyms. It's a plain list of strings. Specifically, it's a list of 1 string reactjs, react.

Let's see how Elasticsearch analyzes this:
First with the text react.

$ curl -XGET 'http://127.0.0.1:9200/peterbecom/_analyze?analyzer=text_analyzer&text=react&pretty=1'
{
  "tokens" : [
    {
      "token" : "react",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "reactj",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

Note that the analyzer snowball, converted reactjs to reactj which is wrong in a sense, because there's not plural "reacts", but it ultimately doesn't matter much. At least not in this particular case.

Secondly, analyze it with the text reactjs:

$ curl -XGET 'http://127.0.0.1:9200/peterbecom/_analyze?analyzer=text_analyzer&text=reactjs&pretty=1'
{
  "tokens" : [
    {
      "token" : "reactj",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "react",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

Same tokens! Just different order.

Test it for reals

Now, the real proof is in actually doing a search on this. Look at these two screenshots:

Search for 'react'

Search for 'reactjs'

It worked! Different ways of phrasing your search but ultimately found all the documents that matched independent of different people or different authors might prefer to spell it.

Try it for yourself:

What it looked like before

Check out these two screenshots of how it would look like before, when synonyms for postgres and postgresql had not been set up yet:

Searching for 'postgresql'

Searching for 'postgres'

One immediate thought I have is what a mess I've been in blogging about that database. Clearly I struggled to pick one way to spell it consistently.

And here's what it would look like once that synonym has been set up:

Synonym set up for 'postgres' and 'postgresql'

"go" versus "golang"

Go is a programming language. That term, too, struggles with a name ambiguity. Granted, I rarely hear people say "golang", but it's definitely a written word that turns up a lot.

The problem with setting up a synonym for go == golang is that "go" is common English word. It's also the stem of the word "going" and such. So if you set up a synonym, like I did for react and reactjs above, this is what happens:

Search for 'golang'

This is now the exact search results as if I had searched for go. But look what it matched! It matched "Go" (good) but also "Going real simple..." (bad) and "...I should go" (bad).

If someone searches for the simple term "go" they probably intend to search for the Go programming language. All that snowball stemming is critical for a bunch of other non-computer-term searches so we can't remove the stemming.

The solution is to use what's called "Simple Contraction". And it looks like this:


all_synonyms = [
    'go => golang',
    'react => reactjs',
    'postgres => postgresql',
]

That basically means that a search for go is a search for golang. And a document that uses the word go (alone) is indexed as golang.

What happens is that the word go gets converted to golang which doesn't get stemming converted down to any other forms.

However, this is no silver bullet. Any search for the term go is ultimately a search for the word golang and the regular English word go. So the benefit of all of this was that we got rid of search results matching on going and gone.

What you have to decide...

The case for go is similar to the case for react. Both of these words are nouns but they're also verbs.

Should people find "reacting to events" when they search for "react"? If so, use react, reactjs in the synonyms list.

Should people only find documents related to noun "React" when they search for "event handing in react"? If so, use react => reactjs in the synonyms list.

It's up to you and your documents and what your users tend to search for.

Bonus! For American vs British English

AVKO.org publishes a list of all British to American English synonyms. You can download the whole list here. Unfortunately I can't find a license for this file but the compiled synonyms file is part of this repo which is licensed under MIT.

I download this list and keep it in the repo. Then when setting up the analyzer and token filters I load it in like this:


synonyms_root = os.path.join(
    settings.BASE_DIR, 'peterbecom/es-synonyms'
)
american_british_syns_fn = os.path.join(
    synonyms_root, 'be-ae.synonyms'
)

with open(american_british_syns_fn) as f:
    for line in f:
        if (
            '=>' not in line or 
             line.strip().startswith('#')
         ):
            continue
        all_synonyms.append(line.strip())

Now I can finally enjoy not having to worry about the fact that sometimes I spell it "license" and sometimes I spell it "licence". It's all the same now. Brits and Americans, rejoice on common ground!

Bonus! For terrible spellers

Although I don't have a big problem with this on my techy blog but you can use the Simple Contraction technique to list unambiguously bad spelling. Add dont => don't to the list of synonyms and a search for dont is a search for don't.

Last but not least, the official Elasticsearch documentation is the place to go. This blog post hopefully phrases it in more approachable terms. Especially for Python peeps.