How to resolve a git conflict in poetry.lock

February 7, 2020
8 comments Python

We use poetry in MDN Kuma. That means there's a pyproject.toml and a poetry.lock file. To add or remove dependencies, you don't touch either file in an editor. For example, to add a package:

poetry add --dev black

It changes pyproject.toml and poetry.lock for you. (Same with yarn add somelib which edits package.json and yarn.lock).

Suppose that you make a pull request to add a new dependency, but someone else sneaks in a pull request that also changes the dependencies and gets it landed in master before yours. Well, that's how you end up in this place:

(Screenshot: Conflicting files)

So how do you resolve that?

So, you go back to your branch and run something like:

git checkout master 
git pull origin master
git checkout my-branch
git merge master

Now you get this in git status:

Unmerged paths:
  (use "git add <file>..." to mark resolution)
    both modified:   poetry.lock

And the contents of poetry.lock look something like this:

(Screenshot: Conflict)

I wish poetry itself could just figure this out and fix it.

What you need to do is to run:

# Get poetry.lock to look like it does in master
git checkout --theirs poetry.lock
# Rewrite the lock file
poetry lock --no-update

Now, your poetry.lock file should correctly reflect the pyproject.toml that has been merged from master.

To finish up, resolve the conflict:

git add poetry.lock
git commit -a -m "conflict resolved"

# and most likely needed
poetry install

content-hash

Inside the poetry.lock file there's the lock file's hash. It looks like this:

[metadata]
content-hash = "875b6a3628489658b323851ce6fe8dafacd5f69e5150d8bb92b8c53da954c1be"

So, as can be seen in my screenshot, when git conflicted on this it looks like this:


 [metadata]
+<<<<<<< HEAD
+content-hash = "6658b1379d6153dd603bbc27d04668e5e93068212c50e76bd068e9f10c0bec59"
+=======
 content-hash = "5c00dce18ddffd5d6f797dfa14e4d56bf32bbc3769d7b761a2b1b3ff14bce287"
+>>>>>>> master

Basically, the content-hash = "5c00dce1... is what you'd find in master and content-hash = "6658b137... is what you would see in your branch before the conflict.

After running poetry lock, you can validate that the re-locking worked: the content-hash should be a new hash, one that is neither 5c00dce1... nor 6658b137....
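That check can be scripted. Here's a minimal sketch. check_relocked is a hypothetical helper and is not part of poetry. It reads the lock file text, makes sure no git conflict markers survived, and confirms that the content-hash differs from both of the old ones. It finds the hash with a regex instead of a TOML parser, which keeps it dependency-free.

```python
import re

# The content-hash line lives in the [metadata] table of poetry.lock.
CONTENT_HASH_RE = re.compile(r'^content-hash\s*=\s*"([0-9a-f]+)"', re.MULTILINE)
# Git conflict markers always start at the beginning of a line.
CONFLICT_MARKER_RE = re.compile(r"^(<<<<<<<|=======|>>>>>>>)", re.MULTILINE)


def check_relocked(lock_text, old_hashes):
    """Return the new content-hash, or raise ValueError if re-locking looks incomplete.

    old_hashes may be full hashes or prefixes such as "5c00dce1".
    """
    if CONFLICT_MARKER_RE.search(lock_text):
        raise ValueError("poetry.lock still contains git conflict markers")
    hashes = CONTENT_HASH_RE.findall(lock_text)
    if len(hashes) != 1:
        raise ValueError(f"expected exactly one content-hash, found {len(hashes)}")
    new_hash = hashes[0]
    # A hash that still matches your branch or master means nothing was re-locked.
    if any(new_hash.startswith(old) for old in old_hashes):
        raise ValueError(f"content-hash {new_hash[:8]}... matches an old hash")
    return new_hash
```

For example, after poetry lock --no-update you might run check_relocked(Path("poetry.lock").read_text(), ["5c00dce1", "6658b137"]), with Path imported from pathlib.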

Notes

I'm still new to poetry and I'm learning. This was just some loud note-to-self so I can remember for next time.

I don't yet know what else can be automated if there's a conflict in pyproject.toml too. And what do you do if there are serious underlying conflicts in Python packages, like they added a package that requires somelib<=0.99 and you added something that requires somelib>=1.11?

Also, perhaps there are ongoing efforts within the poetry project to help out with this.

UPDATE Feb 12, 2020

My colleague informed me that this was actually NOT what I wanted. A plain poetry lock updates some dependencies because it makes a completely new lock file. I didn't immediately notice that in my case because the lock file is large. See this open issue which is about the ability to update the lock file without upgrading any other dependencies.

UPDATE June 24, 2021

To re-lock the file, use poetry lock --no-update after you've run git checkout --theirs poetry.lock.

"ld: library not found for -lssl" trying to install mysqlclient in Python on macOS

February 5, 2020
1 comment Python, macOS

I don't know how many times I've encountered this but by blogging about it, hopefully, next time it'll help me, and you!, find this sooner.

If you get this:

clang -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/peterbe/.pyenv/versions/3.8.0/lib -L/opt/boxen/homebrew/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/peterbe/.pyenv/versions/3.8.0/lib -L/opt/boxen/homebrew/lib -L/opt/boxen/homebrew/lib -I/opt/boxen/homebrew/include build/temp.macosx-10.14-x86_64-3.8/MySQLdb/_mysql.o -L/usr/local/Cellar/mysql/8.0.18_1/lib -lmysqlclient -lssl -lcrypto -o build/lib.macosx-10.14-x86_64-3.8/MySQLdb/_mysql.cpython-38-darwin.so
    ld: library not found for -lssl
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command 'clang' failed with exit status 1

(The most important line is the ld: library not found for -lssl)

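On most macOS systems, you'll get this when you try to install a Python package whose binary compile step links against OpenSSL. Homebrew installs openssl as "keg-only", which means it isn't symlinked into /usr/local/lib. As a result, the linker can't find libssl unless you tell it where to look.

The solution is simple. Run this first: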


export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I/usr/local/opt/openssl/include"

Depending on your install of things, you might need to adjust this accordingly. For me, I have:

ls -l /usr/local/opt/openssl/
total 1272
-rw-r--r--   1 peterbe  staff     717 Sep 10 09:13 AUTHORS
-rw-r--r--   1 peterbe  staff  582924 Dec 19 11:32 CHANGES
-rw-r--r--   1 peterbe  staff     743 Dec 19 11:32 INSTALL_RECEIPT.json
-rw-r--r--   1 peterbe  staff    6121 Sep 10 09:13 LICENSE
-rw-r--r--   1 peterbe  staff   42183 Sep 10 09:13 NEWS
-rw-r--r--   1 peterbe  staff    3158 Sep 10 09:13 README
drwxr-xr-x   4 peterbe  staff     128 Dec 19 11:32 bin
drwxr-xr-x   3 peterbe  staff      96 Sep 10 09:13 include
drwxr-xr-x  10 peterbe  staff     320 Sep 10 09:13 lib
drwxr-xr-x   4 peterbe  staff     128 Sep 10 09:13 share

Now, with those things set you should hopefully be able to do things like:

pip install mysqlclient
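You might not want to export those variables into your whole shell session. In that case, a small Python sketch can build the environment for a single command instead. openssl_build_env is a hypothetical helper. Its default prefix is the /usr/local/opt/openssl path from above, and you might swap in the output of brew --prefix openssl. Any flags already set are kept, and the OpenSSL paths go in front of them.

```python
import os


def openssl_build_env(prefix="/usr/local/opt/openssl", base_env=None):
    """Return a copy of the environment with LDFLAGS/CPPFLAGS pointing at OpenSSL."""
    env = dict(os.environ if base_env is None else base_env)
    extra = {
        "LDFLAGS": f"-L{prefix}/lib",       # where the linker looks for -lssl
        "CPPFLAGS": f"-I{prefix}/include",  # where the compiler looks for headers
    }
    for name, flag in extra.items():
        existing = env.get(name, "").strip()
        # Prepend our flag but keep whatever was already configured.
        env[name] = f"{flag} {existing}".strip()
    return env
```

You could then run subprocess.run([sys.executable, "-m", "pip", "install", "mysqlclient"], env=openssl_build_env(), check=True).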

Performance of truth checking a JavaScript object

February 3, 2020
0 comments Node, JavaScript

I'm working on a Node project that involves large transformations of large sets of data here and there. For example:


if (!Object.keys(this.allTitles).length) {
  ...

In my case, this.allTitles is a plain object with about 30,000 key/value pairs. That particular line of code only runs once, so even if it takes hundreds of milliseconds, it really doesn't matter that much. However, that's not a guarantee! What if you had something like this:


for (const thing of things) {
  if (!Object.keys(someObj).length) {
    // mutate someObj
  }
}

then you'd potentially see performance degrade once someObj becomes large. It gets particularly bad if things is also large, because the operation is repeated for every item.

Actually, consider this:


const obj = {};
[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

On my MacBook with Node 13.5, this outputs:

Truthcheck obj: 260.564ms

Maps

The MDN page on Map has a nice comparison, in terms of performance, between Map and regular objects. Consider this super simple benchmark:


const obj = {};
const map = new Map();

[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
  map.set(i, i);
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

console.time("Truthcheck map");
[...Array(100)].forEach((_, i) => {
  return !!map.size;
});
console.timeEnd("Truthcheck map");

So, fill a Map instance and a plain object with 30,000 keys and values. Then, for each in turn, check if the thing is truthy 100 times. The output I get:

Truthcheck obj: 235.017ms
Truthcheck map: 0.029ms

That's not unexpected. Object.keys(obj) has to build a brand new array of all 30,000 keys every time. The Map instance, on the other hand, maintains a size counter that increments on .set (if the key is new), so that "truthy" check is O(1), i.e. constant time.
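To make the "size counter" idea concrete, here's a toy sketch in Python. CountingMap is a hypothetical class, not a real library. It keeps its own counter, which only goes up when .set() sees a new key and goes down on a successful delete, so the truthiness check never has to look at the keys. For what it's worth, Python's built-in dict already tracks its length, so len(d) and bool(d) are constant time there too.

```python
class CountingMap:
    """A toy mapping that keeps its own size counter, like JavaScript's Map.size."""

    def __init__(self):
        self._data = {}
        self.size = 0

    def set(self, key, value):
        # Only a *new* key increases the size; overwriting leaves it alone.
        if key not in self._data:
            self.size += 1
        self._data[key] = value
        return self

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self.size -= 1
            return True
        return False

    def __bool__(self):
        # Truthiness just reads the counter, no matter how many keys there are.
        return self.size > 0
```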

Conclusion

Don't run to rewrite everything to Maps!

In fact, I took the above-mentioned little benchmark, changed it to a 3,000-item map and obj (instead of 30,000), and only did 10 iterations (instead of 100). Then the numbers are:

Truthcheck obj: 0.991ms
Truthcheck map: 0.044ms

These kinds of small numbers are very unlikely to matter in the scope of other things going on.

Anyway, consider using Map if you fear that you might be working with really reeeeally large mappings.

How to pad/fill a string by a variable in Python using f-strings

January 24, 2020
9 comments Python

I often find myself Googling for this. Always a little bit embarrassed that I can't remember the incantation (syntax).

Suppose you have a string mystr that you want to pad with spaces so it's 10 characters wide:


>>> mystr = 'peter'
>>> mystr.ljust(10)
'peter     '
>>> mystr.rjust(10)
'     peter'

Now, with "f-strings" you do:


>>> mystr = 'peter'
>>> f'{mystr:<10}'
'peter     '
>>> f'{mystr:>10}'
'     peter'

What also trips me up is, suppose that the number 10 is variable. I.e. it's not hardcoded into the f-string but a variable from somewhere else. Here's how you do it:


>>> width = 10
>>> f'{mystr:<{width}}'
'peter     '
>>> f'{mystr:>{width}}'
'     peter'

What I haven't figured out yet, is how you specify a different character than a simple single whitespace. I.e. does anybody know how to do this, but with f-strings:


>>> width = 10
>>> mystr.ljust(width, '*')
'peter*****'

UPDATE

First of all, I left two questions unanswered. One was how do you make the filler something other than ' '. The answer is:


>>> f'{"peter":*<10}'
'peter*****'

The second question was: what if you don't know up front what the filler character should be? In the above example, * was hardcoded inside the f-string. The solution is actually stunningly simple:


>>> width = 10
>>> filler = '*'
>>> f'{"peter":{filler}<{width}}'
'peter*****'

But note, it has to be a single-character string. This is what happens if you try to make it a longer string:


>>> filler = 'xxx'
>>> f'{"peter":{filler}<{width}}'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier
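Here's a small helper that combines the two answers. The name pad is hypothetical. It takes the fill character, the alignment and the width as variables. It checks the fill up front, which gives a clearer error than "Invalid format specifier". It also allows ^, which centers the text.

```python
def pad(text, width, fill=" ", align="<"):
    """Pad text to width using an f-string with nested (variable) format fields."""
    if len(fill) != 1:
        # The format mini-language only accepts a single fill character.
        raise ValueError(f"fill must be exactly one character, got {fill!r}")
    if align not in ("<", ">", "^"):
        raise ValueError(f"align must be '<', '>' or '^', got {align!r}")
    return f"{text:{fill}{align}{width}}"
```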

JavaScript destructuring like Python kwargs with defaults

January 18, 2020
1 comment Python, JavaScript

In Python

I'm sure it's been blogged about a buncha times before but, I couldn't find it, and I had to search too hard to find an example of this. Basically, what I'm trying to do is what Python does in this case, but in JavaScript:


def do_something(arg="notset", **kwargs):
    print(f"arg='{arg.upper()}'")

do_something(arg="peter")
do_something(something="else")
do_something()

In Python, the output of all this is:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

It could also have been implemented in a more verbose way:


def do_something(**kwargs):
    arg = kwargs.get("arg", "notset")
    print(f"arg='{arg.upper()}'")

This more verbose format has the disadvantage that you can't quickly skim it and see what the default is. That thing (arg = kwargs.get("arg", "notset")) might happen far away deeper in the function, making it hard work to spot the default.

In JavaScript

Here's the equivalent in JavaScript (object rest properties like ...kwargs need ES2018 or later):


function doSomething({ arg = "notset", ...kwargs } = {}) {
  return `arg='${arg.toUpperCase()}'`;
}

console.log(doSomething({ arg: "peter" }));
console.log(doSomething({ something: "else" }));
console.log(doSomething());

Same output as in Python:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

Notes

I'm still not convinced I like this syntax. It feels a bit too "hip" and too one-liner'y. But it's also pretty useful.

Mind you, the examples here are contrived because they're so short in terms of the number of arguments used in the function.
A more realistic example would be a function that lists, upfront, all the possible parameters and, for some of them, points out defaults. E.g.


function processFolder({
  source,
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

processFolder({ source: "/user", quiet: true });

One could maybe argue that arguments that don't have a default are expected to always be supplied so they can be regular arguments like:


function processFolder(source, {
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

processFolder("/user", { quiet: true });

But, I quite like keeping all arguments in an object. It makes it easier to write wrapper functions and I find this:


setProfile(
  "My biography here",
  false,
  193.5,
  230,
  ["anders", "bengt"],
  "South Carolina"
);

...harder to read than...


setProfile({
  bio: "My biography here",
  dead: false,
  height: 193.5,
  weight: 230,
  middlenames: ["anders", "bengt"],
  state: "South Carolina"
});
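Back in Python, the closest match to both processFolder variants is keyword-only arguments. Everything after a bare * must be passed by name, and the defaults are right there in the signature. The function names below are just snake_case versions chosen for illustration. quiet_process_folder is a hypothetical wrapper that shows how **options passes everything through. That's the same wrapper-friendliness that makes a single options object attractive in JavaScript.

```python
def process_folder(*, source=None, destination="/tmp", quiet=False, verbose=False):
    """Like processFolder({ source, destination = "/tmp", ... } = {})."""
    # The bare * makes every parameter keyword-only, so callers must name them.
    return {"source": source, "destination": destination, "quiet": quiet, "verbose": verbose}


def process_folder_positional(source, *, destination="/tmp", quiet=False, verbose=False):
    """Like processFolder(source, { destination = "/tmp", ... } = {})."""
    return {"source": source, "destination": destination, "quiet": quiet, "verbose": verbose}


def quiet_process_folder(**options):
    # A wrapper can pass all options straight through, overriding just one.
    return process_folder(**{**options, "quiet": True})
```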

How to depend on a local Node package without npmjs.com

January 15, 2020
0 comments JavaScript

Suppose that you're working on ~/dev/my-cool-project and inside ~/dev/my-cool-project/package.json you might have something like this:

"dependencies": {
     "that-cool-lib": "1.2.3",
     ...

But that-cool-lib is one of your own projects. You're also working on that project and it's over at ~/dev/that-cool-lib. Within that-cool-lib you might be in a git branch or perhaps you're preparing a 2.0.0 release.

Now you want to know whether that-cool-lib@2.0.0 is going to work inside my-cool-project.

What you could do

First, you release this fancy that-cool-lib@2.0.0 to npmjs.com with that project's npm publish procedure. Then you wait until you can see that the release made it onto https://www.npmjs.com/package/that-cool-lib/v/2.0.0.

Then you go over to my-cool-project, start a new git branch to try the upgrade, and run npm install that-cool-lib@2.0.0 --save so you have this:

"dependencies": {
-    "that-cool-lib": "1.2.3",
+    "that-cool-lib": "2.0.0",
     ...

Now you can try the new version of that-cool-lib inside my-cool-project, and if that-cool-lib had any of its own entry point executables or post/pre install steps, they'd be fully resolved.

What you should do

Instead, use install-local. Don't use npm link because it might not install entry point executables and I also don't like the fact that I need to go into that-cool-lib and install it (globally?) first (when you do cd that-cool-lib && npm link). Also, see "What's wrong with npm-link?".

Here's how you do it:

npx install-local ~/dev/that-cool-lib

and it acts pretty much exactly as if you had gotten it from npmjs.com the normal way.

Notes

I almost never use npm these days. Go yarn! So, perhaps I've misinterpreted something.

Also, I try my very hardest to never use npm install -g ... (or yarn global ... for that matter) now that we have npx. Perhaps if you installed it locally, it would speed up each use of install-local by 1-3 seconds. Again, my skillset of modern npm is fading, so I don't understand why it takes me 14 seconds the first time I run npx install-local ~/dev/that-cool-lib, and then another 14 seconds when I run the exact same command again. Does it not benefit from any caching? How much of that time is spent on npmjs.com resolving other sub-dependencies that that-cool-lib requires?

Hopefully, this helps other people stuck in a similar boat.

How to have default/initial values in a Django form that is bound and rendered

January 10, 2020
11 comments Web development, Django, Python

Django's Form framework is excellent. It's intuitive and versatile and, best of all, easy to use. However, one little thing that is not so intuitive is how do you render a bound form with default/initial values when the form is never rendered unbound.

If you do this in Django:


from django import forms
from django.shortcuts import render


class MyForm(forms.Form):
    name = forms.CharField(required=False)

def view(request):
    form = MyForm(initial={'name': 'Peter'})
    return render(request, 'page.html', {'form': form})

# Imagine, in 'page.html' that it does this:
#  <label>Name:</label>
#  {{ form.name }}

...it will render out this:


<label>Name:</label>
<input type="text" name="name" value="Peter">

The whole initial trick is something you can set on the whole form or individual fields. But it's only used in UN-bound forms when rendered.

If you change your view function to this:


def view(request):
    form = MyForm(request.GET, initial={'name': 'Peter'})  # data passed, so it's bound!
    if form.is_valid():  # validates the bound data
        print(form.cleaned_data['name'])
    return render(request, 'page.html', {'form': form})

Now the form is bound and the initial stuff is essentially ignored, because name is not present in request.GET. And if it were present but an empty string, it wouldn't benefit from the default value either.

My solution

I tried many suggestions and tricks (based on rapid Stack Overflow searching) and nothing worked.

I knew one thing: Only the view should know the actual initial values.

Here's what works:


from django import forms


class MyForm(forms.Form):
    name = forms.CharField(required=False)

    def __init__(self, data, **kwargs):
        initial = kwargs.get('initial', {})
        data = {**initial, **data}
        super().__init__(data, **kwargs)

Now, if you don't have ?name=something in request.GET, the line print(form.cleaned_data['name']) will print Peter and the rendered form will look like this:


<label>Name:</label>
<input type="text" name="name" value="Peter">

And, as expected, if you have ?name=Ashley in request.GET it will print Ashley and produce this rendered HTML too:


<label>Name:</label>
<input type="text" name="name" value="Ashley">

UPDATE June 2020

If data is a QueryDict object (e.g. <QueryDict: {'days': ['90']}>) and initial is a plain dict (e.g. {'days': 30}), then you can't simply merge them with {**initial, **data}. That produces a plain dict like {'days': ['90']}, and Django's form machinery doesn't know that those lists are supposed to be "flattened".

The solution is to use:


from django.utils.datastructures import MultiValueDict

...

    def __init__(self, data, **kwargs):
        initial = kwargs.get("initial", {})
        data = MultiValueDict({**{k: [v] for k, v in initial.items()}, **data})
        super().__init__(data, **kwargs)

(To be honest; this might work in the app I'm currently working on but I don't feel confident that this is covering all cases)
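The merge idea can be sketched without Django at all. merge_initial is a hypothetical helper that works on a plain dict of lists, which is the shape QueryDict stores its values in. As an extra assumption beyond what the form above does, it treats a key whose submitted values are all empty strings as missing. That way the default also kicks in for ?name=. Treat it as a sketch under those assumptions, not something verified against every field type.

```python
def merge_initial(initial, data, empty_is_missing=True):
    """Merge single-valued initial defaults under multi-valued request data.

    data maps each key to a list of submitted strings (like QueryDict does);
    initial maps keys to single values. Submitted data wins, except that with
    empty_is_missing=True, a key whose values are all "" falls back to initial.
    """
    merged = {key: [value] for key, value in initial.items()}
    for key, values in data.items():
        if empty_is_missing and all(v == "" for v in values):
            continue  # keep the default (or leave the key absent)
        merged[key] = list(values)
    return merged
```

Inside the form's __init__, something like data = MultiValueDict(merge_initial(initial, dict(data.lists()))) would plug it in, assuming data is a QueryDict.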

How to split a block of HTML with Cheerio in NodeJS

January 3, 2020
2 comments Node, JavaScript

cheerio is a great Node library for processing HTML. It's faster than JSDOM and years and years of jQuery usage makes the API feel yummily familiar.

What if you have a piece of HTML that you want to split up into multiple blocks? For example, you have this:


<div>Prelude</div>

<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

and you want to get this split by the <h2> tags so you end up with 3 (in this example) distinct blocks of HTML, like this:

first one


<div>Prelude</div>

second one


<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

third one


<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

You could try to cast the regex spell on that and try to, I don't know, split the string by the </h2>. But it's risky and error prone, because you might (although it's a bit unlikely in this simple example) get caught up in <h2>...</h2> tags that are nested inside something else. Also, proper parsing almost always wins over regexes in the long run.

Use cheerio

This is how I solved it and hopefully A) you can copy and benefit, or B) someone tells me there's already a much better way.

What you do is walk the DOM root nodes, one by one, and keep filling a buffer and then yield individual new cheerio instances.


const cheerio = require("cheerio");

const html = `
<div>Prelude</div>

<h2>First Header</h2>
<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>
<!-- comment -->

<h2 id="second">Second Header</h2>
<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>
`;

// load the raw HTML
// it needs to all be wrapped in *one* big wrapper
const $ = cheerio.load(`<div id="_body">${html}</div>`);

// the end goal
const blocks = [];

// the buffer
const section = cheerio
  .load("<div></div>", { decodeEntities: false })("div")
  .eq(0);

const iterable = [...$("#_body")[0].childNodes];
let c = 0;
iterable.forEach(child => {
  if (child.tagName === "h2") {
    if (c) {
      blocks.push(section.clone());
      section.empty();
      c = 0; // reset the counter
    }
  }
  c++;
  section.append(child);
});
if (c) {
  // stragglers
  blocks.push(section.clone());
}

// Test the result
const blocksAsStrings = blocks.map(block => block.html());
console.log(blocksAsStrings.length);
// 3
console.log(blocksAsStrings);
// [
//   '\n<div>Prelude</div>\n\n',
//   '<h2>First Header</h2>\n' +
//     '<p>Paragraph <b>here</b>.</p>\n' +
//     '<p>Another paragraph.</p>\n' +
//     '<!-- comment -->\n' +
//     '\n',
//   '<h2 id="second">Second Header</h2>\n' +
//     '<ul>\n' +
//     '  <li>One</li>\n' +
//     '  <li>Two</li>\n' +
//     '</ul>\n' +
//     '<blockquote>End quote!</blockquote>\n'
// ]

In this particular implementation, the split happens at every h2 tag. If you want to split by anything else, go ahead and adjust the conditional where it currently does if (child.tagName === "h2") {.

Also, what you do with the blocks is up to you. Perhaps you need them as strings, then you use the blocks.map(block => block.html()). Otherwise, if it serves your needs they can remain as individual cheerio instances that you can do whatever with.
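Without cheerio, the loop is just a generic "split before each boundary" routine. Here's a minimal Python sketch of the same buffer-and-flush logic. split_before is a hypothetical name, and the nodes are assumed to be simple (tag, html) tuples rather than real DOM nodes.

```python
def split_before(nodes, is_boundary):
    """Split a flat sequence into blocks, starting a new block at each boundary node."""
    blocks = []
    buffer = []
    for node in nodes:
        # A boundary (e.g. an <h2>) flushes the buffer, but only if it has content.
        if is_boundary(node) and buffer:
            blocks.append(buffer)
            buffer = []
        buffer.append(node)
    if buffer:
        # Stragglers after the last boundary.
        blocks.append(buffer)
    return blocks
```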

A Python and Preact app deployed on Heroku

December 13, 2019
2 comments Web development, Django, Python, Docker, JavaScript

Heroku is great but it's sometimes painful when your app isn't just in one single language. What I have is a project where the backend is Python (Django) and the frontend is JavaScript (Preact). The folder structure looks like this:

/
  - README.md
  - manage.py
  - requirements.txt
  - my_django_app/
     - settings.py
     - asgi.py
     - api/
        - urls.py
        - views.py
  - frontend/
     - package.json
     - yarn.lock
     - preact.config.js
     - build/
        ...
     - src/
        ...

A bunch of things are omitted for brevity, but people who know Django and preact-cli/create-react-app should find this familiar.
The point is that the root is a Python app and the front-end is exclusively inside a sub folder.

When you do local development, you start two servers:

  • ./manage.py runserver - starts http://localhost:8000
  • cd frontend && yarn start - starts http://localhost:3000

The latter is what you open in your browser. That preact app will do things like:


const response = await fetch('/api/search');

and, in preact.config.js I have this:


export default (config, env, helpers) => {

  if (config.devServer) {
    config.devServer.proxy = [
      {
        path: "/api/**",
        target: "http://localhost:8000"
      }
    ];
  }

};

...which is hopefully self-explanatory. So, calls like GET http://localhost:3000/api/search actually go to http://localhost:8000/api/search.

That's when doing development. The interesting thing is going into production.

Before we get into Heroku, let's first "merge" the two systems into one. The trick used is Whitenoise. Basically, Django's web server will be responsible not only for things like /api/search but also for static assets such as / --> frontend/build/index.html and /bundle.17ae4.js --> frontend/build/bundle.17ae4.js.

This is basically all you need in settings.py to make that happen:


MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    ...
]

WHITENOISE_INDEX_FILE = True

STATIC_URL = "/"
STATIC_ROOT = BASE_DIR / "frontend" / "build"

However, this isn't quite enough because the preact app uses preact-router which uses pushState() and other code-splitting magic so you might have a URL, that users see, like this: https://myapp.example.com/that/thing/special and there's nothing about that in any of the Django urls.py files. Nor is there any file called frontend/build/that/thing/special/index.html or something like that.
So for URLs like that, we have to take a gamble on the Django side and basically hope that the preact-router config knows how to deal with it. So, to make that happen with Whitenoise we need to write a custom middleware that looks like this:


from whitenoise.middleware import WhiteNoiseMiddleware


class CustomWhiteNoiseMiddleware(WhiteNoiseMiddleware):
    def process_request(self, request):
        if self.autorefresh:
            static_file = self.find_file(request.path_info)
        else:
            static_file = self.files.get(request.path_info)

            # These two lines are the magic.
            # Basically, the URL didn't lead to a file (e.g. `/manifest.json`)
            # it's either an API path or it's a custom browser path that only
            # makes sense within preact-router. If that's the case, we just don't
            # know but we'll give the client-side preact-router code the benefit
            # of the doubt and let it through.
            if not static_file and not request.path_info.startswith("/api"):
                static_file = self.files.get("/")

        if static_file is not None:
            return self.serve(static_file, request)

And in settings.py this change:


MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
-   "whitenoise.middleware.WhiteNoiseMiddleware",
+   "my_django_app.middleware.CustomWhiteNoiseMiddleware",
    ...
]

Now, all traffic goes through Django. Regular Django view functions under /api and existing static assets are served as usual, and everything else falls back to frontend/build/index.html.
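The decision the custom middleware makes can be pulled out into a plain function, which makes it easy to test. This is a simplified sketch of the non-autorefresh branch. resolve_static is a hypothetical name. files is a plain dict standing in for Whitenoise's lookup of known static files, with "/" mapping to index.html as it does with WHITENOISE_INDEX_FILE = True.

```python
def resolve_static(path, files, index_key="/"):
    """Decide what to serve for a request path, mimicking the custom middleware.

    files: mapping of URL path -> static file.
    Returns the file to serve, or None to let Django's URL routing handle it.
    """
    static_file = files.get(path)
    if static_file is None and not path.startswith("/api"):
        # Unknown non-API path: serve index.html and let preact-router decide.
        static_file = files.get(index_key)
    return static_file
```

One consequence is worth knowing. As written, only paths starting with /api escape the fallback. Any other Django view (for example the admin, if you had it enabled) would get index.html unless you add its prefix to that check.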

Heroku

Heroku tries to make everything so simple for you. You basically create the app (via the CLI or the Heroku web app) and when you're ready you just do git push heroku master. However, that won't be enough, because there's more to this than Python.

Unfortunately, I didn't take notes of my hair-pulling excruciating journey of trying to add buildpacks and hacks and Procfiles and custom buildpacks. Nothing seemed to work. Perhaps the answer was somewhere in this issue: "Support running an app from a subdirectory" but I just couldn't figure it out. I still find buildpacks confusing when it's beyond Hello World. Also, I didn't want to run Node as a service, I just wanted it as part of the "build process".

Docker to the rescue

Finally I got a chance to try "Deploying with Docker" in Heroku, which is a relatively new feature. The only thing that scared me was that I now needed to write a heroku.yml file, which was confusing because all I had was a Dockerfile. We'll get back to that in a minute!

So here's how I made a Dockerfile that mixes Python and Node:


FROM node:12 as frontend

COPY . /app
WORKDIR /app
RUN cd frontend && yarn install && yarn build


FROM python:3.8-slim

WORKDIR /app

RUN groupadd --gid 10001 app && useradd -g app --uid 10001 --shell /usr/sbin/nologin app
RUN chown app:app /tmp

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    gcc apt-transport-https python-dev

# Gotta try moving this to poetry instead!
COPY ./requirements.txt /app/requirements.txt
RUN pip install --upgrade --no-cache-dir -r requirements.txt

COPY . /app
COPY --from=frontend /app/frontend/build /app/frontend/build

USER app

ENV PORT=8000
EXPOSE $PORT

CMD uvicorn gitbusy.asgi:application --host 0.0.0.0 --port $PORT

If you're not familiar with multi-stage builds, the critical trick is on the first line. FROM node:12 as frontend gives that Node build stage the name frontend. I can then copy from that stage into the Python image with COPY --from=frontend /app/frontend/build /app/frontend/build.

Now, at the very end, it starts a uvicorn server running the Django app. That app serves all the static .js, index.html, favicon.ico etc. files through Whitenoise.

To run and build:

docker build . -t my_app
docker run -t -i --rm --env-file .env -p 8000:8000 my_app

Now, opening http://localhost:8000/ is a production grade app that mixes Python (runtime) and JavaScript (static).

Heroku + Docker

Heroku says to create a heroku.yml file, and that makes sense. What didn't make sense is why I would add a run command in there when the CMD is already in the Dockerfile. The solution is simple: omit it. Here's what my final heroku.yml file looks like:


build:
  docker:
    web: Dockerfile

Check in the heroku.yml file and git push heroku master and voila, it works!

To see a complete demo of all of this check out https://github.com/peterbe/gitbusy and https://gitbusy.herokuapp.com/

MDN Documents Size Tree Map

November 14, 2019
0 comments MDN, Web development

Recently I've been playing with the content of MDN as a whole. MDN has ~140k documents in its Wiki. About 70k of them are redirects. That's the result of many years of switching tech and switching information architecture, while also being good Internet citizens and avoiding 404s. So, how are the remaining ~70k documents spread out? To answer that, I wrote a Python script that measures size as the sum of all the files in each sub-tree, including pictures.
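That script isn't shown here, but the idea of "size as the sum of all files in a sub-tree" can be sketched in Python. The sketch assumes, hypothetically, that the documents and their attachments sit in a directory tree on disk. subtree_sizes is a made-up name. It walks the tree once and credits each directory's file bytes to that directory and to every ancestor up to the root.

```python
import os
from collections import defaultdict


def subtree_sizes(root):
    """Return {relative directory: total bytes of all files beneath it}."""
    root = os.path.abspath(root)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
        # Credit these bytes to this directory and every ancestor up to root (".").
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        for depth in range(len(parts) + 1):
            key = os.path.join(*parts[:depth]) if depth else "."
            sizes[key] += total
    return dict(sizes)
```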

Here are the screenshots:

All locales


Specifically en-US


The code that puts this together uses Toast UI which seems cool but I didn't spend much time worrying about how to use it.

Be warned! Opening this link will make your browser sweat: https://8mw9v.csb.app/

You can fork it here: https://codesandbox.io/s/zen-swirles-8mw9v