Filtered by JavaScript

Page 7

Reset

findMatchesInText - Find line and column of matches in a text, in JavaScript

June 22, 2020
0 comments Node, JavaScript

I need this function to relate to open-editor which is a Node program that can open your $EDITOR from Node and jump to a specific file, to a specific line, to a specific column.

Here's the code:


function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  let rex;
  if (inQuotes) {
    rex = new RegExp(`['"](${escaped})['"]`, "g");
  } else {
    rex = new RegExp(`(${escaped})`, "g");
  }
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1;
    const lastIndexOf = left.lastIndexOf("\n") + 1;
    const column = match.index - lastIndexOf + 1;
    yield { line, column };
  }
}

And you use it like this:


const text = ` bravo
Abra
cadabra

bravo
`;

console.log(Array.from(findMatchesInText("bra", text)));

Which prints:


[
  { line: 1, column: 2 },
  { line: 2, column: 2 },
  { line: 3, column: 5 },
  { line: 5, column: 1 }
]

The inQuotes option is because a lot of times this function is going to be used for finding the href value in unstructured documents that contain HTML <a> tags.

Benchmark compare Highlight.js vs. Prism

May 19, 2020
0 comments Node, JavaScript

tl;dr; I wanted to see which is fastest, in Node, Highlight.js or Prism. The result is; they're both plenty fast but Prism is 9% faster.

The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% is Javascript and 22% is CSS and rest is HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.

Then I wrote three functions:

  1. f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline for how long the disk I/O read and the disk I/O write takes.
  2. f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
  3. f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.

The experiment

You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js

Results

The results are (after running each 12 times each):

f1 0.947s   fastest
f2 1.361s   43.6% slower
f3 1.494s   57.7% slower

Memory

In terms of memory usage, Prism maxes heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes heap memory at 60MB too.

Disk space in HTML

Each library produces different HTML. Examples:

Prism


<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
    <span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

Highlight.js


<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
    <span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}

Yes, not only does it mean they look different, they use up a different amount of disk space when saved. That matters for web performance and also has an impact on build resources.

  • f1 - baseline "HTML" files amounts to 11.9MB (across 3,025 files)
  • f2 - Prism: 17.6MB
  • f3 - Highlight.js: 13.6MB

Conclusion

Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.

RAM memory consumption is about the same.

Final HTML from Prism is 30% larger than Highlight.js but when the rendered HTML is included in a full HTML page, the HTML compresses very well because of all the repetition so this is not a good comparison. Or rather, not a lot to worry about.

Well, speed is just one dimension. The features differ too. MDN already uses Prism but does so in the browser. The ultimate context for this blog post is; the speed if we were to do all the syntax highlighting in the server as a build step.

Throw JavaScript errors with extra information

May 12, 2020
0 comments Node, JavaScript

Did you know, if you can create your own new Error instance and attach your own custom properties on that? This can come in very handy when you, from the caller, want to get more structured information from the error without relying on the error message.


// WRONG ⛔️

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      throw new Error(`Failed at ${i}`);
    }
  }
} catch (err) {
  const iteration = parseInt(err.toString().match(/Failed at (\d+)/)[1]);
  console.warn(`Made it to ${iteration}`);
}

// RIGHT ✅

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

The above examples are obviously a bit contrived but you have to imagine that whatever code can throw an error might be "far away" from where you deal with errors thrown. For example, imagine you start off a build and you want to get extra information about what the context was. In Python, you use exception classes as a form of natural filtering but JavaScript doesn't have that. Using custom error properties can be a great tool to separate unexpected errors from expected errors.

Bonus - Checking for the custom property

Imagine this refactoring:


try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

With that code it's very possible you'd get Made it to undefined. So here's how you'd make the distinction:


try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  if (err.hasOwnProperty("iteration")) {
    const iteration = err.iteration;
    console.warn(`Made it to ${iteration}`);
  } else {
    throw err;
  }
}

```

How to use minimalcss without a server

April 24, 2020
0 comments Web development, Node, JavaScript

minimalcss requires that you have your HTML in a serving HTTP web page so that puppeteer can open it to find out the CSS within. Suppose, in your build system, you don't yet really have a server. Well, what you can do is start one on-the-fly and shut it down as soon as you're done.

Suppose you have .html file

First install all the stuff:

yarn add minimalcss http-server

Then run it:


const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_FILE = "index.html";  // THIS IS YOURS

(async () => {
  const server = httpServer.createServer({
    root: path.dirname(path.resolve(HTML_FILE)),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + HTML_FILE],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
  }

  console.log(result.finalCss);
})();

And the index.html file:


<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" href="styles.css">
    </head>
    <body>
        <p>Hi @peterbe</p>
    </body>
</html>

And the styles.css file:


h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}

And the output from running that Node script:

p{font-weight:700}

It works!

Suppose all you have is the HTML string and the CSS blob(s)

Suppose all you have is a string of HTML and a list of strings of CSS:


const fs = require("fs");
const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_BODY = `
<p>Hi Peterbe</p>
`;

const CSSes = [
  `
h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}
`,
];

(async () => {
  const csses = CSSes.map((css, i) => {
    fs.writeFileSync(`${i}.css`, css);
    return `<link rel="stylesheet" href="${i}.css">`;
  });
  const html = `<!doctype html><html>
  <head>${csses}</head>
  <body>${HTML_BODY}</body>
  </html>`;
  const fp = path.resolve("./index.html");
  fs.writeFileSync(fp, html);
  const server = httpServer.createServer({
    root: path.dirname(fp),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + path.basename(fp)],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
    fs.unlinkSync(fp);
    CSSes.forEach((_, i) => fs.unlinkSync(`${i}.css`));
  }

  console.log(result.finalCss);
})();

Truth be told, you'll need a good pinch of salt to appreciate that example code. It works but most likely, if you're into web performance so much that you're even doing this, your parameters are likely to be more complex.

Suppose you have your own puppeteer instance

In the first example above, minimalcss will create an instance of puppeteer (e.g. const browser = await puppeteer.launch()) but that means you have less control over which version of puppeteer or which parameters you need. Also, if you have to run minimalcss on a bunch of pages it's costly to have to create and destroy puppeteer browser instances repeatedly.

To modify the original example, here's how you use your own instance of puppeteer:

  const path = require("path");

+ const puppeteer = require("puppeteer");
  const minimalcss = require("minimalcss");
  const httpServer = require("http-server");

  const HTML_FILE = "index.html"; // THIS IS YOURS

  (async () => {
    const server = httpServer.createServer({
      root: path.dirname(path.resolve(HTML_FILE)),
    });
    server.listen(8080);

+   const browser = await puppeteer.launch(/* your special options */);
+
    let result;
    try {
      result = await minimalcss.minimize({
        urls: ["http://0.0.0.0:8080/" + HTML_FILE],
+       browser,
      });
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
+     await browser.close();
      server.close();
    }

    console.log(result.finalCss);
  })();

Note that this doesn't buy us anything in this particular example. But that's where your imagination comes in!

Conclusion

You can see the code here as a git repo if that helps.

The point is that this might solve some of the chicken-and-egg problem you might have is that you're building your perfect HTML + CSS and you want to perfect it before you ship it.

Note also that there are other ways to run minimalcss other than programmatically. For example, minimalcss-server is minimalcss wrapped in a express server.

Another thing that you might have is that you have multiple .html files that you want to process. The same technique applies but you just need to turn it into a loop and make sure you call server.close() (and optionally await browser.close()) when you know you've processed the last file. Exercise left to the reader?

How post JSON with curl to an Express app

April 15, 2020
2 comments Node, JavaScript

tl;dr; No need install or require body-parser and it's important to send the right content-type header.

I know Express has great documentation but I'm still confused about how to receive JSON and/or how to test it from curl. A great deal of confusion comes from the fact that, I think, body-parser used to be a third-party library you had to install yourself to add it to your Express app. You don't. It now gets installed by installing express. E.g.

▶ yarn init -y
▶ yarn add express
▶ ls node_modules/body-parser
HISTORY.md   LICENSE      README.md    index.js     lib          package.json

Let's work backward. This is how you set up the Express handler:


const express = require("express");  // v4.17.x as of Apr 2020
const app = express();

app.use(express.json());

app.post("/echo", (req, res) => {
  res.json(req.body);
}); 

app.listen(5000);

And, this is how you test it:

▶ curl -XPOST -d '{"foo": "bar"}' -H 'content-type: application/json' localhost:5000/echo
{"foo":"bar"}%

That's it. No need to require("body-parser") or anything like that. And make sure you're sending the content-type: application/json in the curl command.

Things that can go wrong

I kept fumbling around on StackOverflow questions and rummaging the Express documentation until I figured out what mistake I kept doing. So, here's a variant of the handler above, but much more verbose:


app.post("/echo", (req, res) => {

  if (req.body === undefined) {
    throw new Error("express.json middleware not installed");
  }
  if (!Object.keys(req.body).length) {
    // E.g curl -v -XPOST http://localhost:5000/echo
    if (!req.get("content-Type")) {
      return res.status(400).send("no content-type header\n");
    }
    // E.g. curl -v -XPOST -d '{"foo": "bar"}' http://localhost:5000/echo
    if (!req.get("content-Type").includes("application/json")) {
      return res.status(400).send("content-type not application/json\n");
    }
    // E.g. curl -XPOST -H 'content-type:application/json' http://localhost:5000/echo
    return res.status(400).send("no data payload included\n");
  }

  // At this point 'req.body' is *something*.
  // For example, you might want to `console.log(req.body.foo)`
  res.json(req.body);
}); 

How you treat these things is up to you. For example, an empty JSON data might be OK in your application.
I.e. perhaps curl -XPOST -d '{}' -H 'content-type:application/json' http://localhost:5000/echo might be fine.

An important option

express.json() is a piece of middleware. By default, it has a simple mechanism for bothering to do put .body into the request object. The default configuration is as if you'd typed:


app.use(express.json({
  type: 'application/json',
}));

(it's actually a bit more complicated than that)

If you're confident that you'll always be sending JSON to this handler, and you don't want to have to force clients to have to specify the application/json Content-Type you can change this to:

app.use(express.json({
  type: '*/*',
}));

Now you'll find that curl -XPOST -d '{"foo": "bar3"}' localhost:5000/ will work fine.

Instead of curl, let's fetch

This code works the same with node-fetch or browser Fetch API.


fetch("http://localhost:5000/echo", {
  method: "post",
  body: JSON.stringify({ foo: "bar" }),
  headers: { "Content-Type": "application/json" },
})
  .then((res) => res.json())
  .then((json) => console.log(json));

Performance of truth checking a JavaScript object

February 3, 2020
0 comments Node, JavaScript

I'm working on a Node project that involves large transformations of large sets of data here and there. For example:


if (!Object.keys(this.allTitles).length) {
  ...

In my case, that this.allTitles is a plain object with about 30,000 key/value pairs. That particular line of code actually only runs 1 single time so if it's hundreds of milliseconds, it's really doesn't matter that much. However, that's not a guarantee! What if you had something like this:


for (const thing of things) {
  if (!Object.keys(someObj).length) {
    // mutate someObj
  }
}

then, you'd potentially have a performance degradation once someObj becomes considerably large. And it gets particularly degraded if the length of things is considerably large as it would do the operation many times.

Actually, consider this:


const obj = {};
[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

On my macBook with Node 13.5, this outputs:

Truthcheck obj: 260.564ms

Maps

The MDN page on Map has a nice comparison, in terms of performance, between Map and regular object. Consider this super simple benchmark:


const obj = {};
const map = new Map();

[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
  map.set(i, i);
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

console.time("Truthcheck map");
[...Array(100)].forEach((_, i) => {
  return !!map.size;
});
console.timeEnd("Truthcheck map");

So, fill a Map instance and a plain object with 30,000 keys and values. Then, for each in turn, check if the thing is truthy 100 times. The output I get:

Truthcheck obj: 235.017ms
Truthcheck map: 0.029ms

That's not unexpected. The map instance maintains a size counter, which increments on .set (if the key is new), so doing that "truthy" check just takes O(1) seconds.

Conclusion

Don't run to rewrite everything to Maps!

In fact, I took the above mentioned little benchmark and changed the times to be a 3,000 item map and obj (instead of 30,000) and only did 10 iterations (instead of 100) and then the numbers are:

Truthcheck obj: 0.991ms
Truthcheck map: 0.044ms

These kinds of small numbers are very unlikely to matter in the scope of other things going on.

Anyway, consider using Map if you fear that you might be working with really reeeeally large mappings.

JavaScript destructuring like Python kwargs with defaults

January 18, 2020
1 comment Python, JavaScript

In Python

I'm sure it's been blogged about a buncha times before but, I couldn't find it, and I had to search too hard to find an example of this. Basically, what I'm trying to do is what Python does in this case, but in JavaScript:


def do_something(arg="notset", **kwargs):
    print(f"arg='{arg.upper()}'")

do_something(arg="peter")
do_something(something="else")
do_something()

In Python, the output of all this is:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

It could also have been implemented in a more verbose way:


def do_something(**kwargs):
    arg = kwargs.get("arg", "notset")
    print(f"arg='{arg.upper()}'")

This more verbose format has the disadvantage that you can't quickly skim it and see and what the default is. That thing (arg = kwargs.get("arg", "notset")) might happen far away deeper in the function, making it hard work to spot the default.

In JavaScript

Here's the equivalent in JavaScript (ES6?):


function doSomething({ arg = "notset", ...kwargs } = {}) {
  return `arg='${arg.toUpperCase()}'`;
}

console.log(doSomething({ arg: "peter" }));
console.log(doSomething({ something: "else" }));
console.log(doSomething());

Same output as in Python:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

Notes

I'm still not convinced I like this syntax. It feels a bit too "hip" and too one-liner'y. But it's also pretty useful.

Mind you, the examples here are contrived because they're so short in terms of the number of arguments used in the function.
A more realistic thing like be a function that lists, upfront, all the possible parameters and for some of them, it wants to point out some defaults. E.g.


function processFolder({
  source,
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder({ source: "/user", quiet: true }));

One could maybe argue that arguments that don't have a default are expected to always be supplied so they can be regular arguments like:


function processFolder(source, {
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder("/user", { quiet: true }));

But, I quite like keeping all arguments in an object. It makes it easier to write wrapper functions and I find this:


setProfile(
  "My biography here",
  false,
  193.5,
  230,
  ["anders", "bengt"],
  "South Carolina"
);

...harder to read than...


setProfile({
  bio: "My biography here",
  dead: false,
  height: 193.5,
  weight: 230,
  middlenames: ["anders", "bengt"],
  state: "South Carolina"
});

How depend on a local Node package without npmjs.com

January 15, 2020
0 comments JavaScript

Suppose that you're working on ~/dev/my-cool-project and inside ~/dev/my-cool-project/package.json you might have something like this:

"dependencies": {
     "that-cool-lib": "1.2.3",
     ...

But that that-cool-lib is one of your own projects. You're also working on that project and it's over at ~/dev/that-cool-lib. Within that-cool-lib you might be in a git branch or perhaps you're preparing a 2.0.0 release.

Now you're interested if that-cool-lib@2.0.0 is going to work here inside my-cool-project.

What you could do

First, you release this fancy that-cool-lib@2.0.0 to npmjs.com with that project's npm publish procedure. Then as soon as that's done and you can see that the release made it onto https://www.npmjs.com/package/that-cool-lib/v/2.0.0.

Then you go over to my-cool-project and start a new git branch to try the upgrade, npm install that-cool-project@2.0.0 --save so you have this:

"dependencies": {
-    "that-cool-lib": "1.2.3",
+    "that-cool-lib": "2.0.0",
     ...

Now you can try it that new version of my-cool-project and if that-cool-lib had any of its own entry point executables or post/pre install steps, they'd be fully resolved.

What you should do

Instead, use install-local. Don't use npm link because it might not install entry point executables and I also don't like the fact that I need to go into that-cool-lib and install it (globally?) first (when you do cd that-cool-lib && npm link). Also, see "What's wrong with npm-link?".

Here's how you do it:

npx install-local ~/dev/that-cool-lib

and it acts pretty much exactly as if you had gotten it from npmjs.com the normal way.

Notes

I almost never use npm these days. Go yarn! So, perhaps I've misinterpreted something.

Also, I try my very hardest to never use npm install -g ... (or yarn global ... for that matter) now that we have npx. Perhaps if you'd install it locally it'd speed up the use of local-install by 1-3 seconds each time you run this. Again, my skillset of modern npm is fading so I don't think I understand why it takes me 14 seconds the first time I run npx install that-cool-lib and then it takes 14 seconds again when I run the exact same command again. Does it not benefit from any caching? How much of that time is spent on npmjs.com resolving other sub-dependencies that that-cool-lib requires?

Hopefully, this helps other people stuck in a similar boat.

How to split a block of HTML with Cheerio in NodeJS

January 3, 2020
2 comments Node, JavaScript

cheerio is a great Node library for processing HTML. It's faster than JSDOM and years and years of jQuery usage makes the API feel yummily familiar.

What if you have a piece of HTML that you want to split up into multiple blocks? For example, you have this:


<div>Prelude</div>

<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

and you want to get this split by the <h2> tags so you end up with 3 (in this example) distinct blocks of HTML, like this:

first one


<div>Prelude</div>

second one


<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

third one


<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

You could try to cast the regex spell on that and try to, I don't know, split the string by the </h2>. But it's risky and error prone because (although a bit unlikely in this simple example) get caught up in <h2>...</h2> tags that are nested inside something else. Also, proper parsing almost always wins in the long run over regexes.

Use cheerio

This is how I solved it and hopefully A) you can copy and benefit, or B) someone tells me there's already a much better way.

What you do is walk the DOM root nodes, one by one, and keep filling a buffer and then yield individual new cheerio instances.


const html = `
<div>Prelude</div>

<h2>First Header</h2>
<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>
<!-- comment -->

<h2 id="second">Second Header</h2>
<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>
`;

// load the raw HTML
// it needs to all be wrapped in *one* big wrapper
const $ = cheerio.load(`<div id="_body">${html}</div>`);

// the end goal
const blocks = [];

// the buffer
const section = cheerio
  .load("<div></div>", { decodeEntities: false })("div")
  .eq(0);

const iterable = [...$("#_body")[0].childNodes];
let c = 0;
iterable.forEach(child => {
  if (child.tagName === "h2") {
    if (c) {
      blocks.push(section.clone());
      section.empty();
      c = 0; // reset the counter
    }
  }
  c++;
  section.append(child);
});
if (c) {
  // stragglers
  blocks.push(section.clone());
}

// Test the result
const blocksAsStrings = blocks.map(block => block.html());
console.log(blocksAsStrings.length);
// 3
console.log(blocksAsStrings);
// [
//   '\n<div>Prelude</div>\n\n',
//   '<h2>First Header</h2>\n' +
//     '<p>Paragraph <b>here</b>.</p>\n' +
//     '<p>Another paragraph.</p>\n' +
//     '<!-- comment -->\n' +
//     '\n',
//   '<h2 id="second">Second Header</h2>\n' +
//     '<ul>\n' +
//     '  <li>One</li>\n' +
//     '  <li>Two</li>\n' +
//     '</ul>\n' +
//     '<blockquote>End quote!</blockquote>\n'
// ]

In this particular implementation the choice of splitting is by the every h2 tag. If you want to split by anything else, go ahead and adjust the conditional there where it's currently doing if (child.tagName === "h2") {.

Also, what you do with the blocks is up to you. Perhaps you need them as strings, then you use the blocks.map(block => block.html()). Otherwise, if it serves your needs they can remain as individual cheerio instances that you can do whatever with.

A Python and Preact app deployed on Heroku

December 13, 2019
2 comments Web development, Django, Python, Docker, JavaScript

Heroku is great but it's sometimes painful when your app isn't just in one single language. What I have is a project where the backend is Python (Django) and the frontend is JavaScript (Preact). The folder structure looks like this:

/
  - README.md
  - manage.py
  - requirements.txt
  - my_django_app/
     - settings.py
     - asgi.py
     - api/
        - urls.py
        - views.py
  - frontend/
     - package.json
     - yarn.lock
     - preact.config.js
     - build/
        ...
     - src/
        ...

A bunch of things omitted for brevity but people familiar with Django and preact-cli/create-create-app should be familiar.
The point is that the root is a Python app and the front-end is exclusively inside a sub folder.

When you do local development, you start two servers:

  • ./manage.py runserver - starts http://localhost:8000
  • cd frontend && yarn start - starts http://localhost:3000

The latter is what you open in your browser. That preact app will do things like:


const response = await fetch('/api/search');

and, in preact.config.js I have this:


export default (config, env, helpers) => {

  if (config.devServer) {
    config.devServer.proxy = [
      {
        path: "/api/**",
        target: "http://localhost:8000"
      }
    ];
  }

};

...which is hopefully self-explanatory. So, calls like GET http://localhost:3000/api/search actually goes to http://localhost:8000/api/search.

That's when doing development. The interesting thing is going into production.

Before we get into Heroku, let's first "merge" the two systems into one and the trick used is Whitenoise. Basically, Django's web server will be responsibly not only for things like /api/search but also static assets such as / --> frontend/build/index.html and /bundle.17ae4.js --> frontend/build/bundle.17ae4.js.

This is basically all you need in settings.py to make that happen:


MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    ...
]

WHITENOISE_INDEX_FILE = True

STATIC_URL = "/"
STATIC_ROOT = BASE_DIR / "frontend" / "build"

However, this isn't quite enough because the preact app uses preact-router which uses pushState() and other code-splitting magic so you might have a URL, that users see, like this: https://myapp.example.com/that/thing/special and there's nothing about that in any of the Django urls.py files. Nor is there any file called frontend/build/that/thing/special/index.html or something like that.
So for URLs like that, we have to take a gamble on the Django side and basically hope that the preact-router config knows how to deal with it. So, to make that happen with Whitenoise we need to write a custom middleware that looks like this:


from whitenoise.middleware import WhiteNoiseMiddleware


class CustomWhiteNoiseMiddleware(WhiteNoiseMiddleware):
    def process_request(self, request):
        if self.autorefresh:
            static_file = self.find_file(request.path_info)
        else:
            static_file = self.files.get(request.path_info)

            # These two lines is the magic.
            # Basically, the URL didn't lead to a file (e.g. `/manifest.json`)
            # it's either a API path or it's a custom browser path that only
            # makes sense within preact-router. If that's the case, we just don't
            # know but we'll give the client-side preact-router code the benefit
            # of the doubt and let it through.
            if not static_file and not request.path_info.startswith("/api"):
                static_file = self.files.get("/")

        if static_file is not None:
            return self.serve(static_file, request)

And in settings.py this change:


MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
-   "whitenoise.middleware.WhiteNoiseMiddleware",
+   "my_django_app.middleware.CustomWhiteNoiseMiddleware",
    ...
]

Now, all traffic goes through Django. Regular Django view functions, static assets, and everything else fall back to frontend/build/index.html.

Heroku

Heroku tries to make everything so simple for you. You basically, create the app (via the cli or the Heroku web app) and when you're ready you just do git push heroku master. However that won't be enough because there's more to this than Python.

Unfortunately, I didn't take notes of my hair-pulling excruciating journey of trying to add buildpacks and hacks and Procfiles and custom buildpacks. Nothing seemed to work. Perhaps the answer was somewhere in this issue: "Support running an app from a subdirectory" but I just couldn't figure it out. I still find buildpacks confusing when it's beyond Hello World. Also, I didn't want to run Node as a service, I just wanted it as part of the "build process".

Docker to the rescue

Finally I get a chance to try "Deploying with Docker" in Heroku which is a relatively new feature. And the only thing that scared me was that now I need to write a heroku.yml file which was confusing because all I had was a Dockerfile. We'll get back to that in a minute!

So here's how I made a Dockerfile that mixes Python and Node:


FROM node:12 as frontend

COPY . /app
WORKDIR /app
RUN cd frontend && yarn install && yarn build


FROM python:3.8-slim

WORKDIR /app

RUN groupadd --gid 10001 app && useradd -g app --uid 10001 --shell /usr/sbin/nologin app
RUN chown app:app /tmp

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    gcc apt-transport-https python-dev

# Gotta try moving this to poetry instead!
COPY ./requirements.txt /app/requirements.txt
RUN pip install --upgrade --no-cache-dir -r requirements.txt

COPY . /app
COPY --from=frontend /app/frontend/build /app/frontend/build

USER app

ENV PORT=8000
EXPOSE $PORT

CMD uvicorn gitbusy.asgi:application --host 0.0.0.0 --port $PORT

If you're not familiar with it, the critical trick is on the first line where it builds some Node with as frontend. That gives me a thing I can then copy from into the Python image with COPY --from=frontend /app/frontend/build /app/frontend/build.

Now, at the very end, it starts a uvicorn server with all the static .js, index.html, and favicon.ico etc. available to uvicorn which ultimately runs whitenoise.

To run and build:

docker build . -t my_app
docker run -t -i --rm --env-file .env -p 8000:8000 my_app

Now, opening http://localhost:8000/ is a production grade app that mixes Python (runtime) and JavaScript (static).

Heroku + Docker

Heroku says to create a heroku.yml file and that makes sense but what didn't make sense is why I would add cmd line in there when it's already in the Dockerfile. The solution is simple: omit it. Here's what my final heroku.yml file looks like:


build:
  docker:
    web: Dockerfile

Check in the heroku.yml file and git push heroku master and voila, it works!

To see a complete demo of all of this check out https://github.com/peterbe/gitbusy and https://gitbusy.herokuapp.com/