hashin 0.14.5 and canonical pip hashes

January 31, 2019
0 comments Python

Prior to version 0.14.5 hashin would write write down the hashes of PyPI packages in the order they appear in PyPI's JSON response. That means there's a slight chance that two distinct clients/computers/humans might actually get different output when then run hashin Django==2.1.5.

The pull request has a pretty hefty explanation as it demonstrates the fix.

Do note that if the existing order of hashes in a requirements file is not in the "right" order, hashin won't correct it unless any of the hashes are different.

Thanks @SomberNight for patiently pushing for this.

How to encrypt a file with Emacs on macOS (ccrypt)

January 29, 2019
0 comments macOS, Linux

Suppose you have a cleartext file that you want to encrypt with a password, here's how you do that with ccrypt on macOS. First:


▶ brew install ccrypt

Now, you have the ccrypt program. Let's test it:

cat secrets.txt
Garage pin: 123456
Favorite kid: bart
Wedding ring order no: 98c4de910X

▶ ccrypt secrets.txt
Enter encryption key: ▉▉▉▉▉▉▉▉▉▉▉
Enter encryption key: (repeat) ▉▉▉▉▉▉▉▉▉▉▉

# Note that the original 'secrets.txt' is replaced 
# with the '.cpt' version.ls | grep secrets
secrets.txt.cpt

▶ less secrets.txt.cpt
"secrets.txt.cpt" may be a binary file.  See it anyway?

There. Now you can back up that file on Dropbox or whatever and not have to worry about anybody being able to open it without your password. To read it again:


▶ ccrypt --decrypt --cat secrets.txt.cpt
Enter decryption key: ▉▉▉▉▉▉▉▉▉▉▉
Garage pin: 123456
Favorite kid: bart
Wedding ring order no: 98c4de910X

▶ ls | grep secrets
secrets.txt.cpt

Or, to edit it you can do these steps:


▶ ccrypt --decrypt secrets.txt.cpt
Enter decryption key: ▉▉▉▉▉▉▉▉▉▉▉


▶ vi secrets.txt

▶ ccrypt secrets.txt
Enter encryption key:
Enter encryption key: (repeat)

Clunky that you have you extract the file and remember to encrypt it back again. That's where you can use emacs. Assuming you have emacs already installed and you have a ~/.emacs file. Add these lines to your ~/.emacs:


(setq auto-mode-alist
 (append '(("\\.cpt$" . sensitive-mode))
               auto-mode-alist))
(add-hook 'sensitive-mode (lambda () (auto-save-mode nil)))
(setq load-path (cons "/usr/local/share/emacs/site-lisp/ccrypt" load-path))
(require 'ps-ccrypt "ps-ccrypt.el")

By the way, how did I know that the load path should be /usr/local/share/emacs/site-lisp/ccrypt? I looked at the output from brew:


▶ brew info ccrypt
ccrypt: stable 1.11 (bottled)
Encrypt and decrypt files and streams
...
==> Caveats
Emacs Lisp files have been installed to:
  /usr/local/share/emacs/site-lisp/ccrypt
...

Anyway, now I can use emacs to open the secrets.txt.cpt file and it will automatically handle the password stuff:

About to open
About to open

Opening
Opening with password

Opened
Opened

This is really convenient. Now you can open an encrypted file, type in your password, and it will take care of encrypting it for you when you're done (saving the file).

Be warned! I'm not an expert at either emacs or encryption so just be careful and if you get nervous take precaution and set aside more time to study this deeper.

variable_cache_control - Django view decorator to set max_age in runtime

January 22, 2019
0 comments Django, Python

tl;dr; If you use the django.views.decorators.cache.cache_control decorator, consider this one instead to change the max_age depending on the request.

I had/have a Django view function that looks something like this:


@cache_control(public=True, max_age=60 * 60)
def home(request, oc=None, page=1):
    ...

But, that number 60 * 60 I really needed it to be different depending on the request parameters. For example, that oc=None, if that's not None I know the page's Cache-Control header can and should be different.

So I wrote this decorator:


from django.utils.cache import patch_cache_control


def variable_cache_control(**kwargs):
    """Same as django.views.decorators.cache.cache_control except this one will
    allow the `max_age` parameter be a callable.
    """

    def _cache_controller(viewfunc):
        @functools.wraps(viewfunc)
        def _cache_controlled(request, *args, **kw):
            response = viewfunc(request, *args, **kw)
            copied = kwargs
            if kwargs.get("max_age") and callable(kwargs["max_age"]):
                max_age = kwargs["max_age"](request, *args, **kw)
                # Can't re-use, have to create a shallow clone.
                copied = dict(kwargs, max_age=max_age)
            patch_cache_control(response, **copied)
            return response

        return _cache_controlled

    return _cache_controller

Now, I can do this instead:


def _best_max_age(req, oc=None, **kwargs):
    max_age = 60 * 60
    if oc:
        max_age *= 10
    return max_age

@variable_cache_control(public=True, max_age=_best_max_age)
def home(request, oc=None, page=1):
    ...

I hope it inspires.

An example of using Immer to handle nested objects in React state

January 18, 2019
1 comment React, JavaScript

When Immer first came out I was confused. I kinda understood what I was reading but I couldn't really see what was so great about it. As always, nothing beats actual code you type yourself to experience how something works.

Here is, I believe, a great example: https://codesandbox.io/s/y2m399pw31

If you're reading this on your mobile it might be hard to see what it does. Basically, it's a very simple React app that displays a "todo list like" thing. The state (aka. this.state.tasks) is a pure JavaScript array. The React components that display the data (e.g. <List tasks={this.state.tasks}/> and <ShowItem item={item} />) are pure (i.e. extends React.PureComponent) meaning React natively protects from re-rendering a component when the props haven't changed. So no wasted render-cycles.

What Immer does is that it helps mutate an object in a smart way. I'm sure you've heard that you're never supposed to mutate state objects (arrays are a form of mutable objects too!) and instead do things like const stuff = Object.assign({}, this.state.stuff); or const things = this.state.things.slice(0);. However, those things are shallow copies meaning any mutable objects within (i.e. nested objects) don't get the clone treatment and can thus cause problems with not re-rendering when they should.

Here's the core gist:


import React from "react";
import produce from "immer";

class App extends React.Component {
  state = {
    tasks: [[false, { text: "Do something", date: new Date() }]]
  };
  onToggleDone = (i, done) => {
    // Immer
    // This is what the blog post is all about...
    const tasks = produce(this.state.tasks, draft => {
      draft[i][0] = done;
      draft[i][1].date = new Date();
    });

    // Pure JS
    // Don't do this!
    // const tasks = this.state.tasks.slice(0);
    // tasks[i][0] = done;
    // tasks[i][1].date = new Date();

    this.setState({ tasks });
  };
  render() {
    // appreviated, but...
    return <List tasks={this.state.tasks}/>
  }
}

class List extends React.PureComponent {
   ...

It just works. Neat!

By the way, here's a code sandbox that accomplishes the same thing but with ImmutableJS which I think is uglier. I think it's uglier because now the rendering components need to be aware that it's rendering immutable.Map objects instead.

Caveats

  1. The cost of doing what immer.produce isn't free. It's some smart work that needs to be done. But the alternative is to deep clone the object which is going to be much slower. Immer isn't the fastest kid on the block but unlike MobX and ImmutableJS once you've done this smart stuff you're back to plain JavaScript objects.

  2. Careful with doing something like console.log(draft) since it will raise a TypeError in your web console. Just be aware of that or use console.log(JSON.stringify(draft)) instead.

  3. If you know with confidence that your mutable object does not, and will not, have nested mutable objects you can use object spread, Object.assign(), or .slice(0) and save yourself the trouble of another dependency.

Use vars() to send an argparse Namespace into a function in Python

January 8, 2019
1 comment Python

I only just learned about this today after all these years and thought you might like it too.

The trick is to conveniently turn an argparse.Namespace into keyword arguments that you can send to a function. This is the old/wrong way I've been doing it for years:


# THE OLD WAY

def main(things, option_a, option_n):
    print(locals())  # Debugging 


import argparse

parser = argparse.ArgumentParser()
parser.add_argument("things", help="Bla bla", nargs="*")
parser.add_argument("-o", "--option-a", help="Bla bla", default="Op A")
parser.add_argument("-n", "--option-n", help="Ble ble", default="Op N")
args = parser.parse_args()

main(
    things=args.things,
    option_a=args.option_a,
    option_n=args.option_n
)

That works but the tedious thing is to have to have spell out every single argument, twice!, when sending the argparse Namespace into the function. Here's the much moar betterest way:


# THE NEW WAY

def main(things, option_a, option_n):
    print(locals())  # Debugging 


import argparse

parser = argparse.ArgumentParser()
parser.add_argument("things", help="Bla bla", nargs="*")
parser.add_argument("-o", "--option-a", help="Bla bla", default="Op A")
parser.add_argument("-n", "--option-n", help="Ble ble", default="Op N")
args = parser.parse_args()

# The only difference and the magic sauce...
main(**vars(args))  

What's neat about this is that you don't have to type up every argument defined in the parser to the get it as arguments into a function. And as a bonus, Python will name match keyword arguments to arguments so the order doesn't matter.

Caveat! This "trick" assumes that the arguments in the parser match the arguments in the function. So if the main() function takes an argument called foo_bar you have to have an argument in the parser called --foo-bar.

Number.prototype.toString() is incredibly useful to display numbers

January 4, 2019
0 comments JavaScript

tl;dr; Use Number.prototype.toString() to display percentages that might be floating point numbers.

10% entered
I started writing a complicated solution but as I discovered corner cases and surprised I was brutally forced to do some research and actually read some documentation. Turns out Number.prototype.toString(), with the precision argument omitted, is the ideal solution.

The application I was working on has an input field to type in a percentage. I.e. a number between 0 and 100. But whatever the user types in, we store the number in decimal. So, if the user typed in "10" into the input widget, we actually store it as 0.1 in the database. Most people will type in a whole number (aka. an integer) like "12" or "5" but some people actually need more precision so they might type in "0.2%" which means 0.002 stored in the backend database.

But the widget is a React controlled component meaning it's value prop needs to be potentially formatted to what gives the best user experience. If the user types in whole numbers set the value prop to a whole number. If the user types in floating point numbers set the value prop type a floating point number with the "matching formatting".

0.12% entered
I started writing an overly complicated function that tries to figure out how many decimal-points the user typed in. For example 0.123 is 3 because parseInt(0.123 * 10 ** 3, 10) === 0.123 * 10 ** 3. But, that approach doesn't work because of floating point arithmetic and the rounding problem. For example 103441 !== 10.3441 * (10 ** 4) === 103440.99999999999. So, don't look for a number to pass into .toFixed().

Turns out Number.prototype.toString() is all you need. If you omit the precision argument, it figures out how many significant digits to use based on the input. It's best explained with some examples:

> (33).toString()
"33"
> (33.3).toString()
"33.3"
> (33.10000).toString()
"33.1"
> (10.3441).toString()
"10.3441"

Perfect!

Next level stuff

So actually, it's a bit more complicated than that. You see, the number stored in the backend database might be 0.007 which you and I know as "0.7%" but be warned:

> 0.008 * 100
0.8
> 0.007 * 100
0.7000000000000001

You know, because of floating-point arithmetic, which every high-level software engineer remembers understanding one time years ago but now know just to watch out for.

So if you use the toString() on that you'd get...

> var backendPercentage = 0.007
> (100 * backendPercentage).toString() + '%'
"0.700000000000001%"

Ouch! So how to solve that? Use Math.round(number * 100) / 100 to get rid of those rounding errors. Apparently, it's very fast too. So, now combine this with the toString():

> var backendPercentage = 0.007
> (Math.round(100 * backendPercentage * 100) / 100).toString() + '%'
"0.7%"

Perfect!

Concurrent download with hashin without --update-all

December 18, 2018
0 comments Web development, Python

Last week, I landed concurrent downloads in hashin. The example was that you do something like...


$ time hashin -r some/requirements.txt --update-all

...and the whole thing takes ~2 seconds even though it that some/requirements.txt file might contain 50 different packages, and thus 50 different PyPI.org lookups.

Just wanted to point out, this is not unique to use with --update-all. It's for any list of packages. And I want to put some better numbers on that so here goes...

Suppose you want to create a requirements file for every package in the current virtualenv you might do it like this:


# the -e filtering removes locally installed packages from git URLs
$ pip freeze | grep -v '-e ' | xargs hashin -r /tmp/reqs.txt

Before running that I injected a little timer on each pypi.org download. It looked like this:


def get_package_data(package, verbose=False):
    url = "https://pypi.org/pypi/%s/json" % package
    if verbose:
        print(url)
+   t0 = time.time()
    content = json.loads(_download(url))
    if "releases" not in content:
        raise PackageError("package JSON is not sane")
+   t1 = time.time()
+   print(t1 - t0)

I also put a print around the call to pre_download_packages(lookup_memory, specs, verbose=verbose) to see what the "total time" was.

The output looked like this:

▶ pip freeze | grep -v '-e ' | xargs python hashin.py -r /tmp/reqs.txt
0.22896194458007812
0.2900810241699219
0.2814369201660156
0.22658205032348633
0.24882292747497559
0.268247127532959
0.29332590103149414
0.23981380462646484
0.2930259704589844
0.29442572593688965
0.25312376022338867
0.34232664108276367
0.49491214752197266
0.23823285102844238
0.3221290111541748
0.28302812576293945
0.567702054977417
0.3089122772216797
0.5273139476776123
0.31477880477905273
0.6202089786529541
0.28571176528930664
0.24558186531066895
0.5810830593109131
0.5219211578369141
0.23252081871032715
0.4650228023529053
0.6127192974090576
0.6000659465789795
0.30976200103759766
0.44440698623657227
0.3135409355163574
0.638585090637207
0.297544002532959
0.6462509632110596
0.45389699935913086
0.34597206115722656
0.3462028503417969
0.6250648498535156
0.44159507751464844
0.5733060836791992
0.6739277839660645
0.6560370922088623
SUM TOTAL TOOK 0.8481268882751465

If you sum up all the individual times it would have become 17.3 seconds. It's 43 individual packages and 8 CPUs multiplied by 5 means it had to wait with some before downloading the rest.

Clearly, this works nicely.

elapsed function in bash to print how long things take

December 12, 2018
0 comments macOS, Linux

I needed this for a project and it has served me pretty well. Let's jump right into it:


# This is elapsed.sh

SECONDS=0

function elapsed()
{
  local T=$SECONDS
  local D=$((T/60/60/24))
  local H=$((T/60/60%24))
  local M=$((T/60%60))
  local S=$((T%60))
  (( $D > 0 )) && printf '%d days ' $D
  (( $H > 0 )) && printf '%d hours ' $H
  (( $M > 0 )) && printf '%d minutes ' $M
  (( $D > 0 || $H > 0 || $M > 0 )) && printf 'and '
  printf '%d seconds\n' $S
}

And here's how you use it:


# Assume elapsed.sh to be in the current working directory
source elapsed.sh

echo "Doing some stuff..."
# Imagine it does something slow that
# takes about 3 seconds to complete.
sleep 3
elapsed

echo "Some quick stuff..."
sleep 1
elapsed

echo "Doing some slow stuff..."
sleep 61
elapsed

The output of running that is:

Doing some stuff...
3 seconds
Some quick stuff...
4 seconds
Doing some slow stuff...
1 minutes and 5 seconds

Basically, if you have a bash script that does a bunch of slow things, it having a like of elapsed there after some blocks of code will print out how long the script has been running.

It's not beautiful but it works.

How I performance test PostgreSQL locally on macOS

December 10, 2018
2 comments Web development, macOS, PostgreSQL

It's weird to do performance analysis of a database you run on your laptop. When testing some app, your local instance probably has 1/1000 the amount of realistic data compared to a production server. Or, you're running a bunch of end-to-end integration tests whose PostgreSQL performance doesn't make sense to measure.

Anyway, if you are doing some performance testing of an app that uses PostgreSQL one great tool to use is pghero. I use it for my side-projects and it gives me such nice insights into slow queries that I'm willing to live with the cost that it is to run it on a production database.

This is more of a brain dump of how I run it locally:

First, you need to edit your postgresql.conf. Even if you used Homebrew to install it, it's not clear where the right config file is. Start psql (on any database) and type this to find out which file is the one:


$ psql kintobench

kintobench=# show config_file;
               config_file
-----------------------------------------
 /usr/local/var/postgres/postgresql.conf
(1 row)

Now, open /usr/local/var/postgres/postgresql.conf and add the following lines:

# Peterbe: From Pghero's configuration help.
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

Now, to restart the server use:


▶ brew services restart postgresql
Stopping `postgresql`... (might take a while)
==> Successfully stopped `postgresql` (label: homebrew.mxcl.postgresql)
==> Successfully started `postgresql` (label: homebrew.mxcl.postgresql)

The next thing you need is pghero itself and it's easy to run in docker. So to start, you need Docker for mac installed. You also need to know the database URL. Here's how I ran it:

docker run -ti -e DATABASE_URL=postgres://peterbe:@host.docker.internal:5432/kintobench -p 8080:8080 ankane/pghero

Duplicate indexes

Note the trick of peterbe:@host.docker.internal because I don't use a password but inside the Docker container it doesn't know my terminal username. And the host.docker.internal is so the Docker container can reach the PostgreSQL installed on the host.

Once that starts up you can go to http://localhost:8080 in a browser and see a listing of all the cumulatively slowest queries. There are other cool features in pghero too that you can immediately benefit from such as hints about unused/redundent database indices.

Hope it helps!

hashin 0.14.0 with --update-all and a bunch of other features

November 13, 2018
0 comments Python, Linux

If you don't know it is, hashin is a Python command line tool for updating your requirements file's packages and their hashes for use with pip install. It takes the pain out of figuring out what hashes each package on PyPI has. It also takes the pain out of figuring out what version you can upgrade to.

In the 0.14.0 release (changelog) there are a bunch of new features. The most exciting one is --update-all. Let's go through some of the new features:

Update all (--update-all)

Suppose you want to bravely upgrade all the pinned packages to the latest and greatest. Before version 0.14.0 you'd have to manually open the requirements file and list every single package on the command line:


$ less requirements.txt
$ hashin Django requests Flask cryptography black nltk numpy

With --update-all it's the same thing except it does that reading and copy-n-paste for you:


$ hashin --update-all

Particularly nifty is to combine this with --dry-run if you get nervous about that many changes.

Interactive update all (--interactive)

This new flag only makes sense when used together with --update-all. Used together, it basically reads all packages in the requirements file, and for each one that there is a new version it asks you if you want to update it or skip it:

It looks like this:


$ hashin --update-all --interactive
PACKAGE                        YOUR VERSION    NEW VERSION
Django                         2.1.2           2.1.3           ✓
requests                       2.20.0          2.20.1          ✘
numpy                          1.15.2          1.15.4          ?
Update? [Y/n/a/q/?]:

You can also use the aliases hashin -u -i to do the same thing.

Support for "extras"

If you want to have requests[security] or markus[datadog] in your requirements file, hashin used to not support that. This now works:


$ hashin "requests[security]"

Before, it would look for a package called verbatim requests[security] on PyPI which obviously doesn't exist. Now, it parses that syntax, makes a lookup for requests and when it's done it puts the extra syntax back into the requirements file.

Thanks Dustin Ingram for pushing for this one!

Atomic writes

Prior to this version, if you typed hashin requests flask numpy nltkay it would go ahead and do one of those packages at a time and effectively open and edit the requirements file as many times as there are packages mentioned. The crux of that is that if you, for example, have a typo (e.g. nltkay instead of nltk) it would crash there and not roll back any of the other writes. It's not a huge harm but it certainly is counter intuitive.

Another place where this matters is with --dry-run. If you specified something like hashin --dry-run requests flask numpy you would get one diff per package and thus repeat the diff header 3 (excessive) times.

The other reason why atomic writes is important is if you use hashin --update-all --interactive and it asks you if you want to update package1, package2, package3, and then you decide "Nah. I don't want any of this. I quit!" it would just do that without updating the requirements file.

Better not-found errors

This was never a problem if you used Python 2.7 but for Python 3.x, if you typoed a package name you'd get a Python exception about the HTTP call and it wasn't obvious that the mistake lies with your input and not the network. Basically, it traps any HTTP errors and if it's 404 it's handled gracefully.

(Internal) Black everything and pytest everything

All source code is now formatted with Black which, albeit imperfect, kills any boring manual review of code style nits. And, it uses therapist to wrap the black checks and fixes.

And all unit tests are now written for pytest. pytest was already the tool used in TravisCI but now all of those self.assertEqual(foo, bar)s have been replaced with assert foo == bar.