How I installed letsencrypt for Nginx

January 26, 2016
0 comments Linux, Web development

I have no problem admitting that I always find SSL and certs and stuff like that confusing. And Let's Encrypt is no exception. However, with Let's Encrypt, apparently, all you need to do is download their software and run a command to get a couple of certificate files. No websites or forms to fill in. No need to create a .csr file. How hard can it be? After skimming some documentation and other blog posts, I dug in. Turns out, it was quite doable.

To install it, I ran:

# pwd
/root
# git clone https://github.com/letsencrypt/letsencrypt
# cd letsencrypt
# pip install cryptography
# ./letsencrypt-auto

The reason I had to manually pip install cryptography was that the installer in ./letsencrypt-auto failed the first time without it.

Now it should be installed. To create the cert you have to temporarily stop Nginx, and I had to be quick because I didn't want the site to be down for long:

# /etc/init.d/nginx stop
# ./letsencrypt-auto certonly --standalone -d autocompeter.com
# /etc/init.d/nginx start

The first time I ran this I got Error: urn:acme:error:badNonce :: The client sent an unacceptable anti-replay nonce :: JWS has invalid anti-replay nonce which, according to this discussion, is easy to get past: simply try again.

So I tried again, and the second time it worked! Now I have 4 new files:

# ls -l /etc/letsencrypt/live/autocompeter.com/
total 0
lrwxrwxrwx 1 root root 32 Jan 25 08:04 cert.pem -> ../../archive/autocompeter.com/cert1.pem
lrwxrwxrwx 1 root root 33 Jan 25 08:04 chain.pem -> ../../archive/autocompeter.com/chain1.pem
lrwxrwxrwx 1 root root 37 Jan 25 08:04 fullchain.pem -> ../../archive/autocompeter.com/fullchain1.pem
lrwxrwxrwx 1 root root 35 Jan 25 08:04 privkey.pem -> ../../archive/autocompeter.com/privkey1.pem

Now add these lines to the Nginx config for that site:

listen 443;

ssl on;
ssl_certificate /etc/letsencrypt/live/autocompeter.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/autocompeter.com/privkey.pem;
ssl_session_timeout 5m;
ssl_session_cache shared:SSL:50m;
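
Those lines go inside the site's server block. A minimal sketch of what that might look like, assuming the same domain as above (the comment marks where the rest of your existing config goes):

server {
    listen 443;
    server_name autocompeter.com;

    ssl on;
    ssl_certificate /etc/letsencrypt/live/autocompeter.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/autocompeter.com/privkey.pem;
    ssl_session_timeout 5m;
    ssl_session_cache shared:SSL:50m;

    # ...rest of the site config (root, location blocks, etc.)
}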

The new cert I just created expires in about 2 months. I created an entry in my calendar with an alert. I think I just need to run:

# /etc/init.d/nginx stop
# ./letsencrypt-auto certonly --standalone -d autocompeter.com
# /etc/init.d/nginx start
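
If you'd rather not rely on remembering, those three commands could go in a little script that a cron job runs every couple of months. An untested sketch, assuming the letsencrypt checkout is still in /root/letsencrypt:

#!/bin/bash
# renew-cert.sh -- hypothetical renewal script; keeps the Nginx downtime short
/etc/init.d/nginx stop
/root/letsencrypt/letsencrypt-auto certonly --standalone -d autocompeter.com
/etc/init.d/nginx start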

Best Atom packages of 2015

January 22, 2016
7 comments Web development, macOS

tl;dr last-cursor-position, advanced-open-file and highlight-line

Sorry for the sensationalist headline on this blog post. Almost all of Atom, including the core functionality, is based on packages. For example, the autocomplete thing that pops up whilst you're typing is a package with its own git repo and README. However, that one is not a community package. The community packages are what I want to focus on here.

Number 1

last-cursor-position

If you're in the midst of typing and for some reason you need to scroll somewhere else in the code to type something, or to select text to copy to the clipboard, how do you get back? You can either memorize which line you were on. Or you can split the window so that when you're done elsewhere, you just kill the newly created split-window. Or, you install last-cursor-position.

At any time you can press alt-- (that's alt and the minus character) and it'll go back to where the cursor was last.

It works across open tabs too. So if you switch tabs to edit index.html and want to go back to that app.py you were working on you can alt-- yourself back there. And suppose that you want to go back to index.html again, you hit shift-alt--.

Number 2

advanced-open-file

This was written by a friend of mine called Michael "Osmose" Kelly and it was the first package he wrote. It's apparently very popular and Michael's most popular Open Source project to date.

What it does is introduce a command-line looking prompt for opening files. By default, you start it with Ctrl-x Ctrl-f which is the Emacs command for opening files/buffers.

Don't get me wrong, I love using Cmd-t to fuzzy-find files and that's awesome too, but sometimes when you have eleventeen files called models.py and you want the one in the "current directory" it's much easier to just go directly to that file. I type Ctrl-x Ctrl-f m [TAB] [ENTER] and I'm there. Had I typed m in the fuzzy-finder it would certainly have yielded too many files.

Another really, really useful thing about this package is that I can easily go to any file outside the current directory. Suppose my Atom window is rooted in ~/dev/PYTHON/premailer/ and I want to open /tmp/hack.js, I easily can, thanks to this package, without reaching for the mouse.

Number 3

highlight-line

The name describes well what it does. But why do I need it? The answer is simple; it's for when I jump around. When I'm in the midst of typing a function or snippet or something, I don't need to know which line I'm on because things are settled. No, it's when I go somewhere else, for example using the last-cursor-position package, that it's hard to see where the cursor is. Especially relevant when you have a big screen with a high resolution.

Why isn't this a core package?!

In Summary

I bet I've forgotten some package that I love and use every day that isn't a core package. If so, it's probably something subtle or something that I almost take for granted. For example, who doesn't use react or atom-beautify?! Also, those packages are already so popular they don't need a blog post to boost their attention and fame :)

What are your favorites that you like so much that they just need to be highlighted? Leave a comment or discuss here.

Advanced Closure Compiler vs UglifyJS2

January 20, 2016
12 comments JavaScript

A couple of years ago I wrote a blog post titled "Comparing Google Closure with UglifyJS". It concluded that Closure Compiler compressed files down to 45.6% of the original size, and UglifyJS only to 51.5%. But UglifyJS was 1220% faster, so I concluded that I was going to stick with UglifyJS.

But things have changed since 2011. UglifyJS2 came out and stealthily replaced the original implementation (npm install uglify-js) and it has a --mangle option. Also, in the original experimental blog post I didn't use -O advanced when running Closure Compiler.
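
For reference, the two invocations being compared look roughly like this (the file names are made up, and depending on your Closure Compiler version the advanced flag may be spelled --compilation_level ADVANCED_OPTIMIZATIONS):

# UglifyJS2 with mangling and compression
uglifyjs app.js --mangle --compress -o app.min.js

# Closure Compiler in advanced mode
java -jar compiler.jar -O ADVANCED --js app.js --js_output_file app.min.js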

So I whipped up a quick script to compare the two. Here's some of the output:

Truncated! Read the rest by clicking the link below.

Select all relations in PostgreSQL

December 10, 2015
1 comment PostgreSQL

tl;dr Start psql with -E or --echo-hidden

I wanted to find out EVERYTHING that's related to a specific topic. Tables, views, stored procedures etc.
One way of doing that is to go into psql and type \d and/or \df and look through that list. But that's impractical if the list gets large, and I might want to get it out on stdout instead so I can grep and grep -v.

There are lots of Stackoverflow questions about how to SQL select all tables, but I want it all. The solution is to start psql with -E or --echo-hidden. When you do that, it prints out the SQL it used to generate the output for you. You can then copy that and do whatever you want with it. For example:

peterbecom=# \d
********* QUERY **********
SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************

With this I was able to come up with this SQL select to get all tables, views, sequences and functions.


SELECT
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table'
  WHEN 'v' THEN 'view'
  WHEN 'm' THEN 'materialized view'
  WHEN 'i' THEN 'index'
  WHEN 'S' THEN 'sequence'
  WHEN 's' THEN 'special'
  WHEN 'f' THEN 'foreign table' END as "Type"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid);


SELECT
  p.proname as "Name",
  'function'
FROM pg_catalog.pg_proc p
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE pg_catalog.pg_function_is_visible(p.oid)
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema';

A use case for this is that I put those two SQL selects in a file and now I can grep:

$ psql mydatabase < everything.sql | grep -i crap

Headsupper.io

December 5, 2015
0 comments Python, Web development, Django, JavaScript, React

tl;dr

Headsupper.io is a free GitHub webhook service that emails people when commits have the configurable keyword "headsup" in them.

Introduction

Headsupper.io is great for when you have a GitHub project with multiple people working on it and, when you make a commit, you want to notify other people by email.

Basically, you set up a GitHub Webhook, on pushes, to push to https://headsupper.io and then it'll parse the incoming push and its commits and look for certain things in the commit message. By default, it'll look for the word "headsup". For example, a git commit message might look like this:

fixes #123 - more juice in the Saab headsup! will require updating

Or you can use the multi-line approach where the first line is short and sweet and, after the break, a bit more elaborate:

bug 1234567 - tea kettle upgrade 2.1

Headsup: Next time you git pull from master, remember to run 
peep install on the requirements.txt file since this commit 
introduces a bunch of crazy dependency changes.

Git commits that come through that don't have any match on this word will simply be ignored by Headsupper.

How you use it

Maybe paradoxically, you need to authenticate with your GitHub account but that's in read-only mode and does NOT set up the Webhook for you. The reason you have to authenticate to prepare a configuration on headsupper.io is to tie the configuration to a real user.

Once you've authenticated you get the option to create your first configuration, and you have to enter at least these three pieces of information:

  1. The GitHub "full name". This is the org name, slash, repo name. E.g. peterbe/django-peterbecom or mozilla/socorro.
  2. Pick a secret. Remember what you typed, because you'll need to type in this same secret when you set up the Webhook on your GitHub project's Webhooks page. (This is used to checksum and verify the source of the Webhook push)
  3. Who to send to. A list of email addresses separated with a newline or a semi-colon.

Once you've set that up, you'll need to go to your GitHub project's Settings page and enter a new Webhook. The URL you need to type in is https://headsupper.io and for the "Secret" type in that secret you used earlier. That's it!
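
As an aside, this is roughly what that secret is for on the receiving end: GitHub signs each Webhook delivery with an X-Hub-Signature header (an HMAC-SHA1 of the payload) and the server recomputes and compares it. A minimal sketch of such a check, not the actual Headsupper code:

import hashlib
import hmac

def verify_signature(secret, payload_body, signature_header):
    # payload_body is the raw request body as bytes
    # signature_header looks like "sha1=<hexdigest>"
    expected = 'sha1=' + hmac.new(
        secret.encode('utf-8'), payload_body, hashlib.sha1
    ).hexdigest()
    # constant-time comparison, to not leak timing information
    return hmac.compare_digest(expected, signature_header)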

Rules and options

The word that triggers is configurable by you. The default is headsup. And by default, it's case insensitive. You can change that so it's case sensitive. Also, the word has to be word delimited on the left (e.g. a space or a newline character) and on the right it needs to be a space, a : or a !. So this won't match: theheadsup: or headsupper.
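
In other words, the matching behaves roughly like this regular expression (my approximation, not the actual implementation):

import re

# left side: start of line or whitespace; right side: space, ":" or "!"
trigger = re.compile(r'(^|\s)headsup[\s:!]', re.IGNORECASE | re.MULTILINE)

assert trigger.search('headsup! will require updating')
assert trigger.search('Headsup: remember to run peep install')
assert not trigger.search('theheadsup: nope')
assert not trigger.search('headsupper')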

Other optional things you can configure are:

  • Which git branch to trigger on (by default it's master)
  • Which emails to CC when it sends
  • Which emails to BCC when it sends
  • Only send when you make a tag

That last option, Only send when a new tag is created, is interesting. I added that option because at work, we make production server releases by pushing a git tag. When a tag is pushed, all those commits are sent to the continuous deployment service which makes a server upgrade. This means you get a chance to enter a heads up message to be emailed to the people who care about new deployments going out.

How it was built

It's a mix between Django and ReactJS. The whole client-side app is built statically with Webpack in ES6. It's served as static files through Nginx. But Nginx makes an exception for all URLs that start with /api or /accounts. The /api/* URLs are used for loading and setting JSON. The /accounts/* URLs are used for the GitHub OAuth endpoints.
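
That Nginx exception is the kind of thing you'd express with location blocks, roughly like this (a sketch; the upstream port and paths are made up):

location /api/ {
    proxy_pass http://localhost:8000;  # hypothetical Django backend
}
location /accounts/ {
    proxy_pass http://localhost:8000;
}
location / {
    root /path/to/webpack/build;  # the static ES6/Webpack bundle
}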

What's interesting about this architecture is that it's using HTTP cookies, not API tokens. Cookies are quite good in that they're well established and the browser does all the automated work of keeping them secure and making each request potentially authenticated.

Here's the relevant React code and here's the relevant Django code that processes the Webhook.

The whole project is available on: https://github.com/peterbe/headsupper.

Also, I made a demo at the November Mozilla Beer and Tell.

Django forms and making datetime inputs localized

December 4, 2015
2 comments Python, Django

tl;dr

To change from one timezone-aware datetime to another, turn it into a naive datetime and then use pytz's localize() method to convert it back into the timezone you want.

Introduction

Suppose you have a Django form where you allow people to enter a date, e.g. 2015-06-04 13:00. You have to save it timezone aware, because you have settings.USE_TZ on and it's simply better, in many ways, to store things as timezone-aware dates.

By default, if you have settings.USE_TZ and there's no timezone information in the string that the django.form.fields.DateTimeField parses, it will use settings.TIME_ZONE and that timezone might be different from what it really should be. For example, in my case, I have an app where you can upload a CSV file full of information about events. These events belong to a venue which I have in the database. Every venue has a timezone, e.g. Europe/Berlin or US/Pacific. So if someone uploads a CSV file for the Berlin location, 2015-06-04 13:00 means 13:00 o'clock in Berlin. I don't care where the server is hosted and what its settings.TIME_ZONE is. I need to make that input timezone aware specifically for Europe/Berlin.

Examples

Suppose you have settings.TIME_ZONE == 'US/Pacific' and you let the django.form.fields.DateTimeField do its magic you get something you don't want:


>>> from django.conf import settings
>>> settings.TIME_ZONE
'US/Pacific'
>>> assert settings.USE_TZ
>>> from django.forms.fields import DateTimeField
>>> DateTimeField().clean('2015-06-04 13:00')
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)

See! That's wrong. Sort of. Not Django's fault. What I need to do is to convert that datetime object into one that is timezone aware on the Europe/Berlin timezone.

In old versions of pytz, specifically <=2014.2, you could do this:


>>> import pytz
>>> pytz.VERSION
'2014.2'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)

But in modern versions of pytz you can't do that, because if you don't use the pytz.timezone instance to localize, it will use the default version, which might be one of those crazy "Local Mean Time" offsets they used a hundred years ago. E.g.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>)

See, it's that crazy LMT+0:53:00 that's so often talked about on Stackoverflow!

Here's the trick

The trick is to use pytz.timezone(MY TIME ZONE NAME).localize(MY NAIVE DATETIME OBJECT). When you use the .localize() method, pytz can use the date to make sure it picks the right conversion for that named timezone.

And in the case of our overly smart django.form.fields.DateTimeField it means we need to convert it back into a naive datetime object and then localize it.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date = date.replace(tzinfo=None)
>>> date
datetime.datetime(2015, 6, 4, 13, 0)
>>> tz = pytz.timezone('Europe/Berlin')
>>> tz.localize(date)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)
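
Wrapped up as a little helper, the whole dance might look something like this (my own convenience sketch, not from the Django docs):

import pytz
from django.forms.fields import DateTimeField

def clean_localized(value, timezone_name):
    """Parse a datetime string and make it aware in the given timezone."""
    date = DateTimeField().clean(value)
    naive = date.replace(tzinfo=None)  # discard the settings.TIME_ZONE guess
    return pytz.timezone(timezone_name).localize(naive)

# e.g. clean_localized('2015-06-04 13:00', 'Europe/Berlin')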

That was much harder than it needed to be. Timezones are hard. Especially when you have the human element of people typing in things and just, rightfully, expect the system to figure it out and get it right.

I hope this helps the next schmuck who has/had to set aside an hour to figure this out.

Screenshot-sharing performance comparison

November 13, 2015
13 comments Web development, macOS

One tool I use many times per day at work takes a screenshot on my mac, uploads it to the cloud, and puts a permalink to that picture in my clipboard so I can quickly and easily share it.

First I was using CloudApp, which was awesome. I can't remember how much I paid but they started being very unreliable. Sometimes the upload just failed. Sometimes viewing the image failed. It was mostly working but unreliable enough that I just couldn't cope.

So I switched to Dropbox and they have been very reliable. I can't remember how much I pay them but the primary use for paying them is that they back up a folder on the hard drive and make it easy to share other files in a nice way.

But when I take a screenshot, and share that link, that page, that shows the screenshot, is horribly slow. It's just supposed to show an image! It's not supposed to load so slowly that it makes my browser tremble. Shame on you Dropbox!

Lastly, people have been saying great things about Jumpshare. It's free!! Their "plus upgrade", for $9.99/month, gives you more options, more storage (1TB), optional password protection, custom branding, custom domain and analytics. That's nice but I'm not desperate so I might upgrade later.

Samples

But let's look at the difference in how these three perform in showing an image:

  1. Dropbox sample

  2. CloudApp sample

  3. Jumpshare sample

By the way, I'm sorry about the motif in the pictures, but I encourage you to open each of these and notice that they all look different. I don't know if that's because those sites (CloudApp and Jumpshare) apply some CSS filters a la Instagram. Here's the original. That might be topic enough for a whole new blog post. But that's for another time.

Webpagetest.org

First, let's load these on Webpagetest.org:

  1. Dropbox

  2. CloudApp

  3. Jumpshare

Last but not least; a visual comparison of all three on Firefox, DSL from San Jose, CA, USA. Here's the video comparison.

Devtools

Here I'm using the plain Firefox Devtools in the browser to measure the network requests needed:

  1. Dropbox

  2. CloudApp

  3. Jumpshare

Things to note about these:

  • Jumpshare has 636.18KB of CSS. That's way excessive. I wonder if you can even reach 636KB if you concatenate Bootstrap, SemanticUI, Foundation, PureCSS and Bootflat into one file? Perhaps that's a blog post on its own.
  • Dropbox has 4,974.83KB of JavaScript spread over 85 files!!
  • Of the 85 JavaScript files Dropbox forces you to eat, roughly 20 of them are trackers that would get disabled if you enable tracking protection in your browser.
  • CloudApp does their CSS better, but it's still bigger than it needs to be.
  • Dropbox is the only one that doesn't force you to load Flash.

In numbers

Metric                    | Dropbox  | CloudApp | Jumpshare
--------------------------+----------+----------+----------
Length of URL             | 85       | 17       | 44
HTTPS                     | Yes      | No       | Yes
Fully loaded (time)       | 21.216s  | 12.420s  | 13.839s
Fully loaded (bytes)      | 2,747 KB | 1,772 KB | 1,910 KB
Fully loaded (requests)   | 198      | 90       | 44
Speed Index               | 13065    | 8707     | 8685
Upgrade price (per month) | $9.99    | $8.25    | $9.99

The winner?

As you can see, CloudApp loads marginally faster than Jumpshare (and Dropbox trails long, long after). Also, CloudApp wins more rows in the "In numbers" section above. But the lack of HTTPS is kinda sad.

But remember, the reason I ditched CloudApp was that it was unreliable to the point of serious frustration. They might win today's performance comparison but I dare not go back. This new contender, Jumpshare, looks and feels great. The OSX app worked wonderfully and was really easy peasy to set up. Now I have a cute little kangaroo in the OSX toolbar.

So, I think I'll stick with Jumpshare.com for now. I can't tell how much storage they give you for free but...

My money

So you get more features and more storage if you pay $X per month? What I really would pay for is a much faster web page. I know it would be possible. The image you view is 1,074.4KB and all you actually need is a little bit of HTML around it and maybe some really basic CSS. It should be possible entirely without any JavaScript. That, I would happily pay for.

UPDATE

On closer inspection, it seems Jumpshare's CSS is NOT 636.18KB. The requests analyzer in the Firefox Devtools most likely has a bug.

Whatsdeployed

November 11, 2015
4 comments Python, Web development, Mozilla

Whatsdeployed was a tool I developed for my work at Mozilla. I think many other organizations can benefit from using it too.

So, on many sites, what we do when deploying is note which git sha was deployed and write that to a file which is then exposed via the web server. Like this, for example. If you know that sha and what's at the tip of the master branch on the project's GitHub page, you can build up an interesting dashboard that allows you to see what's available and what's been deployed.
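
The writing-it-down part can be as simple as one extra line in your deploy script (a sketch; the file name and location are up to you):

# e.g. somewhere in the deploy script
git rev-parse HEAD > static/revision.txt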

Sample Whatsdeployed screen for the Mozilla Socorro project
The other really useful case is when you have more than just one environment. For example, you might have dev, stage and prod environments and, always last, the master branch on GitHub. Now you can see what code has been shipped on prod versus your staging environment, for example.

This is one of those far too few projects that you build quickly one Friday afternoon and it turns out to be surprisingly useful to a lot of people. I, for one, check various projects like this several times per day.

The code is on GitHub and it's basically a tiny bit of Flask with some jQuery doing a couple of AJAX requests. If you enjoy it and use it, please share.

UPDATE

Blogged about a facelift, Jan 2018

Chainable catches in a JavaScript promise

November 5, 2015
6 comments Web development, JavaScript

If you have a Promise that you're executing, you can chain multiple things quite nicely by simply returning the value as it "passes through".
For example:


new Promise((resolve) => {
  resolve('some value')
})
.then((value) => {
  console.log('1', value)
  return value
})
.then((value) => {
  console.log('2', value)
  return value
})

This will console log

1 some value
2 some value

And you can add more .then() to it. As many as you like. Just remember to "play ball" by passing the value. In fact, you can actually pass a different value. Like this for example:


new Promise((resolve) => {
  resolve('some value')
})
.then((value) => {
  console.log('1', value)
  return value
})
.then((value) => {
  console.log('2', value)
  return value.toUpperCase()
})
.then((value) => {
  console.log('3', value)
  return value
})

Demo here. This'll console log

1 some value
2 some value
3 SOME VALUE

But how do you do the same with multiple .catch()?

This is NOT how you do it:


new Promise((resolve, reject) => {
  reject('some reason')
})
.catch((reason) => {
  console.warn('1', reason)
  return reason
})
.catch((reason) => {
  console.warn('2', reason)
  return reason
})

Demo here. When you run that you just get:

1 some reason

To chain catches you have to re-raise (aka re-throw) it:


new Promise((resolve, reject) => {
  reject('some reason')
})
.catch((reason) => {
  console.warn('1', reason)
  throw reason
})
.catch((reason) => {
  console.warn('2', reason)
})

Demo here. The output if you run this is:

1 some reason
2 some reason

But you have to be a bit more careful here. Note that the second .catch() doesn't re-throw the reason one last time. If you did that, you'd get a general JavaScript error on that page, i.e. an unhandled error that makes it all the way out to the web console. Meaning, you have to be aware of errors and take care of them.
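
To make that concrete, this variation would log both catches but then blow up with an unhandled rejection, since nothing catches the final throw (a sketch of what not to do):

new Promise((resolve, reject) => {
  reject('some reason')
})
.catch((reason) => {
  console.warn('1', reason)
  throw reason
})
.catch((reason) => {
  console.warn('2', reason)
  throw reason  // nothing catches this => unhandled rejection in the console
})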

Why does this matter?

It matters because you might want to deal with errors at, for example, both a low level and a high level. You might want to log all exceptions AND still pass them along so that higher-level code can be aware of them. Suppose you have a function that fetches data using the fetch API. You use it from multiple places and you don't want to have to log it everywhere. Instead, that wrapping function can be responsible for the logging, but you still have to deal with the error where you use it.

For example, this is contrived but not totally unrealistic code:


let fetcher = (url) => {
  // this function might be more advanced
  // and do other fancy things
  return fetch(url)
}

// 1st
fetcher('http://example.com/crap')
.then((response) => {
  document.querySelector('#result').textContent = response
})
.catch((exception) => {
  console.error('oh noes!', exception)
  document.querySelector('#result-error').style['display'] = 'block'
})

// 2nd
fetcher('http://example.com/other')
.then((response) => {
  document.querySelector('#other').textContent = response
})
.catch((exception) => {
  console.error('oh noes!', exception)
  document.querySelector('#other-error').style['display'] = 'block'
})

Demo here

Notice how each .catch() handler does the same kind of logging but deals with the error differently in a human-facing way.
Wouldn't it be nice if you could have a general and central .catch() for logging, but continue dealing with the errors in a human way?

Here's one such example:


let fetcher = (url) => {
  // this function might be more advanced
  // and do other fancy things
  return fetch(url)
  .catch((exception) => {
    console.error('oh noes! on:', url, 'exception:', exception)
    throw exception
  })
}

// 1st
fetcher('http://example.com/crap')
.then((response) => {
  document.querySelector('#result').textContent = response
})
.catch(() => {
  document.querySelector('#result-error').style['display'] = 'block'
})

// 2nd
fetcher('http://example.com/other')
.then((response) => {
  document.querySelector('#other').textContent = response
})
.catch(() => {
  document.querySelector('#other-error').style['display'] = 'block'
})

Demo here

Here you get the best of both worlds. You have a central place where all exceptions are logged in a nice way, and the higher level code only has to deal with the human way of explaining that something went wrong.

It's pretty basic but it's probably useful to somebody else who gets confused about how to deal with exceptions in promises.

Weight of your PostgreSQL tables "lumped together"

October 31, 2015
0 comments PostgreSQL

UPDATE June 2020

That first SQL doesn't work in Postgres 12 and onwards. Use this instead:


SELECT relname AS "table_name",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND relkind = 'r'
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 10;

We have lots of tables that weigh a lot. Some of the tables are partitions so they're called "mytable_20150901" and "mytable_20151001" etc.

To find out how much each table weighs you can use this query:


select table_name, pg_relation_size(table_name), pg_size_pretty(pg_relation_size(table_name))
from information_schema.tables
where table_schema = 'public'
order by 2 desc limit 10;

It'll give you an output like this:

table_name        | pg_relation_size | pg_size_pretty
--------------------------+------------------+----------------
 raw_adi_logs             |      14724538368 | 14 GB
 raw_adi                  |      14691426304 | 14 GB
 tcbs                     |       7173865472 | 6842 MB
 exploitability_reports   |       6512738304 | 6211 MB
 reports_duplicates       |       4428742656 | 4224 MB
 addresses                |       4120412160 | 3930 MB
 missing_symbols_20150601 |       3264897024 | 3114 MB
 missing_symbols_20150608 |       3170762752 | 3024 MB
 missing_symbols_20150622 |       3039731712 | 2899 MB
 missing_symbols_20150615 |       2967281664 | 2830 MB
(10 rows)

But as you can see in this example, it might be interesting to know what the sum is of all the missing_symbols_* partitions.

Without further ado, here's how you do that:


select table_name, total, pg_size_pretty(total)
from (
  select trim(trailing '_0123456789' from table_name) as table_name, 
  sum(pg_relation_size(table_name)) as total
  from information_schema.tables
  where table_schema = 'public'
  group by 1
) as agg
order by 2 desc limit 10;

Then you'll get possibly very different results:

table_name        |    total     | pg_size_pretty
--------------------------+--------------+----------------
 reports_user_info        | 157111115776 | 146 GB
 reports_clean            | 106995695616 | 100 GB
 reports                  | 100983242752 | 94 GB
 missing_symbols          |  42231529472 | 39 GB
 raw_adi_logs             |  14724538368 | 14 GB
 raw_adi                  |  14691426304 | 14 GB
 extensions               |  12237242368 | 11 GB
 tcbs                     |   7173865472 | 6842 MB
 exploitability_reports   |   6512738304 | 6211 MB
 signature_summary_uptime |   6027468800 | 5748 MB
(10 rows)

You can read more about the trim() function here.
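
If that trim() looks mysterious, here's what it does in isolation (a one-off example query I made up):

SELECT trim(trailing '_0123456789' from 'missing_symbols_20150601');
-- returns 'missing_symbols'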