tl;dr ...


SELECT 
  indexrelname, 
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size 
FROM pg_stat_user_indexes 
WHERE relname LIKE 'search_terms%';

I'm working on a new table that is "search terms". A search term is constructed from extracting combinations of words, from titles, on my blog. For example, if the title is "Inspecting the index size in PostgreSQL" the search terms for that are:

  • inspecting
  • inspecting index
  • inspecting index size
  • inspecting index size postgresql
  • index
  • index size
  • index size postgresql
  • size
  • size postgresql
  • postgresql

It's created by lower casing the input and taking one, two, three, and four non-stopwords.

To search these in PostgreSQL I use trigrams using pg_trgm. For example,


SELECT id, term
FROM search_terms
WHERE
  term LIKE 'conf%'
  OR term LIKE '% conf%'
LIMIT 10

There's more to it, but this is a good start.

Anyway, I was curious how much space this is going to take up.
By indexing trigrams, you're indexing a lot of small individual strings. So lots of combinations.

The index is defined like this:


CREATE INDEX search_terms_term_trgm 
ON search_terms 
USING GIST (term gist_trgm_ops) 

After inserting a bit over 10,000 of these "search terms" the size of the index becomes:


peterbecom=#
SELECT
  indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname LIKE 'search_terms%';
      indexrelname      | index_size
------------------------+------------
 search_terms_pkey      | 248 kB
 search_terms_term_key  | 488 kB
 search_terms_term_trgm | 856 kB
(3 rows)

Curiously, if you set a higher siglen (e.g. GIST (term gist_trgm_ops(siglen=24))), it actually becomes smaller.

Conclusion(?)

If anything, it's not that large. In this particular example, there are only 10k of these smaller word combinations. It would be curious to see what the numbers become at 100k and 1M.

Comments

Your email will never ever be published.

Related posts