tl;dr ...
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname LIKE 'search_terms%';
I'm working on a new table that is "search terms". A search term is constructed from extracting combinations of words, from titles, on my blog. For example, if the title is "Inspecting the index size in PostgreSQL" the search terms for that are:
inspecting
inspecting index
inspecting index size
inspecting index size postgresql
index
index size
index size postgresql
size
size postgresql
postgresql
It's created by lower casing the input and taking one, two, three, and four non-stopwords.
To search these in PostgreSQL I use trigrams using pg_trgm
. For example,
SELECT id, term
FROM search_terms
WHERE
term LIKE 'conf%'
OR term LIKE '% conf%'
LIMIT 10
There's more to it, but this is a good start.
Anyway, I was curious how much space this is going to take up.
By indexing trigrams, you're indexing a lot of small individual strings. So lots of combinations.
The index is defined like this:
CREATE INDEX search_terms_term_trgm
ON search_terms
USING GIST (term gist_trgm_ops)
After inserting a bit over 10,000 of these "search terms" the size of the index becomes:
peterbecom=#
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname LIKE 'search_terms%';
indexrelname | index_size
------------------------+------------
search_terms_pkey | 248 kB
search_terms_term_key | 488 kB
search_terms_term_trgm | 856 kB
(3 rows)
Curiously, if you set a higher siglen
(e.g. GIST (term gist_trgm_ops(siglen=24))
), it actually becomes smaller.
Conclusion(?)
If anything, it's not that large. In this particular example, there are only 10k of these smaller word combinations. It would be curious to see what the numbers become at 100k and 1M.
Comments