'How to compare a tsvector against another tsvector?
I'm trying to get "possibly related" records of a given record.
There's a tsvector (tsv) on the table, so I'm thinking how to convert the source tsv to a tsquery format to then find the most closely related matches like a normal ranked search.
SELECT title,
link,
IMAGE,
intro,
created_at,
updated_at,
ts_rank_cd(tsv, q.match::tsquery) AS rank
FROM items,
(
SELECT tsv AS match
FROM items
WHERE id = 1234
) AS q
WHERE id <> 1234
ORDER BY rank DESC LIMIT 10;
Is there a nice way to achieve this?
Solution 1:[1]
I did some poking around and it didn't seem like there was an easy way of doing this. I think to do it effectively you would probably need your own C functions which could provide a distance from one tsvector to another (then you could use KNN searches).
Either way there is not a very easy way to do this and it is likely a significant amount of work, but it seems like it should be a generally applicable problem so the general community might be interested in a solution.
Note this is not as trivial as it sounds. Suppose I write a book about Albert Lord's the Singer of Tales and his emphasis on poetic formulas. Suppose I call it "Albert Lord and the Ring of Words." This would create a tsvector of Albert:1 Lord:2 Ring:5 Words:7, The Lord of the Rings is Lord:2 Ring:5 which would create a very false sense of similarity. If you have any categorization involved, you would want to leverage that as well.
Solution 2:[2]
You could perhaps compare tsvector with similarity from the pg_trgm extension. Something like this:
SELECT title, similarity(STRIP(to_tsvector('english', title))::text, STRIP(to_tsvector('english', 'The Lord of the Rings'))::text) sim
FROM (VALUES
('Albert Lord and the Ring of Words'),
('The Ring of Words'),
('Albert Lord')
) t(title)
ORDER BY sim DESC
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chris Travers |
| Solution 2 | PPH |
