'understanding effect of clause order in full text search query

I'm using postgres FTS for prefix searching.
When a match occurs I want to favor an exact match over a prefix match.

A query of work:* should order work first and then workday. This query will give the same rank for both matches, so this won't work

SELECT ts_rank(to_tsvector('simple', 'work'), to_tsquery('simple', 'work:*'))
UNION ALL
SELECT ts_rank(to_tsvector('simple', 'workday'), to_tsquery('simple', 'work:*'));
-- result:
0.06079271
0.06079271

When I add an or clause for work ( trough | work ) , the rank for workday suddenly becomes 0. I'm not understanding why the or clause reduces the rank to 0, since there is still a matching term (work:*) in the query

SELECT ts_rank(to_tsvector('simple', 'work'), to_tsquery('simple', 'work:* | work'))
UNION ALL
SELECT ts_rank(to_tsvector('simple', 'workday'), to_tsquery('simple', ' work:* | work'));
-- result:
0.06079271
0 

When I switch the two or clauses positionally, the result is the same as not adding the clause at all:

SELECT ts_rank(to_tsvector('simple', 'work'), to_tsquery('simple', 'work | work:*'))
UNION ALL
SELECT ts_rank(to_tsvector('simple', 'workday'), to_tsquery('simple', 'work | work:*'));
-- result:
0.06079271
0.06079271

SQL fiddle: http://sqlfiddle.com/#!17/9eecb/89476

My goal is to better understand ts_rank and why the ranking gets impacted in such a dramatic way, potentially reporting any bugs too postgres.

I can not find any mention of the order of the query terms them self to have any impact on ranking.
https://www.postgresql.org/docs/13/textsearch-controls.html#TEXTSEARCH-RANKING

note: I am using ranking to order the results because my query can contain an arbitrary amount of keywords.



Solution 1:[1]

How about a query that explicitly ranks full matches before prefix matches?

SELECT doc
FROM tab
WHERE to_tsvector('simple', doc) @@ to_tsquery('simple', 'work:*')
ORDER BY NOT to_tsvector('simple', doc) @@ to_tsquery('simple', 'work'),
         ts_rank(to_tsvector('simple', doc), to_tsquery('simple', 'work:*'));

That relies on FALSE < TRUE.

Solution 2:[2]

to_tsquery('simple', 'work:*'), to_tsquery('simple', 'work:*') 

is the wrong way. Because :
select ('simple', 'work:*')::tsquery; will yield error. So:
select to_tsquery('simple', 'work:*'); == select to_tsquery( 'work:*');


why SELECT ts_rank(to_tsvector('workday') ,( 'work'::tsquery)); yield 0 Because workday and work is two different lexlexemes


Translate your question: why SELECT ts_rank(to_tsvector('workday') ,( 'work'::tsquery)); yield 0
select ts_rank(to_tsvector('workday'),to_tsquery('work:*')); yield 0.06079271
https://www.postgresql.org/docs/current/textsearch-controls.html. The order is important.

Ranking attempts to measure how relevant documents are to a particular query, so that when there are many matches the most relevant ones can be shown first. PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur.

However there is another function ts_rank_cd

select ts_rank_cd(to_tsvector('workday'),to_tsquery('work | work:*')); 

return 0.1

select ts_rank_cd(to_tsvector('workday'),to_tsquery('work:* | work'));

return 0.1

For now I don't know 0.1 mean. usefullink: https://linuxgazette.net/164/sephton.html
https://www.postgresql.org/message-id/flat/Pine.LNX.4.64.0709111118150.2767%40sn.sai.msu.ru#f5da1024abc8a2fc85814fd49a8bd71d
http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking
https://www.postgresql.eu/events/pgconfeu2018/sessions/session/2116/slides/137/pgconf.eu-2018-fts.pdf
https://www.postgresql.org/message-id/CAAMbBLtckoNmFp6irVXsKcZufM6_DiQGN2vtHhWVgofESsE11g@mail.gmail.com


SELECT ts_rank(to_tsvector('simple', 'workday'), to_tsquery('simple', ' work:* | work'))

The Same as

SELECT ts_rank(to_tsvector('workday'), to_tsquery(' work:* | work'));

please read the manual. I also explained. If you want simple workday, you need use operator. 'simple & workday'

Order matters, then the result is inconsistent.
So You want consistent result use ts_rank_cd

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurenz Albe
Solution 2