Inverted Index Speed Up
I've implemented an inverted index in Python. It is essentially a dictionary whose keys are the words in the corpus and whose values are lists of tuples, each pairing a document the word occurs in with its BM25 score.
{
"love": [(doc1, 12), (doc3, 7.9), (doc5, 6.5)],
"hate": [(doc2, 8.7), (doc4, 3.2)]
}
However, when I process a query, I find it hard to benefit from the efficiency of the inverted index, because I must iterate over all the words in the query in a for loop. Within this loop, I must further loop over the documents each word links to and maintain a global score table for all documents.
I don't think this is optimal. Any ideas for speeding it up? I think a batch dictionary that accepts multiple keys and returns multiple values in parallel would help.
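For reference, the nested loop described above might look roughly like the following. This is a minimal sketch, not the actual implementation: the names `index`, `score_query`, and `top_k` are hypothetical, and the document ids are assumed to be plain strings.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index in the shape shown above:
# word -> list of (document id, BM25 score) tuples.
index = {
    "love": [("doc1", 12.0), ("doc3", 7.9), ("doc5", 6.5)],
    "hate": [("doc2", 8.7), ("doc4", 3.2)],
}


def score_query(index, query_terms):
    """Accumulate per-document scores one query word at a time."""
    scores = defaultdict(float)  # the global score table
    for term in query_terms:
        # Words missing from the index contribute nothing.
        for doc_id, score in index.get(term, ()):
            scores[doc_id] += score
    return dict(scores)


def top_k(scores, k):
    """Return the k highest-scoring (document id, score) pairs."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Only the postings of the query words are visited, so the work is proportional to the total length of those lists rather than to the size of the corpus.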
Solution 1:[1]
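It should be more efficient to represent the inverted index as a matrix, in particular a sparse matrix, where the rows are the words in your corpus and the columns are the documents. Scoring a query then amounts to selecting the rows for the query words and summing them column by column, which a sparse matrix library can do without a Python-level loop.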
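One way to try this is with `scipy.sparse`. The answer does not name a library, so SciPy and NumPy are an assumption here, and the helper names `build_matrix` and `score_query` are hypothetical. The sketch builds a word-by-document matrix of BM25 scores from a dictionary index shaped like the one in the question, then scores a query by summing the selected rows.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy index: word -> list of (document id, BM25 score) tuples.
index = {
    "love": [("doc1", 12.0), ("doc3", 7.9), ("doc5", 6.5)],
    "hate": [("doc2", 8.7), ("doc4", 3.2)],
}


def build_matrix(index):
    """Convert the dictionary index into a sparse word-by-document matrix."""
    terms = sorted(index)
    docs = sorted({doc for postings in index.values() for doc, _ in postings})
    term_id = {term: i for i, term in enumerate(terms)}
    doc_pos = {doc: j for j, doc in enumerate(docs)}

    rows, cols, vals = [], [], []
    for term, postings in index.items():
        for doc, score in postings:
            rows.append(term_id[term])
            cols.append(doc_pos[doc])
            vals.append(score)

    matrix = csr_matrix((vals, (rows, cols)), shape=(len(terms), len(docs)))
    return matrix, term_id, docs


def score_query(matrix, term_id, query_terms):
    """Return one score per document, in the column order of the matrix."""
    rows = [term_id[t] for t in query_terms if t in term_id]
    if not rows:
        return np.zeros(matrix.shape[1])
    # Select the query words' rows, then sum down each document column.
    return np.asarray(matrix[rows].sum(axis=0)).ravel()


matrix, term_id, docs = build_matrix(index)
scores = score_query(matrix, term_id, ["love", "hate"])
best = docs[int(np.argmax(scores))]  # highest-scoring document id
```

CSR format is used because it makes row selection cheap. Whether this beats the dictionary loop depends on corpus size and query length, so it is worth timing both on real data.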
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | abckk |
