Is there a way to force SymSpell Python to return more than one correction recommendation?
I'm using the symspellpy module in Python for query correction. It is really useful and fast, but I'm having an issue with it.
Is there a way to force SymSpell to return more than one correction recommendation? I need several candidates so that my application can choose the best correction.
I'm calling Symspell like this:
suggestions = sym_spell.lookup(query, VERBOSITY_ALL, max_edit_distance=3)
Example of what I'm trying to do:
query = "resende". The return that I want: ["resende", "rezende"]. What the method returns: ["resende"]. Note that both "resende" and "rezende" are in my dictionary.
Solution 1:[1]
Merely a typo. Replace the underscore with a dot, so that
VERBOSITY_ALL ... becomes
Verbosity.ALL
The three Verbosity options are CLOSEST, TOP and ALL. Only ALL returns every suggestion within max_edit_distance.
Couple of other things in SymSpell ...
Four algorithm choices
Described here
Supported edit distance algorithm choices.
LEVENSHTEIN = 0 Levenshtein algorithm
DAMERAU_OSA = 1 Damerau optimal string alignment algorithm (default)
LEVENSHTEIN_FAST = 2 Fast Levenshtein algorithm
DAMERAU_OSA_FAST = 3 Fast Damerau optimal string alignment algorithm
DAMERAU_OSA # high count/frequency wins when using .ALL but distances tied?
LEVENSHTEIN # lowest edit distance wins (fewest changes needed)
To change from the default, overwrite it with one of them:
from symspellpy.editdistance import DistanceAlgorithm
sym_spell._distance_algorithm = DistanceAlgorithm.LEVENSHTEIN
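To make the difference between the two algorithm families concrete, here is a plain-Python sketch of both distances. It is not symspellpy's implementation, and the function names levenshtein and damerau_osa are my own. The only difference is that the optimal string alignment variant counts a swap of two adjacent characters as one edit instead of two.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # An adjacent swap counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# One substitution: both algorithms agree.
print(levenshtein("resende", "rezende"), damerau_osa("resende", "rezende"))  # 1 1
# Swapped "ne"/"en": Levenshtein needs two edits, OSA needs one.
print(levenshtein("resnede", "resende"), damerau_osa("resnede", "resende"))  # 2 1
```

In practice this means a word with two swapped letters can fall outside a small max_edit_distance under LEVENSHTEIN but still be found under DAMERAU_OSA.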
Output object details
word = 'something'
matches = sym_spell.lookup(word, Verbosity.ALL, max_edit_distance=2)
for match in matches:  # match is ... term, distance, count
    print(f'{word} -> {match.term} {match.distance} {match.count}')
Using collections Counter() with SymSpell instead of loading words from file
SymSpell's dictionary loaders currently (Apr 2022) only read the known-good words from a file. However, the method below can be added inside symspellpy.py so that it reads from a collections Counter() or any other dict of word: count pairs. A quick hack, but it works for my purposes ...
def load_Counter_dictionary(self, counts_each):
    for key, count in counts_each.items():
        self.create_dictionary_entry(key, count)
Can then drop the use of load_dictionary(), for something like this instead ...
sym_spell.load_Counter_dictionary( Counter(words_list) )
The reason I resorted to that: a CSV file with over a million records was already loaded into a pandas DataFrame. It had a column of codes (think words), some occurring in large numbers (likely correct) alongside outliers to be corrected, and a column already holding the count of each. Rather than saving the counts dict to a file (expensive) and then having SymSpell reload it, this is direct and efficient.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
