Match phrase with similar options
I'm looking for a way to match a phrase from some text with the phrases containing some of the same words, without assuming I know the possible phrases beforehand (e.g. they come from a database).
input = "I like apple pie"
possible = ["baked apple", "apple", "apple pie"]
possible phrases could expand to include ["vanilla ice cream", "ice cream", "ice cream sundae"]
So for each text input I would like to know which phrase (if any) matches without having partial matches.
>>> input = "i want an ice cream sundae with my apple pie"
>>> output> ["ice cream sundae", "apple pie"]
>>> input = "i would like an apple to go with my vanilla ice cream"
>>> output> ["apple", "vanilla ice cream"]
I have tried looping through the text for single-word matches, adding them to a list, and then testing each item against progressively larger sections of the text. However, I run into index errors when searching both forward and backward, and I can't figure out a reasonable way to deal with them.
Solution 1:[1]
You can split the problem into three phases: find all matches, remove partial matches, and then arrange the phrases in their order of appearance, as below:
from itertools import permutations
from queue import PriorityQueue

input_str = "i would like an apple to go with my vanilla ice cream"
possible = ["baked apple", "apple", "apple pie", "vanilla ice cream", "ice cream", "ice cream sundae"]

# Find all matches (plain substring test)
all_matches = [item for item in possible if item in input_str]
print(f"All matches: {all_matches}")

# Find partial matches: matches that are contained in another match
partial_matches = set()
for item in permutations(all_matches, 2):
    if item[0] in item[1]:
        partial_matches.add(item[0])
    elif item[1] in item[0]:
        partial_matches.add(item[1])

# Good phrases but not in the right order of appearance
result = set(all_matches) - partial_matches
print(f"Without partial matches: {result}")

# Arrange phrases in the order of appearance
final = []
q = PriorityQueue()
for item in result:
    # Priority is the index of the phrase's first occurrence in the input
    q.put((input_str.find(item), item))
while not q.empty():
    final.append(q.get()[1])
print(f"Right order of appearance: {final}")
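Note that the substring test `item in input_str` also matches inside longer words (for example "apple" in "pineapple"), and removing partial matches globally drops "apple" even when it appears on its own elsewhere in the same text. A possible alternative, offered only as a sketch and not part of the original answer, is a single regular expression with word boundaries whose alternatives are sorted longest first. The function name `match_phrases` is hypothetical, and matching is assumed to be case-sensitive with single spaces between words.

```python
import re


def match_phrases(text, phrases):
    """Return the phrases found in text, longest match first, in order of appearance."""
    # Longest phrases first, so the alternation tries "apple pie" before "apple".
    ordered = sorted(set(phrases), key=len, reverse=True)
    if not ordered:
        return []
    # \b keeps "apple" from matching inside "pineapple";
    # re.escape protects phrases that contain regex metacharacters.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b")
    # finditer scans left to right and never overlaps, so results are already ordered.
    return [m.group(0) for m in pattern.finditer(text)]


possible = ["baked apple", "apple", "apple pie", "vanilla ice cream", "ice cream", "ice cream sundae"]
print(match_phrases("i want an ice cream sundae with my apple pie", possible))
print(match_phrases("i would like an apple to go with my vanilla ice cream", possible))
```

Because the regex engine consumes each match before continuing, a shorter phrase is only reported where it is not part of a longer one, so no separate partial-match or ordering step is needed.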
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
