Match phrase with similar options

I'm looking for a way to match a phrase from some text with the phrases containing some of the same words, without assuming I know the possible phrases beforehand (e.g. they come from a database).

input = "I like apple pie"
possible = ["baked apple", "apple", "apple pie"]

The possible phrases could later expand to include ["vanilla ice cream", "ice cream", "ice cream sundae"].

So for each text input I would like to know which phrase (if any) matches without having partial matches.

>>> input = "i want an ice cream sundae with my apple pie"
>>> output> ["ice cream sundae", "apple pie"]
>>> input = "i would like an apple to go with my vanilla ice cream"
>>> output> ["apple", "vanilla ice cream"]

I have tried looping through the text for single-word matches, adding them to a list, and then testing each item against progressively larger sections of the text. However, I run into index errors when searching both forward and backward, and I can't figure out a reasonable way to deal with them.



Solution 1:[1]

You can split the problem into three phases: find all matches, remove partial matches, and then arrange the remaining phrases in their order of appearance, like below:

from itertools import permutations


input_str = "i would like an apple to go with my vanilla ice cream"
possible = ["baked apple", "apple", "apple pie", "vanilla ice cream", "ice cream", "ice cream sundae"]

# Phase 1: find all matches.
# Note: `in` is a plain substring test, so "apple" would also match inside "pineapple".
all_matches = [item for item in possible if item in input_str]
print(f"All matches: {all_matches}")

# Phase 2: find partial matches, i.e. matched phrases contained in another matched phrase.
# permutations() yields every ordered pair, so one containment check covers both directions.
partial_matches = set()
for shorter, longer in permutations(all_matches, 2):
    if shorter in longer:
        partial_matches.add(shorter)

# Good phrases, but not yet in their order of appearance
result = set(all_matches) - partial_matches
print(f"Without partial matches: {result}")

# Phase 3: arrange the phrases in their order of appearance in the input.
# str.find returns the index of the first occurrence, which serves as the sort key.
final = sorted(result, key=input_str.find)

print(f"Right order of appearance: {final}")
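
As a possible alternative to the three phases above, here is a minimal sketch that does the matching in a single pass with a regular expression. This is a different technique from the code above, not a rewrite of it. The function name match_phrases is hypothetical, and the sketch assumes the phrases are stored in lowercase and begin and end with word characters (so that \b behaves as expected). Trying longer phrases first means "ice cream sundae" wins over "ice cream" at the same position, and the word-boundary anchors keep "apple" from matching inside "pineapple".

```python
import re


def match_phrases(text, phrases):
    """Return the phrases found in text, in order of appearance.

    Hypothetical helper: assumes phrases are lowercase and start/end
    with word characters.
    """
    # Try longer phrases first so "apple pie" is preferred over "apple".
    ordered = sorted(set(phrases), key=len, reverse=True)
    if not ordered:
        return []
    # re.escape guards against regex metacharacters inside a phrase;
    # \b restricts matches to whole words.
    alternatives = "|".join(re.escape(p) for p in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    # finditer scans left to right, so results come out in order of appearance.
    return [m.group(0).lower() for m in pattern.finditer(text)]


possible = ["baked apple", "apple", "apple pie",
            "vanilla ice cream", "ice cream", "ice cream sundae"]

print(match_phrases("i want an ice cream sundae with my apple pie", possible))
# ['ice cream sundae', 'apple pie']
print(match_phrases("i would like an apple to go with my vanilla ice cream", possible))
# ['apple', 'vanilla ice cream']
```

Unlike the set-based approach, this sketch also reports a phrase each time it occurs, so "apple pie and an apple" would yield both "apple pie" and "apple". Whether that is desirable depends on the use case.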

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1