Apply Regex Pattern on key from nested list of objects of custom dictionaries
I have a nested list of objects called "words". Each object is an instance of a class with the attributes conf (float), end (float), start (float), and word (string). I want to apply the regex pattern "\b(\w+)\b(?=.*?\b\1\b)" to "word" and remove the objects that match the pattern.
class Word:
    ''' A class representing a word from the JSON format for vosk speech recognition API '''
    def __init__(self, dict):
        '''
        Parameters:
            dict (dict) dictionary from JSON, containing:
                conf (float): degree of confidence, from 0 to 1
                end (float): end time of the pronouncing the word, in seconds
                start (float): start time of the pronouncing the word, in seconds
                word (str): recognized word
        '''
        self.conf = dict["conf"]
        self.end = dict["end"]
        self.start = dict["start"]
        self.word = dict["word"]
    def to_string(self):
        ''' Returns a string describing this instance '''
        return "{:20} from {:.2f} sec to {:.2f} sec, confidence is {:.2f}%".format(
            self.word, self.start, self.end, self.conf*100)
    def compare(self, other):
        if self.word == other.word:
            return True
        else:
            return False
Here is the collection of objects.
Each object contains data like this:
{'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'}
{'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'}
{'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'}
I tried to apply the regex pattern this way, but couldn't get it working:
pattern = re.compile("\b(\w+)\b(?=.*?\b\1\b)")
for w in words:
    lst = [x for x in w.word if not re.match(pattern, x)]
    print(lst)
Can some good soul guide me on how to apply the regex pattern to "word" and remove the objects that match it? Thanks in advance!
Solution 1:[1]
Try this:
lst = []
for i in range(len(words)):
    if not re.match(pattern, words[i].word):
        lst.append(i)
print(lst)
# lst holds the indices of the objects whose word does not match the pattern
You can then use the indices to remove the objects from your list of objects.
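The removal step could look something like the following minimal sketch. The helper name remove_by_indices is hypothetical, and plain strings stand in for the Word objects. It builds a new list instead of deleting in place, which avoids the index shifting that happens when items are deleted from a list while walking through it.

```python
def remove_by_indices(items, indices):
    """Return a new list without the elements at the given positions.

    remove_by_indices is a hypothetical helper used only for illustration.
    """
    drop = set(indices)  # set membership keeps the filter fast
    return [item for i, item in enumerate(items) if i not in drop]


# Plain strings stand in for the Word objects here.
sample = ["hello", "hello", "to"]
print(remove_by_indices(sample, [1]))
```

With real Word objects the call is the same: pass the list of objects and the list of indices collected beforehand.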
EDIT: according to your comments, I've updated the answer:
distinct_words = {}
lst = []
for i in range(len(words)):
    if isinstance(distinct_words.get(words[i].word), int):
        lst.append(i)
    else:
        distinct_words[words[i].word] = i
print(lst)
Each new word is stored in the distinct_words dict together with its index. If a word is already in the dict when it is seen again, its index is appended to lst; otherwise the word is added to the dict.
At the end, lst contains the indices of all the repeated words, so use the indices in lst to remove those objects from the list.
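If the indices are not needed for anything else, the same idea can be written as a single pass that builds the filtered list directly. This is only a sketch under some assumptions: the dataclass below is a simplified stand-in for the Word class from the question (it takes fields directly rather than a dict), and drop_repeated is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Simplified stand-in for the question's Word class (an assumption).
    conf: float
    end: float
    start: float
    word: str


def drop_repeated(words):
    """Keep only the first object for each distinct word text."""
    seen = set()
    unique = []
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            unique.append(w)
    return unique


words = [
    Word(0.0, 0.0, 0.0, "hello"),
    Word(0.0, 1.0, 0.0, "hello"),
    Word(0.0, 2.0, 0.0, "to"),
]
print([w.word for w in drop_repeated(words)])
```

Note that this keeps the first occurrence of each word, whereas the lookahead in the original pattern targets occurrences that are followed by a later repeat. If the last occurrence should survive instead, one option is to run the same loop over reversed(words) and reverse the result.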
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |


