How to convert this for loop into a vectorized operation
How might I convert the for loop in the function fin() into a vectorized operation?
I am trying to filter out every word in the .txt file that contains any of the given forbidden letters, and output the words that remain.
The input file is available here: https://github.com/AllenDowney/ThinkPython2/blob/master/code/words.txt
This code works, in that it produces the correct output, but it does not meet the requirement of using vectorized operations:
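def avoids(word: str, forbidden_letters: str) -> bool:
    # True when the word shares no letters with forbidden_letters (case-insensitive).
    return set(forbidden_letters.lower()).intersection(set(word.lower().replace(' ', ''))) == set()

def fin(forbidden_letters):
    # Print every word in words.txt that avoids all of the forbidden letters.
    with open('words.txt') as word_file:
        for line in word_file:
            word = line.strip()
            if avoids(word, forbidden_letters):
                print(word)

fin('abcde')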
The head of the expected output is:
fifing
fifth
fifthly
fifths
fifty
fig
figging
fight
fighting
fightings
fights
...
Because 'abcde' are the forbidden letters, none of the words in the output contain any of them.
Solution 1:[1]
Perhaps you could do something like this:
import pandas as pd

# Read in the file and add all lines to a list
with open('words.txt', 'r') as f:
    lines = [this_line.strip().lower() for this_line in f]
# Pandas has a bunch of useful string methods, so we turn the data into a DataFrame.
df = pd.DataFrame(lines, columns=["Lines"])
# This will return all lines in df that don't contain a, b, c, d or e.
df = df[~df.Lines.str.contains("a|b|c|d|e", case=False, regex=True)]
# Optional: If you want to convert back to a list after removing unwanted rows,
# you can do the following:
good_lines = df["Lines"].tolist()
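The pattern above hard-codes the letters a to e. If the forbidden letters need to be passed in as an argument, as in the question's fin(), one possible sketch is to build a regex character class from the argument. The function name words_avoiding and the sample list are hypothetical, made up for illustration only; the sketch assumes pandas is installed and that the words are already in memory.

```python
import re

import pandas as pd


def words_avoiding(words, forbidden_letters):
    """Return the words that contain none of the forbidden letters.

    Hypothetical helper, not part of the original answer.
    """
    series = pd.Series(words, dtype="object").str.strip()
    if not forbidden_letters:
        return series.tolist()  # nothing is forbidden, so keep every word
    # Build a character class such as "[abcde]". re.escape guards against
    # characters like "]" or "-" that are special inside a character class.
    pattern = "[" + re.escape(forbidden_letters) + "]"
    # One vectorized, case-insensitive match over the whole Series.
    mask = series.str.contains(pattern, case=False, regex=True)
    return series[~mask].tolist()


sample = ["fifth", "apple", "fig", "Bed", "sky"]  # made-up sample data
print(words_avoiding(sample, "abcde"))  # prints ['fifth', 'fig', 'sky']
```

To run this on the real file, the list of stripped lines read from words.txt could be passed in place of sample.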
Let me know if you have any questions.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AJH |
