Counting repeated words from lists
If I have 3 lists like this:
list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']
how can I produce output like this:
word,list1,list2,list3
hello,2,1,1
bye,1,1,1
yolo,1,0,1
world,0,1,0
salut,0,0,1
and then export the result to an Excel table? Thank you!
Solution 1:[1]
This is a "bag of words" problem: build the set of all distinct words, then count how often each one occurs in each list. Here is one solution:
import pandas as pd
list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']
wordlist = []
wordlist.extend(list1)
wordlist.extend(list2)
wordlist.extend(list3)
wordlist = set(wordlist)
def calculateBOW(wordset, l_doc):
    # Start every known word at 0 so words missing from this list still get a row.
    tf_diz = dict.fromkeys(wordset, 0)
    for word in l_doc:
        tf_diz[word] += 1
    return tf_diz
bow1 = calculateBOW(wordlist, list1)
bow2 = calculateBOW(wordlist, list2)
bow3 = calculateBOW(wordlist, list3)
df = pd.DataFrame([bow1, bow2, bow3]).transpose()
df.columns = ["list1", "list2", "list3"]
# Writing .xlsx requires an Excel engine such as openpyxl to be installed.
df.to_excel("output.xlsx", index_label="word")
print(df)
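
Because wordlist is a set, the row order of df can differ from run to run. As an alternative, here is a minimal standard-library sketch that keeps words in order of first appearance and writes comma-separated text in the layout the question asks for. It is not part of the original answer: the helper name count_rows and its dict argument are made up for illustration, and it produces CSV (which Excel can open) rather than a real .xlsx file.

```python
import csv
import io
from collections import Counter

list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']


def count_rows(named_lists):
    """Return a header row plus one [word, count, count, ...] row per distinct word.

    `named_lists` maps a column name to a list of words. This signature is
    a hypothetical choice for this sketch, not something from the answer above.
    """
    counters = [Counter(words) for words in named_lists.values()]
    # dict.fromkeys keeps first-appearance order, so the rows are reproducible.
    vocabulary = dict.fromkeys(w for words in named_lists.values() for w in words)
    header = ["word", *named_lists]
    # A Counter returns 0 for words it never saw, so no explicit default is needed.
    body = [[word, *(c[word] for c in counters)] for word in vocabulary]
    return [header, *body]


rows = count_rows({"list1": list1, "list2": list2, "list3": list3})

# Write to an in-memory buffer here; swap in open("output.csv", "w", newline="")
# to produce a file that Excel can open.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue(), end="")
```

This prints the header line word,list1,list2,list3 followed by one line per word, for example hello,2,1,1. If you need a true .xlsx file, the pandas approach above is still the simpler route.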
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mbostic |