Counting repeated words from list

If I have 3 lists like this:

list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']

how can I produce output like this:

word,list1,list2,list3
hello,2,1,1
bye,1,1,1
yolo,1,0,1
world,0,1,0
salut,0,0,1

and then export the result to an Excel table? Thank you!



Solution 1:[1]

This is a "bag of words" problem: build a shared vocabulary, then count how often each word appears in each list. Here is one solution using pandas:

import pandas as pd

list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']

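# Build the vocabulary: every distinct word across the three lists.
# Note that a set is unordered, so the row order of the final table is arbitrary.
wordlist = []
wordlist.extend(list1)
wordlist.extend(list2)
wordlist.extend(list3)
wordlist = set(wordlist)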

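def calculateBOW(wordset, l_doc):
    # Start every vocabulary word at 0, then fill in the counts for this list.
    tf_diz = dict.fromkeys(wordset, 0)
    for word in set(l_doc):  # count each distinct word only once
        tf_diz[word] = l_doc.count(word)
    return tf_diz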

bow1 = calculateBOW(wordlist, list1)
bow2 = calculateBOW(wordlist, list2)
bow3 = calculateBOW(wordlist, list3)

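# One row per list, then transpose so that words become the rows.
df = pd.DataFrame([bow1, bow2, bow3]).transpose()
df.columns = ["list1", "list2", "list3"]

# Writing .xlsx files requires the openpyxl package to be installed.
df.to_excel("output.xlsx")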

print(df)

Source: https://www.analyticsvidhya.com/blog/2021/08/a-friendly-guide-to-nlp-bag-of-words-with-python-example/
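
As a possible alternative, the same table can be sketched with only the standard library, using collections.Counter for the counts and the csv module for the output. This is a minimal sketch, not part of the original answer: the names named_lists, vocabulary and build_rows are made up for illustration, and it writes CSV text rather than an .xlsx file (Excel can open CSV files).

```python
import csv
import io
from collections import Counter

list1 = ['hello', 'bye', 'hello', 'yolo']
list2 = ['hello', 'bye', 'world']
list3 = ['bye', 'hello', 'yolo', 'salut']

# Hypothetical helper mapping: column name -> list of words.
named_lists = {"list1": list1, "list2": list2, "list3": list3}

# One Counter per list; a Counter returns 0 for a word it has never seen.
counts = {name: Counter(words) for name, words in named_lists.items()}

# Vocabulary in first-seen order (dict.fromkeys drops duplicates but keeps order).
vocabulary = list(dict.fromkeys(w for words in named_lists.values() for w in words))


def build_rows(vocabulary, counts):
    """Return a header row followed by one row of per-list counts for each word."""
    header = ["word", *counts]
    body = [[word, *(counts[name][word] for name in counts)] for word in vocabulary]
    return [header, *body]


rows = build_rows(vocabulary, counts)

# Write the table as CSV text. To save a file instead, replace the buffer
# with open("output.csv", "w", newline="").
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Because the vocabulary keeps first-seen order, the rows come out in a stable order, unlike with a set. The rows list (minus its header) could also be passed to pd.DataFrame if an Excel file is still needed.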

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mbostic