Trying to create a pandas DataFrame out of a frequency dict, with one column being the word and the next being the count

I have

import pandas as pd
from nltk import FreqDist as fd

# frankenstein freqdist
frank_fd = fd('frank_lemma')
for word, count in frank_fd.items():
    data = {'Word':[word], 'Counts':[count]}
    
df = pd.DataFrame(data)
df.head()

but my printout shows only one word with one count. I tried putting print(word, count) as the first line of the for loop, and it does go over every word; they just aren't all being added to the df I tried to create. Anyone know why?

Edit: I checked my data, and only the very last word is being added to the df.



Solution 1:[1]

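Your loop reassigns data on every iteration, so by the time the DataFrame is built it holds only the last word and count. You're also trying to recreate a dict data structure very similar to the one you already have in nltk.probability.FreqDist. Pandas is smart enough to let us pass the FreqDist items straight to the DataFrame constructor.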

This is working for me.

import pandas as pd
from nltk import FreqDist as fd

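# Note: passing the literal string 'frank_lemma' counts its characters,
# which is why the output below lists letters. Pass your list of lemmas instead.
frank_fd = fd('frank_lemma')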

df = pd.DataFrame(frank_fd.items(), columns=['Word', 'Counts'])

Output:

    Word    Counts
0   f       1
1   r       1
2   a       2
3   n       1
4   k       1
5   _       1
6   l       1
7   e       1
8   m       2
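
To make the cause of the one-row result concrete, here is a minimal sketch comparing the original loop, a loop that accumulates rows, and the direct constructor call. It assumes a small made-up list of lemmas (frank_lemma here is hypothetical data, not the asker's), and it uses collections.Counter as a stand-in for FreqDist, which subclasses Counter, so the sketch runs without nltk.

```python
from collections import Counter

import pandas as pd

# Hypothetical lemma list standing in for the asker's data.
frank_lemma = ["monster", "creature", "monster", "ice", "creature", "monster"]

# Counter is used as a stand-in for nltk's FreqDist (an assumption made
# to keep the sketch self-contained); both expose .items().
frank_fd = Counter(frank_lemma)

# Original pattern: `data` is rebound on every pass, so only the
# last (word, count) pair is left when the DataFrame is built.
for word, count in frank_fd.items():
    data = {"Word": [word], "Counts": [count]}
last_only = pd.DataFrame(data)

# Loop-based fix: create the lists once, then append inside the loop.
data = {"Word": [], "Counts": []}
for word, count in frank_fd.items():
    data["Word"].append(word)
    data["Counts"].append(count)
df_loop = pd.DataFrame(data)

# Same result without a loop, as in the answer above.
df_direct = pd.DataFrame(list(frank_fd.items()), columns=["Word", "Counts"])

print(last_only)
print(df_loop)
```

Here last_only ends up with a single row, while df_loop and df_direct each hold one row per distinct word. The direct call is shorter, but the accumulating loop may be useful if you need to filter or transform words along the way.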

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1