Trying to create a pandas df out of a frequency dict, with one col being the word and the next being the count
I have
import pandas as pd
from nltk import FreqDist as fd
# frankenstein freqdist
frank_fd = fd('frank_lemma')
for word, count in frank_fd.items():
    data = {'Word':[word], 'Counts':[count]}
df = pd.DataFrame(data)
df.head()
but my printout gives me only one word with one count. I tried putting print(word, count) on the first line of the for loop, and it does go over every word; it just isn't adding them all to the df I tried to create. Anyone know why?
Edit: I checked my data, and only the very last word is being added to the df.
Solution 1:[1]
The loop overwrites data on every iteration, so the DataFrame ends up holding only the last word and its count. You're also recreating a dict structure very similar to the one you already have in nltk.probability.FreqDist. Pandas lets you pass the FreqDist items straight to the DataFrame constructor.
This is working for me.
import pandas as pd
from nltk import FreqDist as fd
frank_fd = fd('frank_lemma')
df = pd.DataFrame(frank_fd.items(), columns=['Word', 'Counts'])
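Output (the rows are letters because fd('frank_lemma') counts the characters of the quoted string; pass your list of lemmas, without quotes, to count words):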
  Word  Counts
0    f       1
1    r       1
2    a       2
3    n       1
4    k       1
5    _       1
6    l       1
7    e       1
8    m       2
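If it helps to see why the original loop leaves a single row, here is a minimal sketch. It makes a few assumptions: the `tokens` list is made-up data standing in for your lemmas, and `collections.Counter` stands in for `FreqDist` (which subclasses `Counter`) so the snippet runs without nltk. It shows the overwrite, a loop-based fix that appends to lists created outside the loop, and a `most_common()` variant that also sorts by count.

```python
from collections import Counter

import pandas as pd

# Hypothetical token list standing in for the asker's lemmas.
tokens = ["monster", "creature", "monster", "ice", "monster", "ice"]

# Counter is used here as a stand-in for nltk's FreqDist, which subclasses it.
freq = Counter(tokens)

# Original pattern: `data` is reassigned on every pass,
# so only the last (word, count) pair is left after the loop.
for word, count in freq.items():
    data = {"Word": [word], "Counts": [count]}
df_last_only = pd.DataFrame(data)

# Loop-based fix: create the lists once, then append inside the loop.
data = {"Word": [], "Counts": []}
for word, count in freq.items():
    data["Word"].append(word)
    data["Counts"].append(count)
df_all = pd.DataFrame(data)

# Alternative without a loop: most_common() returns a list of
# (word, count) tuples ordered from most to least frequent.
df_sorted = pd.DataFrame(freq.most_common(), columns=["Word", "Counts"])

print(df_last_only)
print(df_all)
print(df_sorted)
```

`most_common()` returns a plain list, so it is also a safe choice if your pandas version rejects a bare `.items()` view; wrapping the view as `list(frank_fd.items())` should work the same way.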
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
