Only returning word counts for words >= 5 characters & sort by count (highest to lowest)
I have a .txt file, and I want to count how many times each word appears in it. The code works, but now I want to return only words that are 5 or more characters long. I added the `len` function to the `if` statement inside the loop, but it still returns all words. Any help would be greatly appreciated.
I am also wondering whether I can sort by count, so that the words with the highest counts are returned first.
```python
import string
import os

os.chdir('mydirectory')  # Changes directory.
speech = open("obamaspeech.txt", "r")  # Opens file.
emptyDict = dict()  # Creates dictionary
for line in speech:
    line = line.strip()  # Removes leading and trailing whitespace.
    line = line.lower()  # Convert to lowercase.
    line = line.translate(line.maketrans("", "", string.punctuation))  # Removes punctuation.
    words = line.split(" ")  # Splits lines into words.
    for word in words:
        if len(word) >= 5 in emptyDict:
            emptyDict[word] = emptyDict[word] + 1
        else:
            emptyDict[word] = 1
for key in list(emptyDict.keys()):
    print(key, ":", emptyDict[key])
```
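For reference, this is why the length check in the question has no effect. Python parses `len(word) >= 5 in emptyDict` as a chained comparison, i.e. `(len(word) >= 5) and (5 in emptyDict)`. The dictionary keys are strings, so `5 in emptyDict` is always false; every word therefore falls through to the `else` branch, gets added, and has its count reset to 1. Below is a minimal sketch of a corrected loop, assuming the lines are passed in as any iterable of strings (an open file object works) rather than read from `obamaspeech.txt` inside the function. The helper name `count_long_words` and the sample lines are hypothetical, chosen only for illustration. Sorting uses the built-in `sorted()` with the count as the key.

```python
import string


def count_long_words(lines, min_len=5):
    """Hypothetical helper: count words with at least min_len characters."""
    counts = {}
    table = str.maketrans("", "", string.punctuation)
    for line in lines:
        # Same normalisation as the question: trim, lowercase, drop punctuation
        line = line.strip().lower().translate(table)
        # split() with no argument also discards empty strings from repeated spaces
        for word in line.split():
            if len(word) < min_len:
                continue  # skip short words before touching the dictionary
            # get() returns 0 for a word that has not been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts


# The question's condition is a chained comparison:
# len(word) >= 5 in d  means  (len(word) >= 5) and (5 in d)
demo = {"change": 1}
print(len("change") >= 5 in demo)  # False, because 5 is not a key

# Hypothetical sample lines standing in for the speech file
sample_lines = [
    "Change will come, because people believe.",
    "People want change; change is coming.",
]
counts = count_long_words(sample_lines)

# Sort by count, highest first
for word, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(word, ":", n)
```

To use it on the speech, pass the open file object in place of `sample_lines`. In the `sorted()` call, `key=lambda kv: kv[1]` sorts on the count and `reverse=True` puts the highest counts first.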
Solution 1:[1]
Another answer has shown how to modify your code to get the desired result. Here is an alternative implementation: counting words and sorting them by frequency is much easier with a list comprehension and the `Counter` class from the `collections` module.
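```python
import os
import string
from collections import Counter

os.chdir('mydirectory')
with open("obamaspeech.txt", "r") as speech:
    # Read the whole file, lowercase it and strip punctuation in one pass
    full_speech = speech.read().lower().translate(str.maketrans("", "", string.punctuation))
words = full_speech.split()  # split() on any whitespace, no empty strings
# Keep only words with 5 or more characters and count them
count = Counter([w for w in words if len(w) >= 5])
# most_common() returns (word, count) pairs, highest count first
for w, k in count.most_common():
    print(f"{w}: {k} time(s)")
```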
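The answer's code needs `obamaspeech.txt` in `mydirectory`. To try the same `Counter` approach without that file, here is a minimal sketch that works on an in-memory string. The function name `top_long_words`, its `min_len` and `n` parameters, and the sample text are hypothetical. It passes a generator expression instead of a list, which `Counter` accepts just as well, and uses `most_common(n)` to return only the `n` most frequent words (all of them when `n` is `None`).

```python
import string
from collections import Counter


def top_long_words(text, min_len=5, n=None):
    """Hypothetical helper: most frequent words with at least min_len characters."""
    # Lowercase and strip punctuation, as in the answer's code
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Counter accepts any iterable, so a generator expression is enough
    counts = Counter(w for w in cleaned.split() if len(w) >= min_len)
    # most_common(None) returns every (word, count) pair, highest count first
    return counts.most_common(n)


sample = "Change will come, because people believe. People want change; change is coming."
print(top_long_words(sample, n=2))
```

Passing an integer for `n` is a convenient way to print only the top entries of a long speech instead of every word.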
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
