Python list of words from a file with a specific length, no punctuation included in the words

The result should be a list of words with a length greater than 9; the words should be lowercase and contain no punctuation, and only three lines of code are allowed in the body of the function. The problem with my code is that it still leaves punctuation in the words. I tried checking for just a few characters as an example, i.e. if ch is not one of ('-', '"', '!'), and I also tried the regex r'[.,"!-]'.

I also tried opening the file without using with, and it worked: I got the result I want. But with that method I cannot respect the limit of only 3 lines of code inside the function body.

import string
min_length = 9
with open('my_file.txt') as file:

    content = ''.join([ch for ch in file if ch not in string.punctuation])
    result = [word.lower() for word in content.split() if len(word)>min_length]


print(result)
My output:
['distinctly', 'repeating,', 'entreating', 'entreating', 'hesitating', 'forgiveness', 'wondering,', 'whispered,', '"lenore!"-', 'countenance', '"nevermore."', 'sculptured', '"nevermore."', 'fluttered-', '"nevermore."', '"doubtless,"', 'unmerciful', 'melancholy', 'nevermore\'."', '"nevermore."', 'expressing', 'nevermore!', '"nevermore."', '"prophet!"', 'undaunted,', 'enchanted-', '"nevermore."', '"prophet!"', '"nevermore."', 'upstarting-', 'loneliness', 'unbroken!-', '"nevermore."', 'nevermore!']

As you can see, there are still words with punctuation.
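A likely cause: iterating over a file object yields whole lines, not single characters, so `ch not in string.punctuation` becomes a substring test on an entire line and almost never filters anything. The sketch below reproduces this with `io.StringIO` standing in for the file (the sample text is an assumption, not the real `my_file.txt`) and shows that iterating over the result of `read()` filters character by character.

```python
import io
import string

# Sample text standing in for my_file.txt (an assumption for illustration).
sample = 'Quoth the Raven, "Nevermore."\nDeep into that darkness peering, wondering,\n'

# Iterating over a file-like object yields lines, so each "ch" is a whole line.
chunks = [ch for ch in io.StringIO(sample)]
print(chunks)

# A whole line is not a substring of string.punctuation, so nothing is dropped.
unfiltered = ''.join(ch for ch in io.StringIO(sample) if ch not in string.punctuation)
print(unfiltered == sample)

# Reading the text first makes the loop see individual characters.
filtered = ''.join(ch for ch in io.StringIO(sample).read() if ch not in string.punctuation)
print(filtered)
```

With the real file, the equivalent change would be to loop over `file.read()` rather than over `file`.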



Solution 1:[1]

I got this.

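from string import punctuation

with open('test.txt') as f:
    # Replace newlines with spaces so words on adjacent lines do not fuse
    data = f.read().replace('\n', ' ')

# Strip every punctuation character from the text
for a in punctuation:
    data = data.replace(a, '')

# split() with no argument also discards empty strings from repeated spaces
data = list(set(a for a in data.split() if len(a) > 9))
print(data)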

Output:

The result is an empty list because the given data does not contain a single word with more than 9 letters.
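One detail worth checking when a file is flattened into a single string: if newlines are deleted outright rather than turned into spaces, the last word of one line fuses with the first word of the next and can show up as a spurious long word. A small illustration, using a made-up two-line string rather than the contents of `test.txt`:

```python
# Made-up two-line sample (an assumption; not the contents of test.txt).
text = "upon a midnight dreary\nwhile I pondered"

# Deleting the newline glues "dreary" and "while" together.
fused = text.replace('\n', '').split(' ')
print(fused)

# Replacing it with a space, or calling split() with no argument, keeps them apart.
spaced = text.replace('\n', ' ').split(' ')
plain = text.split()
print(spaced == plain)

# Only the fused version reports a word longer than 9 characters.
print([w for w in fused if len(w) > 9])
print([w for w in plain if len(w) > 9])
```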

Solution 2:[2]

I believe this could be an appropriate solution:


from string import punctuation

with open('files/text.txt') as f:
    # Delete punctuation, then split on any whitespace (including newlines)
    print(set(a for a in f.read().translate(str.maketrans('', '', punctuation)).split() if len(a) > 9))


However, this is a crime against humanity in terms of readability, and I would strongly suggest relaxing the three-line requirement so that your code is more understandable in the long run.
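For comparison, here is a sketch of how the same idea might look as a function whose body is still three lines but stays readable. The function name `long_words`, its parameters, and the temporary sample file are hypothetical, introduced only for illustration; `str.maketrans` and `str.translate` are the standard-library calls doing the punctuation removal.

```python
import os
import tempfile
from string import punctuation


# Hypothetical helper: lowercase, punctuation-free words longer than min_length.
def long_words(path, min_length=9):
    with open(path) as f:
        text = f.read().lower().translate(str.maketrans('', '', punctuation))
    return [word for word in text.split() if len(word) > min_length]


# Usage with a throwaway sample file (the text is made up for the demo).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Quoth the Raven, "Nevermore."\nDistinctly I remember; wondering, fearing.\n')
sample_path = tmp.name

result = long_words(sample_path)
os.remove(sample_path)
print(result)
```

Note that "nevermore" has exactly 9 letters, so once the punctuation is stripped it no longer passes the `> 9` check, whereas the unstripped `"nevermore."` did.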

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Neervana