Filter paragraphs with at least 30 words using NLTK in Python
I am learning NLTK and have segmented several novels into paragraphs. How can I filter each author's list so that only paragraphs with at least 30 words are kept? I tried gutenberg.words, but it failed.
# Load the corpus
from nltk.corpus import gutenberg
# List files available in this corpus
print(gutenberg.fileids())
# Concatenate the paragraphs of three novels per author
Austen = gutenberg.paras('austen-emma.txt') + \
    gutenberg.paras('austen-persuasion.txt') + \
    gutenberg.paras('austen-sense.txt')
Shakespeare = gutenberg.paras('shakespeare-caesar.txt') + \
    gutenberg.paras('shakespeare-hamlet.txt') + \
    gutenberg.paras('shakespeare-macbeth.txt')
# Attempt that fails: gutenberg.words() expects file IDs, not a list of paragraphs
gutenberg.words(Austen)
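One possible way to do the filtering, sketched here rather than taken from an accepted answer: gutenberg.paras() returns each paragraph as a list of sentences, and each sentence as a list of tokens, so the word count of a paragraph can be computed by summing over its sentences. The helper names word_count and filter_paragraphs are hypothetical, and the choice to ignore punctuation-only tokens is an assumption about what "word" should mean here. The toy data stands in for the corpus so the snippet runs without downloading anything.

```python
def word_count(para):
    """Count word tokens in a paragraph given as a list of tokenized sentences."""
    # Assumption: punctuation-only tokens such as "," or "." do not count as words.
    return sum(
        1
        for sent in para
        for tok in sent
        if any(ch.isalnum() for ch in tok)
    )


def filter_paragraphs(paras, min_words=30):
    """Keep only paragraphs with at least `min_words` word tokens."""
    return [para for para in paras if word_count(para) >= min_words]


# Toy data in the same nested shape as gutenberg.paras() output:
# a paragraph is a list of sentences, a sentence is a list of tokens.
short_para = [["Emma", "smiled", "."]]
long_para = [["word"] * 20 + ["."], ["word"] * 15 + ["."]]

kept = filter_paragraphs([short_para, long_para], min_words=30)
print(len(kept))
```

With the real corpus, the same helper could be applied as Austen = filter_paragraphs(Austen) and Shakespeare = filter_paragraphs(Shakespeare). If punctuation should count toward the 30, sum(len(sent) for sent in para) would be the simpler measure.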
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

