What is the default NLTK part of speech tagset?
While experimenting with NLTK part of speech tagging, I noticed a lot of VBP tags in the output of my calls to nltk.pos_tag. This tag is not in the Brown Corpus part of speech tagset. It is, however, part of the UPenn tagset.
What tagset does nltk use by default? I can't find this in the official documentation or the apidocs.
Solution 1:[1]
NLTK uses the Penn Treebank tagset. Have a look at http://nltk.org/api/nltk.tag.html
Solution 2:[2]
It uses the POS tags from the Penn Treebank Project. You can see the list of tags with their meanings at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
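To make the VBP tag from the question concrete, here is a minimal sketch that attaches Penn Treebank descriptions to tagged tokens. The `PENN_VERB_TAGS` dictionary and the `describe_tags` helper are hypothetical names written for this illustration, not part of NLTK, and the dictionary covers only the six verb tags rather than the full tagset.

```python
# Hand-copied subset of the Penn Treebank tagset: the six verb tags only.
# This is an illustrative glossary, not an NLTK data structure.
PENN_VERB_TAGS = {
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
}


def describe_tags(tagged_tokens, glossary=PENN_VERB_TAGS):
    """Return (word, tag, description) triples; tags missing from the glossary get None."""
    return [(word, tag, glossary.get(tag)) for word, tag in tagged_tokens]


# Pairs in the (word, tag) shape that nltk.pos_tag returns.
sample = [("I", "PRP"), ("run", "VBP")]
for word, tag, meaning in describe_tags(sample):
    print(word, tag, meaning)
```

If the NLTK tagset help data is installed, `nltk.help.upenn_tagset("VBP")` should print the library's own description of the tag, which avoids maintaining a glossary by hand.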
Solution 3:[3]
NLTK uses the Penn Treebank tagset by default. Others are available: the NLTK library also ships other taggers (with other tagsets).
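As a rough sketch of switching away from the default, `nltk.pos_tag` accepts a `tagset` argument that maps the Penn Treebank output to another tagset such as `"universal"`. The wrapper name `tag_with_tagset` is hypothetical, and the example assumes the tagger model and the universal tagset mapping have already been fetched with `nltk.download`; the exact resource names vary between NLTK versions, so the wrapper returns None instead of failing when they are missing.

```python
def tag_with_tagset(tokens, tagset=None):
    """Tag tokens with nltk.pos_tag.

    tagset=None keeps the default Penn Treebank tags.
    Returns None if NLTK or its data files are not installed.
    """
    try:
        import nltk
        return nltk.pos_tag(tokens, tagset=tagset)
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed.
        # LookupError: a required NLTK data resource has not been downloaded.
        return None


tokens = ["They", "run", "every", "morning"]
penn = tag_with_tagset(tokens)                           # default Penn Treebank tags
universal = tag_with_tagset(tokens, tagset="universal")  # coarse universal tags
print(penn)
print(universal)
```

With the default tagset, a present-tense verb such as "run" would be expected to come back as VBP, which is the tag the question noticed; with the universal tagset the same token is reported under the coarser VERB category.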
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chandan Gupta |
| Solution 2 | Mayank Gour |
| Solution 3 | Simone |
