What is the default NLTK part-of-speech tagset?

While experimenting with NLTK part-of-speech tagging, I noticed a lot of VBP tags in the output of my calls to nltk.pos_tag. This tag is not in the Brown Corpus part-of-speech tagset; it is, however, part of the UPenn tagset.

What tagset does nltk use by default? I can't find this in the official documentation or the apidocs.



Solution 1:[1]

NLTK uses the Penn Treebank tagset. Have a look at this link: http://nltk.org/api/nltk.tag.html

Solution 2:[2]

It uses the POS tags from the Penn Treebank Project. You can see the list of tags with their meanings at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
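To check this against your own output, here is a minimal sketch that compares tags with the 36 word-level Penn Treebank tags. The names PENN_TREEBANK_TAGS, non_penn_tags and tag_with_nltk are made up for illustration and are not part of NLTK. The sample data is hand-written, not real tagger output, and punctuation tags such as "." or "," are deliberately left out of the set.

```python
# The 36 word-level tags of the Penn Treebank tagset (punctuation tags omitted).
PENN_TREEBANK_TAGS = frozenset(
    "CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS PRP PRP$ "
    "RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB".split()
)


def non_penn_tags(tagged_tokens):
    """Return the tags in (word, tag) pairs that are not Penn Treebank word tags."""
    return {tag for _, tag in tagged_tokens if tag not in PENN_TREEBANK_TAGS}


def tag_with_nltk(tokens):
    """Tag a list of tokens with nltk.pos_tag.

    Assumes NLTK and its default English tagger model are installed.
    """
    import nltk  # imported lazily so the helper above works without NLTK

    return nltk.pos_tag(tokens)


if __name__ == "__main__":
    # Hand-written sample in the (word, tag) shape that nltk.pos_tag returns.
    sample = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("go", "VB")]
    print(non_penn_tags(sample))  # set()
```

If NLTK and its tagset data are installed, nltk.help.upenn_tagset("VBP") should also print the description of a single tag, which is a quick way to look up unfamiliar tags from within Python.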

Solution 3:[3]

NLTK uses the Penn Treebank tagset by default. Other taggers (with other tagsets) are also available as part of the NLTK library.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chandan Gupta
Solution 2 Mayank Gour
Solution 3 Simone