nltk.pos_tag not picking up "will" as MD
I have the following sentence from my corpus as an example. "This is a test case with multiple sentences. Will you get it right?"
When using nltk.pos_tag, the word "Will" should be tagged as MD (modal), correct? I have the following code for this sentence:
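import nltk
from collections import Counter

def get_pos_tags(text) -> Counter:
    """When given a string, return a POS tag counter, using NLTK."""
    text = str(text)
    tokens = nltk.word_tokenize(text)  # split the text into word tokens
    tagged = nltk.pos_tag(tokens)      # list of (token, tag) pairs
    count = Counter(tag for _, tag in tagged)
    return count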
However, if I print the tags, I get the following:
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), ('case', 'NN'), ('with', 'IN'), ('a', 'DT'), ('two', 'CD'), ('singular', 'JJ'), ('nouns', 'NNS'), ('.', '.')]
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), ('case', 'NN'), ('with', 'IN'), ('multiple', 'JJ'), ('sentences', 'NNS'), ('.', '.'), ('Will', 'NNP'), ('you', 'PRP'), ('get', 'VBP'), ('it', 'PRP'), ('right', 'RB'), ('?', '.')].
As you can see, "Will" gets tagged as an NNP, when it should be MD according to the documentation. Any reason why this is happening?
Update: "Will" only gets tagged as MD if I lowercase it, which is even stranger. It works as a fix, but I still do not understand why.
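For reference, here is a minimal sketch of the lowercasing workaround from the update, applied only to sentence-initial tokens so that capitalized words elsewhere are left alone. The helper name `tag_with_lowered_sentence_starts` and the `tagger` parameter are hypothetical, not part of NLTK; the idea is to pass `nltk.pos_tag` as the tagger. Treating ".", "?" and "!" as sentence boundaries is a simplifying assumption.

```python
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]

def tag_with_lowered_sentence_starts(
    tokens: List[str],
    tagger: Callable[[List[str]], Tagged],
) -> Tagged:
    """Tag tokens after lowercasing only the first token of each sentence.

    The original spelling of every token is restored in the result.
    """
    sentence_end = {".", "?", "!"}  # assumption: simple boundary markers
    lowered = []
    at_start = True
    for tok in tokens:
        # Lowercase a token only when it opens a sentence.
        lowered.append(tok.lower() if at_start else tok)
        at_start = tok in sentence_end
    tags = tagger(lowered)
    # Pair each original token with the tag assigned to its lowered form.
    return [(orig, tag) for orig, (_, tag) in zip(tokens, tags)]

# Intended usage (requires NLTK plus its tokenizer and tagger data):
#   import nltk
#   tokens = nltk.word_tokenize("Will you get it right?")
#   tag_with_lowered_sentence_starts(tokens, nltk.pos_tag)
```

The trade-off is that a genuine proper noun (or "I") at the start of a sentence is lowercased too and may then be tagged differently, so this is probably only worth using if sentence-initial modals matter more than sentence-initial names in the corpus.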
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
