Finding most common adjective in text (part of speech tagging)

I have a dataset where I'm trying to find the most common adjective/verb/noun. I already used NLTK to tag the words, so my dataframe now looks like this:

Index POS
0 [('the', 'DT'),('quality', 'NN'),('of', 'IN'),('food', 'NN'),('was', 'VBD'),('poor', 'JJ')]
1 [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')]

How do I find which word is most commonly used as an adjective, for example?



Solution 1:[1]

This line will find the most common adjective (JJ) per row:

df['adj'] = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].groupby(level=0).apply(lambda x: x.mode()[0])

Output:

>>> df
                                                                        POS   adj
0  [(the, DT), (quality, NN), (of, IN), (food, NN), (was, VBD), (poor, JJ)]  poor
1               [(good, JJ), (food, NN), (for, IN), (the, DT), (price, NN)]  good
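
If the chained one-liner is hard to follow, the same per-row logic can be unrolled into named steps. This is only a sketch: the dataframe is rebuilt by hand from the two rows in the question, and the intermediate names (pairs, is_adj, adjectives) are made up for illustration.

```python
import pandas as pd

# Rebuild the two example rows from the question.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})

# 1. One (word, tag) tuple per row; the original row index is repeated.
pairs = df["POS"].explode()

# 2. Boolean mask: keep only tuples whose tag (element 1) is 'JJ'.
is_adj = pairs.str[1] == "JJ"

# 3. Keep the word (element 0) of each adjective tuple.
adjectives = pairs[is_adj].str[0]

# 4. Group by the original row index and take the first mode per row.
df["adj"] = adjectives.groupby(level=0).agg(lambda s: s.mode().iloc[0])

print(df["adj"].tolist())
```

Because the assignment aligns on the index, a row with no 'JJ' tag at all ends up with NaN in the adj column rather than raising an error.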

This line will find the most common adjective in the whole dataframe:

most_common = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].mode()[0]

Output:

>>> most_common
'good'

(Note that in your example data every adjective occurs exactly once, so there is a tie. In that case mode() returns all of the tied values and this code picks the first one.)
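
If you also want the counts, or want to see every tied word instead of only the first, a plain collections.Counter is one alternative to mode(). The sketch below is not part of the original answer: tag_counts is a hypothetical helper, the dataframe is rebuilt by hand from the question, and matching on a tag prefix is an assumption that relies on the Penn Treebank tagset, where 'JJ' also covers JJR/JJS, 'NN' covers NNS/NNP/NNPS, and 'VB' covers VBD/VBG and so on.

```python
from collections import Counter

import pandas as pd

# Rebuild the two example rows from the question.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})


def tag_counts(pos_column, tag_prefix):
    """Hypothetical helper: count words whose tag starts with tag_prefix."""
    return Counter(
        word
        for tagged in pos_column      # one list of (word, tag) tuples per row
        for word, tag in tagged
        if tag.startswith(tag_prefix)
    )


adj_counts = tag_counts(df["POS"], "JJ")

# Collect every word that shares the highest count, so ties are visible.
top = max(adj_counts.values())
tied = sorted(word for word, count in adj_counts.items() if count == top)

print(adj_counts)
print(tied)
print(tag_counts(df["POS"], "NN").most_common(1))
```

The same helper works for nouns or verbs by passing 'NN' or 'VB' as the prefix.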

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 richardec