How to augment text data by group using NLPAUG for group imbalance (Python 3.9.6)
Good afternoon.
I am running into an issue with the NLPAUG Python package. By default, the package supports a string or a list of strings, but I would like to augment data by group. What I am trying to do is create new text records using NLPAUG for underrepresented groups. So if Group E only has one text record, I would like to augment it and create 2 new records for it using synonyms from the NLPAUG package.
I tried the following code, as recommended by the NLPAUG documentation, but it keeps giving me this error message: AttributeError: 'DataFrame' object has no attribute 'strip'
Code Tried
import pandas as pd
import nlpaug.augmenter.word as naw

aug_wordnet = naw.SynonymAug(aug_src='wordnet', aug_max=3)
aug_data = []
for group, d in mydataframe.groupby(['Group']):
    # This call raises the AttributeError: d is a DataFrame, not a string
    a_data = aug_wordnet.augment(d)
    a_data = pd.DataFrame(a_data, columns=['Comment'])
    a_data['Group'] = group
    aug_data.append(a_data)
aug_data = pd.concat(aug_data)
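
The error is consistent with `augment` receiving the whole DataFrame `d` instead of text, since the package works on a string or a list of strings. A minimal sketch of the per-group loop that passes each `Comment` string individually is shown here. To keep it runnable without NLPAUG or WordNet installed, `augment_text` and its `SYNONYMS` table are hypothetical stand-ins for the real augmenter, and the three-row DataFrame is a cut-down version of the example data.

```python
import pandas as pd

# Hypothetical stand-in for an nlpaug augmenter so the sketch runs without
# nlpaug or WordNet installed. The synonym table is made up for illustration.
SYNONYMS = {"wonderful": ["great", "excellent", "marvelous"]}

def augment_text(text: str, n: int = 1) -> list[str]:
    """Return up to n variants of text, each with one word swapped for a synonym."""
    variants = []
    words = text.split()
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants[:n]

df = pd.DataFrame({
    "ID": [12, 91, 24],
    "Comment": ["This is good", "This is awful", "This is Wonderful"],
    "Group": ["A", "B", "D"],
})

new_rows = []
# Group by the column name (not a one-element list) so `group` is a plain label.
for group, d in df.groupby("Group"):
    # Pass each comment string to the augmenter, never the DataFrame itself.
    for row in d.itertuples(index=False):
        for variant in augment_text(row.Comment, n=2):
            new_rows.append({"ID": row.ID, "Comment": variant, "Group": group})

# Append the generated rows to the original records.
augmented = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(augmented)
```

With NLPAUG itself, the call to `augment_text(row.Comment, n=2)` could presumably be swapped for `aug_wordnet.augment(row.Comment, n=2)`, assuming the installed version's `augment` accepts an `n` argument and returns a list of strings for a single input string; that is worth checking against the version in use.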
Here is what my data looks like.
Input Example
| ID | Comment | Group |
|---|---|---|
| 12 | This is good | A |
| 34 | This is OK | A |
| 56 | This is excellent | A |
| 78 | This is awful | A |
| 91 | This is good | B |
| 11 | This is awful | B |
| 21 | This is awful | C |
| 22 | This is awful | C |
| 23 | This is Amazing | C |
| 24 | This is Wonderful | D |
Output Example - using Synonyms from NLPAUG for group D
| ID | Comment | Group |
|---|---|---|
| 12 | This is good | A |
| 34 | This is OK | A |
| 56 | This is excellent | A |
| 78 | This is awful | A |
| 91 | This is good | B |
| 11 | This is awful | B |
| 21 | This is awful | C |
| 22 | This is awful | C |
| 23 | This is Amazing | C |
| 24 | This is replaced with synonym for Wonderful | D |
| 24 | This is Great | D |
| 24 | This is Excellent | D |
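
One way to decide which groups to augment, and by how many records, is to compare each group's size with a target. The sketch below assumes a hypothetical threshold `MIN_PER_GROUP = 3`, which is not stated in the question; it is chosen only because Group D ends up with three records in the output example. It uses only the `Group` column from the input example.

```python
import pandas as pd

# Group labels from the input example table.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "C", "C", "C", "D"],
})

MIN_PER_GROUP = 3  # hypothetical target size, an assumption for this sketch

# Number of records currently in each group.
counts = df["Group"].value_counts()

# Extra records each group needs to reach the target (0 if already at or above it).
deficit = (MIN_PER_GROUP - counts).clip(lower=0)

# Keep only the groups that actually need new records.
needs_augmenting = deficit[deficit > 0].sort_index().to_dict()
print(needs_augmenting)
```

With this threshold, Group D needs two new records, as in the output example, and Group B would also receive one. The resulting mapping can then drive how many variants are requested per group.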
How can I tackle this issue?
Thank you
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
