How to augment text data by group using NLPAUG for group imbalance (Python 3.9.6)

Good afternoon.

I am running into an issue with the NLPAUG Python package. By default the package accepts a string or a list of strings, but I would like to augment data by group. My goal is to create new text records for underrepresented groups. For example, if Group D has only one text record, I would like to augment it and create 2 new records using synonyms from the NLPAUG package.

I tried the following code, based on the NLPAUG documentation, but it keeps raising this error: AttributeError: 'DataFrame' object has no attribute 'strip'

Code Tried

import pandas as pd
import nlpaug.augmenter.word as naw

aug_wordnet = naw.SynonymAug(aug_src='wordnet', aug_max=3)

aug_data = []
for group, d in mydataframe.groupby('Group'):
  # This call raises the AttributeError: d is a DataFrame, not a string
  a_data = aug_wordnet.augment(d)
  a_data = pd.DataFrame(a_data, columns=['Comment'])
  a_data['Group'] = group
  aug_data.append(a_data)

aug_data = pd.concat(aug_data)

Here is what my data looks like.

Input Example

ID	Comment	Group
12	This is good	A
34	This is OK	A
56	This is excellent	A
78	This is awful	A
91	This is good	B
11	This is awful	B
21	This is awful	C
22	This is awful	C
23	This is Amazing	C
24	This is Wonderful	D

Output Example - using Synonyms from NLPAUG for group D

ID	Comment	Group
12	This is good	A
34	This is OK	A
56	This is excellent	A
78	This is awful	A
91	This is good	B
11	This is awful	B
21	This is awful	C
22	This is awful	C
23	This is Amazing	C
24	This is replaced with synonym for Wonderful	D
24	This is Great	D
24	This is Excellent	D

How can I tackle this issue?

Thank you
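
One possible approach, offered as a minimal sketch rather than a definitive answer: the error message suggests that augment() is receiving the whole group DataFrame instead of the strings in its Comment column. The sketch below passes one comment at a time and only tops up groups that have fewer rows than a chosen minimum. The names augment_small_groups, make_wordnet_augmenter and min_count are hypothetical and not part of NLPAUG. The column names Comment and Group come from the example data. The code assumes that augment() returns either a string or a list of strings, depending on the installed NLPAUG version, and that the NLTK WordNet data is available when the WordNet augmenter is used.

```python
import pandas as pd


def augment_small_groups(df, augment_fn, min_count,
                         text_col="Comment", group_col="Group"):
    """Return df plus augmented rows so every group has at least min_count rows.

    augment_fn is any callable that takes one string and returns either one
    string or a list of strings (hypothetical helper, not an NLPAUG function).
    """
    new_rows = []
    for _, rows in df.groupby(group_col):
        missing = min_count - len(rows)
        if missing <= 0:
            continue  # this group is already large enough
        for i in range(missing):
            # Cycle through the group's existing rows as augmentation sources
            source = rows.iloc[i % len(rows)].to_dict()
            result = augment_fn(source[text_col])
            # Normalise: some versions return a list, others a plain string
            source[text_col] = result[0] if isinstance(result, list) else result
            new_rows.append(source)
    if not new_rows:
        return df.copy()
    # Original rows first, augmented rows appended at the end
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)


def make_wordnet_augmenter(aug_max=3):
    """Build a string -> augmented text callable backed by NLPAUG's SynonymAug."""
    # Imported lazily so augment_small_groups works without nlpaug installed
    import nlpaug.augmenter.word as naw
    aug = naw.SynonymAug(aug_src="wordnet", aug_max=aug_max)
    return aug.augment
```

With the example data, a call such as augment_small_groups(mydataframe, make_wordnet_augmenter(), min_count=3) would leave groups A and C unchanged, add one row to group B and add two rows to group D. Each new row keeps the ID and Group of the row it was derived from, as in the output example above. Synonym replacement is random, so a generated comment can occasionally be identical to its source.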

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
