How to augment text data by group using NLPAUG for group imbalance (Python 3.9.6)

Good afternoon.

I am running into an issue with the NLPAUG Python package. By default the package accepts a string or a list of strings, but I would like to augment data by group. What I am trying to do is create new text records with NLPAUG for underrepresented groups. For example, if Group D has only one text record, I would like to augment it and create 2 new records using synonyms from the NLPAUG package.

I tried the following code, as recommended by the NLPAUG documentation, but it keeps raising this error: AttributeError: 'DataFrame' object has no attribute 'strip'

Code Tried

import pandas as pd
import nlpaug.augmenter.word as naw

aug_wordnet = naw.SynonymAug(aug_src='wordnet', aug_max=3)

aug_data = []
for group, d in mydataframe.groupby('Group'):
  # AttributeError is raised here: d is a DataFrame, not a string or list of strings
  a_data = aug_wordnet.augment(d)
  a_data = pd.DataFrame(a_data, columns=['Comment'])
  a_data['Group'] = group
  aug_data.append(a_data)

aug_data = pd.concat(aug_data)

Here is what my data looks like.

Input Example

ID  Comment            Group
12  This is good       A
34  This is OK         A
56  This is excellent  A
78  This is awful      A
91  This is good       B
11  This is awful      B
21  This is awful      C
22  This is awful      C
23  This is Amazing    C
24  This is Wonderful  D

Output Example - using Synonyms from NLPAUG for group D

ID  Comment                                      Group
12  This is good                                 A
34  This is OK                                   A
56  This is excellent                            A
78  This is awful                                A
91  This is good                                 B
11  This is awful                                B
21  This is awful                                C
22  This is awful                                C
23  This is Amazing                              C
24  This is replaced with synonym for Wonderful  D
24  This is Great                                D
24  This is Excellent                            D

How can I tackle this issue?

Thank you
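
One possible approach, sketched below, is to call the augmenter on each comment string rather than on the grouped DataFrame, and to top up only the groups that fall below a chosen size. This is a minimal sketch, not a definitive solution: the function name augment_small_groups, its parameters, and the toy_augment stand-in are all made up for illustration, and the records are plain dicts so the example runs without pandas or nlpaug installed.

```python
import random
from collections import defaultdict


def augment_small_groups(records, augment_fn, min_per_group,
                         text_key="Comment", group_key="Group", seed=0):
    """Return the original records plus augmented copies for small groups.

    records:       list of dicts, one per row (hypothetical input shape)
    augment_fn:    callable taking one string and returning a string,
                   or a list of strings (only the first is used)
    min_per_group: every group is topped up to at least this many records
    """
    rng = random.Random(seed)

    # Bucket the records by group, preserving their original order.
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)

    new_records = []
    for members in by_group.values():
        # Number of extra records this group needs (0 for large groups).
        for _ in range(max(0, min_per_group - len(members))):
            base = rng.choice(members)         # original record to rewrite
            text = augment_fn(base[text_key])  # pass a plain string
            if isinstance(text, list):         # some augmenters return a list
                text = text[0]
            # Copy the record (ID and group included), swapping in the new text.
            new_records.append({**base, text_key: text})

    return list(records) + new_records


# Stand-in augmenter (an assumption) so the sketch runs without nlpaug.
def toy_augment(text):
    return text.replace("Wonderful", "Great")


rows = [
    {"ID": 23, "Comment": "This is Amazing", "Group": "C"},
    {"ID": 24, "Comment": "This is Wonderful", "Group": "D"},
]
balanced = augment_small_groups(rows, toy_augment, min_per_group=3)
```

With the real data, the input could come from mydataframe.to_dict("records"), augment_fn could be aug_wordnet.augment from the question, and pd.DataFrame(balanced) would rebuild the frame. As far as I know, whether augment returns a string or a list depends on the nlpaug version, which is why the sketch handles both. New rows are appended at the end, so sort by Group afterwards if the order matters.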



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
