Is there a way to create a new row with duplicate sentence but expanded acronym under it?
I have a dataframe with a column whose rows contain sentences with acronyms. A separate dataframe lists, in two columns, what those acronyms stand for.
For every cell in the first dataframe's column in which an acronym is used, I would like to create a new row underneath it with the exact same sentence, except with the acronym expanded.
As input I have a dataframe with one column, and another dataframe with each acronym and its expansion:
| Column 1 |
|---|
| I work at the CIA |
| I work at the NSA |
| I have worked at both the NSA and CIA |

| Column A | Column B |
|---|---|
| CIA | Central Intelligence Agency |
| NSA | National Security Agency |
And this is the output I want to get:
| Column 1 |
|---|
| I work at the CIA |
| I work at the Central Intelligence Agency |
| I work at the NSA |
| I work at the National Security Agency |
| I have worked at both the NSA and CIA |
| I have worked at both the National Security Agency and the Central Intelligence Agency |
Solution 1:[1]
I'm not sure why you want to add rows to your DataFrame, but here is an approach that uses the second DataFrame to expand the acronyms, putting the expanded sentences in a new column rather than in new rows.
Given two DataFrames defined as follows:

    import pandas as pd

    df1 = pd.DataFrame(data=['I work at the CIA', 'I work at the NSA',
                             'I have worked at both the NSA and CIA'], columns=['Raw_in'])

which yields:

                                          Raw_in
    0                      I work at the CIA
    1                      I work at the NSA
    2  I have worked at both the NSA and CIA

and:

    df2 = pd.DataFrame(data=[['CIA', 'Central Intelligence Agency'],
                             ['NSA', 'National Security Agency']],
                       columns=['Abrev', 'Title'])

which yields:

       Abrev                        Title
    0    CIA  Central Intelligence Agency
    1    NSA     National Security Agency
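Define a translation function as follows:

    def createNew(schstr, dfx):
        """Return schstr with each word found in dfx['Abrev'] replaced by its Title."""
        schList = schstr.split(' ')
        keys = dfx['Abrev'].to_list()
        for i, w in enumerate(schList):
            if w in keys:
                # Swap the abbreviation for the matching title
                schList[i] = dfx[dfx['Abrev'] == w]['Title'].values[0]
        return " ".join(schList)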
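One caveat: because `createNew` splits on single spaces, an acronym followed by punctuation (for example `CIA.`) is not matched. As a rough alternative sketch, not part of the original answer, the lookup could be built once as a dictionary and applied with a word-boundary regular expression. The names `expand_acronyms` and `mapping` are made up for this illustration; the remaining steps of the answer work the same way with `createNew`.

```python
import re
import pandas as pd

def expand_acronyms(sentence, mapping):
    """Replace each whole-word acronym in `sentence` using `mapping` (acronym -> expansion)."""
    if not mapping:
        return sentence
    # Longest keys first, so a longer acronym is tried before a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    # \b anchors the match to word boundaries, so "CIA." and "CIA," still match.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], sentence)

df2 = pd.DataFrame(data=[['CIA', 'Central Intelligence Agency'],
                         ['NSA', 'National Security Agency']],
                   columns=['Abrev', 'Title'])

# Build the lookup once instead of filtering df2 for every word.
mapping = dict(zip(df2['Abrev'], df2['Title']))

print(expand_acronyms('I worked at the NSA, then the CIA.', mapping))
```

The pattern is rebuilt on every call here to keep the sketch short; for a large column it would probably be worth compiling it once outside the function.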
And employ the translation as follows:

    df1['Results'] = [createNew(x, df2) for x in df1['Raw_in'].to_list()]
This results in adding a column to df1 as follows:

                                          Raw_in                                                                             Results
    0                      I work at the CIA                                           I work at the Central Intelligence Agency
    1                      I work at the NSA                                              I work at the National Security Agency
    2  I have worked at both the NSA and CIA  I have worked at both the National Security Agency and Central Intelligence Agency
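If the expanded sentences really are needed as new rows directly under the originals, as the question asks, one possible way is to interleave the two versions in a plain list and build a new DataFrame from it. This is only a sketch under the same assumptions as above: the column names `Raw_in`, `Abrev` and `Title` come from this answer, while the helper `expand` and the names `mapping`, `rows` and `result` are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame(data=['I work at the CIA', 'I work at the NSA',
                         'I have worked at both the NSA and CIA'],
                   columns=['Raw_in'])
df2 = pd.DataFrame(data=[['CIA', 'Central Intelligence Agency'],
                         ['NSA', 'National Security Agency']],
                   columns=['Abrev', 'Title'])

# Acronym -> expansion lookup built from the second DataFrame.
mapping = dict(zip(df2['Abrev'], df2['Title']))

def expand(sentence):
    # Word-by-word substitution on single spaces, like createNew.
    return ' '.join(mapping.get(word, word) for word in sentence.split(' '))

rows = []
for sentence in df1['Raw_in']:
    rows.append(sentence)
    expanded = expand(sentence)
    if expanded != sentence:
        # Only add a second row when at least one acronym was expanded.
        rows.append(expanded)

result = pd.DataFrame({'Raw_in': rows})
print(result)
```

Each original sentence is followed by its expanded version. Note that the last row reads "... the National Security Agency and Central Intelligence Agency", without the extra "the" shown in the question's desired output, since only the acronyms themselves are replaced.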
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | itprorh66 |
