'Python pivot column with comma and bracket separated
I am working with a large dataframe that currently looks like this:
| ID | Region | Noun |
|---|---|---|
| 1 | North America | [('Everything', 'NP'), ('the product','NP'), ('it','NP')] |
| 2 | North America | [('beautiful product','NP')] |
The third column contains noun phrase(s) and each phrases have labels (two capital letters). All of these are in quotation marks and then coupled in brackets, and then put in square brackets which make it difficult for me to separate and pivot.
I would like to pivot the last column only, so the final output will look like this:
| ID | Region | Noun | Type |
|---|---|---|---|
| 1 | North America | everything | NP |
| 1 | North America | the product | NP |
| 1 | North America | it | NP |
| 2 | North America | beautiful product | NP |
The annoying part is that some rows have more brackets than others. Is there any way I could make this happen on Python?
Solution 1:[1]
I'm assuming that you have strings in column Noun. You can apply ast.literal_eval on them to convert them to Python lists:
from ast import literal_eval
# apply if Noun are strings, skip otherwise
df["Noun"] = df["Noun"].apply(literal_eval)
df = df.explode("Noun")
df[["Noun", "Type"]] = df["Noun"].apply(pd.Series)
print(df)
Prints:
ID Region Noun Type
0 1 North America Everything NP
0 1 North America the product NP
0 1 North America it NP
1 2 North America beautiful product NP
Solution 2:[2]
Here is another way:
df2 = df.explode('Noun').reset_index(drop=True)
df2[['ID','Region']].join(pd.DataFrame(df2['Noun'].tolist(),columns = ['Noun','Type']))
Output:
ID Region Noun Type
0 1 North America Everything NP
1 1 North America the product NP
2 1 North America it NP
3 2 North America beautiful product NP
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrej Kesely |
| Solution 2 | rhug123 |
