Python: Extract a list-of-dictionaries column in a Pandas DataFrame
I got a dataframe from an API call and want to extract the dictionaries in the "_embedded.results" column. The dataframe looks as follows:
BC_id _embedded.results
0 6EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat...
1 7EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat...
2 8EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat...
3 9EAE8B27FCC11ED892E91CE972ED00CC [{'className': 'Skill', 'classId': 'http://dat...
4 0EAE8B27FCC11ED892E91CE972ED00CC [{'className': 'Skill', 'classId': 'http://dat..
The "_embedded.results" column (at position 0, for example) looks as follows in detail. Every row holds a list of 5 different dictionaries:
[{'className': 'Skill',
'classId': 'http://data.europa.eu/esco/model#Skill',
'uri': 'http://data.europa.eu/esco/skill/237db40b-4600-47c0-837f-4a2c4f3014ab',
'searchHit': 'range of project control principles',
'title': 'project management principles'},
{'className': 'Skill',
'classId': 'http://data.europa.eu/esco/model#Skill',
'uri': 'http://data.europa.eu/esco/skill/abb9c7f1-6d69-4feb-913e-6e577d426ea4',
'searchHit': 'Operate projection equipment manually or with a control panel.',
'title': 'operate projector'},
...}]
Now I want to extract the "title" value of each dictionary in "_embedded.results" and append it as an extra column. For example, like this for the first entry:
BC_id _embedded.results Title1 Title2 ...
0 6EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat... project management principles operate projector
1 7EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat...
2 8EAE8B27FCC11ED892E91CE972E580CC [{'className': 'Skill', 'classId': 'http://dat...
3 9EAE8B27FCC11ED892E91CE972ED00CC [{'className': 'Skill', 'classId': 'http://dat...
4 0EAE8B27FCC11ED892E91CE972ED00CC [{'className': 'Skill', 'classId': 'http://dat..
Another option would be to create a column "title" and append a row for every title.
I have tried something like this, to extract the titles for every row, but I don't know how to put this again into the dataframe:
my_list = [[x['title'] for x in list_dict] for list_dict in my_df1['_embedded.results']]
my_list[0:2]
[['project management principles',
'operate projector',
'manage railway construction projects',
'prepare financial projections',
'Prince2 project management'],
['project management principles',
'operate projector',
'manage railway construction projects',
'prepare financial projections',
'Prince2 project management']]
Does anyone know how to solve this? Thanks in advance!
Solution 1:[1]
"but I don't know how to put this again into the dataframe"
Just assign [[x['title'] for x in list_dict] for list_dict in my_df1['_embedded.results']] to a new column rather than to a variable. Consider the following simple example:
import pandas as pd
df = pd.DataFrame({"data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
# Assign the nested list comprehension directly to a new column
df["squares"] = [[j**2 for j in i] for i in df["data"]]
print(df)
Output:
        data       squares
0  [1, 2, 3]     [1, 4, 9]
1  [4, 5, 6]  [16, 25, 36]
2  [7, 8, 9]  [49, 64, 81]
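The example above leaves each row's values in a single list-valued column. If you want the two layouts described in the question instead (one Title1, Title2, ... column per title, or one row per title), something like the following sketch might work. The frame here is a made-up, shortened stand-in for the API result (two dictionaries per row, only two keys each), and the names title_cols, wide and long_df are just illustrative.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the API result from the question
my_df1 = pd.DataFrame({
    "BC_id": ["6EAE8B27FCC11ED892E91CE972E580CC",
              "7EAE8B27FCC11ED892E91CE972E580CC"],
    "_embedded.results": [
        [{"className": "Skill", "title": "project management principles"},
         {"className": "Skill", "title": "operate projector"}],
        [{"className": "Skill", "title": "project management principles"},
         {"className": "Skill", "title": "operate projector"}],
    ],
})

# Same list comprehension as in the question: one list of titles per row
titles = [[d["title"] for d in dicts] for dicts in my_df1["_embedded.results"]]

# Option 1: one column per title (Title1, Title2, ...)
title_cols = pd.DataFrame(titles, index=my_df1.index)
title_cols.columns = [f"Title{i + 1}" for i in range(title_cols.shape[1])]
wide = my_df1.join(title_cols)

# Option 2: a single "title" column with one row per title
long_df = my_df1.assign(title=titles).explode("title").reset_index(drop=True)

print(wide[["BC_id", "Title1", "Title2"]])
print(long_df[["BC_id", "title"]])
```

In option 1, rows with fewer titles than the longest list should end up with missing values in the trailing Title columns. Option 2 repeats BC_id (and the original list column) once per title, which is usually easier to filter or group on.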
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daweo |
