Python: Extract List-Dictionary column in Pandas Dataframe

I got a dataframe from an API call and want to extract the dictionaries in the "_embedded.results" column. The dataframe looks as follows:

    BC_id                                _embedded.results
0   6EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...
1   7EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...
2   8EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...
3   9EAE8B27FCC11ED892E91CE972ED00CC    [{'className': 'Skill', 'classId': 'http://dat...
4   0EAE8B27FCC11ED892E91CE972ED00CC    [{'className': 'Skill', 'classId': 'http://dat..

The "_embedded.results" column (at position 0, for example) looks as follows in detail. Every row holds a list of 5 different dictionaries:

[{'className': 'Skill',
  'classId': 'http://data.europa.eu/esco/model#Skill',
  'uri': 'http://data.europa.eu/esco/skill/237db40b-4600-47c0-837f-4a2c4f3014ab',
  'searchHit': 'range of project control principles',
  'title': 'project management principles'},
 {'className': 'Skill',
  'classId': 'http://data.europa.eu/esco/model#Skill',
  'uri': 'http://data.europa.eu/esco/skill/abb9c7f1-6d69-4feb-913e-6e577d426ea4',
  'searchHit': 'Operate projection equipment manually or with a control panel.',
  'title': 'operate projector'},
 ...}]

Now I want to extract the "title" value from each dictionary in "_embedded.results" and append it as an extra column. For example, like this for the first entry:

    BC_id                                _embedded.results                                Title1                             Title2                  ...
0   6EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...   project management principles   operate projector
1   7EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...
2   8EAE8B27FCC11ED892E91CE972E580CC    [{'className': 'Skill', 'classId': 'http://dat...
3   9EAE8B27FCC11ED892E91CE972ED00CC    [{'className': 'Skill', 'classId': 'http://dat...
4   0EAE8B27FCC11ED892E91CE972ED00CC    [{'className': 'Skill', 'classId': 'http://dat..

Another option would be to create a column "title" and append a row for every title.

I have tried something like this, to extract the titles for every row, but I don't know how to put this again into the dataframe:

my_list = [[x['title'] for x in list_dict] for list_dict in my_df1['_embedded.results']]


my_list[0:2]
[['project management principles',
  'operate projector',
  'manage railway construction projects',
  'prepare financial projections',
  'Prince2 project management'],
 ['project management principles',
  'operate projector',
  'manage railway construction projects',
  'prepare financial projections',
  'Prince2 project management']]

Does anyone know how to solve this? Thanks in advance!



Solution 1:[1]

"but I don't know how to put this again into the dataframe"

Just assign [[x['title'] for x in list_dict] for list_dict in my_df1['_embedded.results']] to a new column rather than to a variable. Consider the following simple example:

import pandas as pd
df = pd.DataFrame({"data":[[1,2,3],[4,5,6],[7,8,9]]})
# assign the nested list comprehension directly to a new column
df["squares"] = [[j**2 for j in i] for i in df['data']]
print(df)

output

        data       squares
0  [1, 2, 3]     [1, 4, 9]
1  [4, 5, 6]  [16, 25, 36]
2  [7, 8, 9]  [49, 64, 81]
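
If you would rather have separate Title1, Title2, ... columns, or one row per title as mentioned in the question, something along these lines should work. This is only a sketch: the two-row my_df1 below is a made-up stand-in for the API result (shortened to two dictionaries per list), and the names title_cols, wide_df and long_df are just for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the API result (two rows, two dicts per list)
my_df1 = pd.DataFrame({
    "BC_id": ["6EAE8B27FCC11ED892E91CE972E580CC",
              "7EAE8B27FCC11ED892E91CE972E580CC"],
    "_embedded.results": [
        [{"className": "Skill", "title": "project management principles"},
         {"className": "Skill", "title": "operate projector"}],
        [{"className": "Skill", "title": "project management principles"},
         {"className": "Skill", "title": "operate projector"}],
    ],
})

# Same list comprehension as in the question: one list of titles per row
titles = [[d["title"] for d in dicts] for dicts in my_df1["_embedded.results"]]

# Wide layout: one column per title, aligned on the original index
title_cols = pd.DataFrame(titles, index=my_df1.index)
title_cols.columns = [f"Title{i + 1}" for i in range(title_cols.shape[1])]
wide_df = my_df1.join(title_cols)

# Long layout: one row per title, other columns repeated
long_df = my_df1.assign(title=titles).explode("title").reset_index(drop=True)

print(wide_df[["BC_id", "Title1", "Title2"]])
print(long_df[["BC_id", "title"]])
```

The wide layout is convenient when every row has the same number of titles; if the lists differ in length, the shorter rows are padded with missing values. The long layout via explode avoids that padding.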

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daweo