How to create a DataFrame from multiple dictionaries
I have a small issue processing my data (a Series of dictionaries) and creating a DataFrame from it.
This is what the data looks like:
print(data)
0 {'john': True}
1 {'joe': True}
2 {'tom': True}
3 {'mark': False}
4 {'andrew': True}
...
93 {'david': False}
94 {'luke': False}
95 {'vincent': True}
96 {'oliver': True}
97 {'matthew': True}
Length: 98, dtype: object
Basically what I want is this:
df = pd.DataFrame()
df['name'] = data[0].keys()
df['result'] = data[0].values()
print(df)
name result
0 john True
So: one DataFrame with two columns, name and result.
How can I apply that procedure to all dictionaries in data and get a single DataFrame as output?
I was not able to replicate this with a lambda function, but maybe I was not doing it right.
Solution 1:[1]
pd.DataFrame(data.apply(lambda x: list(x.items())[0]).values.tolist())
You can then rename the columns with:
df.rename(columns={0: 'name', 1: 'result'}, inplace=True)
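A self-contained version of this answer, as a sketch: the four-entry Series below is a hypothetical stand-in for the question's 98-entry data, and everything else uses only the calls shown in this answer.

```python
import pandas as pd

# Hypothetical sample standing in for the question's data:
# each cell holds a one-key dictionary {name: result}.
data = pd.Series([{'john': True}, {'joe': True}, {'tom': True}, {'mark': False}])

# Take the single (key, value) pair from each dictionary, then build a
# two-column DataFrame from the resulting list of tuples.
df = pd.DataFrame(data.apply(lambda x: list(x.items())[0]).values.tolist())

# The default column labels are 0 and 1; give them meaningful names.
df.rename(columns={0: 'name', 1: 'result'}, inplace=True)

print(df)
```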
What do you want? A DataFrame with two columns. Let's work out how to get there.
We know that pd.DataFrame is a constructor that takes data and converts it to a DataFrame. If the data is 2D, such as a list of tuples or lists or a 2D NumPy array, each inner row becomes one row of the DataFrame, so rows of length 2 produce a DataFrame with 2 columns.
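A quick check of this on hypothetical rows: the number of columns follows the length of each inner row, and the default column labels are 0, 1, and so on.

```python
import pandas as pd

# Each inner tuple becomes one row; the row length sets the column count.
two_wide = pd.DataFrame([('john', True), ('joe', True)])
three_wide = pd.DataFrame([('john', True, 1), ('joe', True, 2)])

print(two_wide.shape)    # (2, 2)
print(three_wide.shape)  # (2, 3)
```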
So now we have an idea: convert data to a 2D form.
Every cell in the data Series is a dictionary, and Python dictionaries provide a built-in method that exposes their contents as (key, value) pairs: items():
data.apply(lambda x: x.items())
output:
0 ((john, True))
1 ((joe, True))
2 ((tom, True))
3 ((mark, False))
...
That's close, but we only need (john, True). The outer parentheses are there because items() returns a collection of pairs, which would contain several pairs for a dictionary with more than one key, like {'john': True, 'jane': False, 'joe': True}.
Here each dictionary has only one pair, so we just need the first element of ((john, True)).
Unfortunately, the dict_items view that items() returns does not support indexing, so we convert it to a list and then select the first element:
data.apply(lambda x: list(x.items())[0])
output:
0 (john, True)
1 (joe, True)
2 (tom, True)
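A small sketch on a single hypothetical dictionary showing why the list conversion is needed. It also shows next(iter(...)), a standard-Python alternative (not part of the original answer) that takes the first pair without building a list, and how items() behaves for a dictionary with several keys.

```python
d = {'john': True}
pairs = d.items()  # a dict_items view, not a list

# Views do not support indexing, so pairs[0] raises TypeError.
try:
    pairs[0]
    indexable = True
except TypeError:
    indexable = False
print('indexable:', indexable)  # indexable: False

first = list(pairs)[0]          # materialize as a list, then index
also_first = next(iter(pairs))  # take the first pair without building a list
print(first, also_first)        # ('john', True) ('john', True)

# With several keys, items() yields several pairs; [0] keeps only the first.
multi = {'john': True, 'jane': False, 'joe': True}
print(list(multi.items()))      # [('john', True), ('jane', False), ('joe', True)]
```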
Now each cell holds a 1D tuple, so the data is 2D in total (the Series itself provides the other dimension).
If you pass this Series directly to pd.DataFrame, though, you get a single column of tuples back, so you must convert it to a list first.
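To see the difference, a sketch with a hypothetical Series of tuples in the same shape as the apply() result: passing the Series gives one column of tuples, while passing its list form gives two columns. Series also has its own .tolist(), so the .values step is optional.

```python
import pandas as pd

# Hypothetical Series of (name, result) tuples, like the apply() output.
pairs = pd.Series([('john', True), ('joe', True), ('tom', True)])

as_series = pd.DataFrame(pairs)         # one column holding whole tuples
as_list = pd.DataFrame(pairs.tolist())  # tuples unpacked into two columns

print(as_series.shape)  # (3, 1)
print(as_list.shape)    # (3, 2)
```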
pandas has a built-in property, .values, that returns the data behind a Series or DataFrame as a NumPy array, which any other code (including pd.DataFrame) can consume.
Here, .values is a 1D NumPy array of object dtype whose elements are tuples, so you still need a 2D list: calling .tolist() on it gives a plain list of tuples:
data.apply(lambda x: list(x.items())[0]).values.tolist()
output:
[('john', True),
('joe', True),
('tom', True),
('mark', False),
('andrew', True),
...]
Oh, a beautiful, clean 2D list! Just pass it to pd.DataFrame. :))
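Putting the last step together, a sketch on hypothetical sample data. It confirms that .values here is a 1D object array, and it shows two variations that are not part of the original answer: naming the columns directly with the columns= argument of pd.DataFrame instead of renaming afterwards, and a list comprehension over every (key, value) pair, which would also keep all pairs from dictionaries with more than one key.

```python
import pandas as pd

# Hypothetical sample data in the question's shape.
data = pd.Series([{'john': True}, {'joe': True}, {'mark': False}])

pairs = data.apply(lambda x: list(x.items())[0])
arr = pairs.values
print(arr.ndim, arr.dtype)  # 1 object  (a 1D array whose elements are tuples)

rows = arr.tolist()         # [('john', True), ('joe', True), ('mark', False)]

# Name the columns at construction time instead of renaming afterwards.
df = pd.DataFrame(rows, columns=['name', 'result'])

# Alternative: flatten every (key, value) pair from every dictionary,
# so dictionaries with several keys would contribute several rows.
df_all = pd.DataFrame(
    [(k, v) for d in data for k, v in d.items()],
    columns=['name', 'result'],
)
print(df.equals(df_all))    # True here, since every dictionary has one key
```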
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
