Splitting a nested dictionary with lists as values (two values each) into two DataFrames
I'm fairly new to Python and Pandas and I'm currently working on a paper for university. I have a nested dictionary:
import datetime
from collections import defaultdict

import pandas as pd

# Normally this dict contains about 400 IDs and is created from a CSV file, so I just added an example dict.
ExampleDict = {10001: 123123, 10002: 234234, 1003: 345345}

# The function I'm using to create a base dict with 0 as values; these are changed later, but that is not needed for this example.
def createVDict(BaseDict):
    date_generated = pd.date_range(datetime.date(2019, 6, 1), datetime.date(2022, 1, 16))
    emptyDict = defaultdict(dict)
    for id in BaseDict:
        for Date in date_generated:
            emptyDict[id][Date.strftime('%Y-%m-%d')] = [0, 0]
    return emptyDict

Dict_to_transform = createVDict(ExampleDict)
I now want to transform this dict into two separate DataFrames, one for each value in the list:
| | ID 1 | ID 2 |
|---|---|---|
| Date 1 | Value1 | Value1 |
| Date 2 | Value1 | Value1 |

| | ID 1 | ID 2 |
|---|---|---|
| Date 1 | Value2 | Value2 |
| Date 2 | Value2 | Value2 |
From the information I gathered in other posts I think the way it is supposed to work is like this:
- Create a DataFrame from the dict:
DictDF = pd.DataFrame.from_dict(Dict_to_transform, orient='columns')

| | ID 1 | ID 2 |
|---|---|---|
| Date 1 | [Value1, Value2] | [Value1, Value2] |
| Date 2 | [Value1, Value2] | [Value1, Value2] |

- Split the columns using:
pd.DataFrame(DictDF[ID].to_list(), columns=[str(ID)+'V1', str(ID)+'V2'])
- Save those columns as a list (as one should not alter DataFrames, for performance's sake).
- Create two DataFrames from the lists created.
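The steps above can also be collapsed into one pass over the nested dict, without going through list-valued cells first. The following is only a minimal sketch under assumptions: `nested` is a hypothetical miniature of `Dict_to_transform` with made-up numbers, and `split_by_position` is an illustrative helper name, not something from pandas.

```python
import pandas as pd

# Hypothetical miniature of Dict_to_transform: {id: {date: [value1, value2]}}
nested = {
    10001: {"2019-06-01": [1, 10], "2019-06-02": [2, 20]},
    10002: {"2019-06-01": [3, 30], "2019-06-02": [4, 40]},
}


def split_by_position(nested_dict, position):
    """Build a dates-by-IDs DataFrame from one position of each value list."""
    # A dict of dicts becomes a DataFrame whose columns are the outer keys (IDs)
    # and whose index is made of the inner keys (dates).
    return pd.DataFrame(
        {
            id_: {date: values[position] for date, values in per_date.items()}
            for id_, per_date in nested_dict.items()
        }
    )


value1_df = split_by_position(nested, 0)  # first value of every list
value2_df = split_by_position(nested, 1)  # second value of every list

print(value1_df)
print(value2_df)
```

Both frames share the same index (dates) and columns (IDs), which is the layout shown in the two target tables.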
My problem is that I'm not able to combine the different answers I found for these steps. I hope you can help me out. My goal is to compare the two DataFrames (one for each value) with two other DataFrames (which have the same structure) using .corrwith(), so that I have the correlation for value 1 and value 2.
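For reference, this is roughly what the comparison step could look like once the frames exist. It is a sketch with made-up numbers: `value1` and `other1` are hypothetical stand-ins for one of the split frames and the frame it should be compared with. `DataFrame.corrwith` aligns both frames on index and column labels and returns one correlation per column (here, per ID).

```python
import pandas as pd

dates = ["2019-06-01", "2019-06-02", "2019-06-03"]

# Hypothetical frame for "value 1": index = dates, columns = IDs
value1 = pd.DataFrame({10001: [1.0, 2.0, 3.0], 10002: [1.0, 2.0, 3.0]}, index=dates)

# Hypothetical frame with the same structure to compare against
other1 = pd.DataFrame({10001: [2.0, 4.0, 6.0], 10002: [3.0, 2.0, 1.0]}, index=dates)

# Column-wise Pearson correlation: one number per ID
corr_value1 = value1.corrwith(other1)
print(corr_value1)
```

A column that is constant (for example, one still filled with the initial zeros) has no variance, so its correlation comes out as NaN.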
Solution 1:[1]
With pandas, this can be pretty straightforward.
Strategy and comments:
(A Value3 is added in the comments and code to make the pattern clearer.)
- Read the nested dict as a dict of DataFrames
- Concat the result into one DataFrame
df:
           ID1                       ID2
         Date1        Date2        Date1        Date2
0  ID1D1Value1  ID1D2Value1  ID2D1Value1  ID2D2Value1
1  ID1D1Value2  ID1D2Value2  ID2D1Value2  ID2D2Value2
2  ID1D1Value3  ID1D2Value3  ID2D1Value3  ID2D2Value3
- stack: the last level of the column MultiIndex becomes the last level of the index MultiIndex
df:
                 ID1          ID2
0 Date1  ID1D1Value1  ID2D1Value1
  Date2  ID1D2Value1  ID2D2Value1
1 Date1  ID1D1Value2  ID2D1Value2
  Date2  ID1D2Value2  ID2D2Value2
2 Date1  ID1D1Value3  ID2D1Value3
  Date2  ID1D2Value3  ID2D2Value3
- Extract the sub-DataFrames for each value of the first index level (take the unique values of that level, since each one is repeated once per date)
Code:
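Data:
import pandas as pd

data = {
    'ID1': {'Date1': ['ID1D1Value1', 'ID1D1Value2', 'ID1D1Value3'], 'Date2': ['ID1D2Value1', 'ID1D2Value2', 'ID1D2Value3']},
    'ID2': {'Date1': ['ID2D1Value1', 'ID2D1Value2', 'ID2D1Value3'], 'Date2': ['ID2D2Value1', 'ID2D2Value2', 'ID2D2Value3']}
}
Processing:
# One DataFrame per ID (columns = dates, rows = position in the value list),
# concatenated side by side; then the date level is stacked into the index.
df = (pd.concat({k: pd.DataFrame(v) for k, v in data.items()}, axis=1)
        .stack()
     )
# One DataFrame per position in the value list.
dfs = [df.loc[i] for i in df.index.get_level_values(0).unique()]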
Result:
>>> dfs[0]
               ID1          ID2
Date1  ID1D1Value1  ID2D1Value1
Date2  ID1D2Value1  ID2D2Value1
>>> dfs[1]
               ID1          ID2
Date1  ID1D1Value2  ID2D1Value2
Date2  ID1D2Value2  ID2D2Value2
>>> dfs[2]
               ID1          ID2
Date1  ID1D1Value3  ID2D1Value3
Date2  ID1D2Value3  ID2D2Value3
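For data shaped like the question's (date strings as inner keys, two numbers per list), the same idea might look like the sketch below. `nested` is a hypothetical miniature with made-up numbers. Note one deliberate swap: instead of stacking the whole frame, this variant picks one row per list position and calls `Series.unstack`, which produces the same dates-by-IDs layout.

```python
import pandas as pd

# Hypothetical miniature of the question's dict: {id: {date: [value1, value2]}}
nested = {
    10001: {"2019-06-01": [1, 10], "2019-06-02": [2, 20]},
    10002: {"2019-06-01": [3, 30], "2019-06-02": [4, 40]},
}

# Same first step as in the answer: one small frame per ID, concatenated side by side.
# Rows are positions in the value list; columns are (ID, date) pairs.
wide = pd.concat({k: pd.DataFrame(v) for k, v in nested.items()}, axis=1)

# Variant of the second step: take the row for one list position and move the
# ID level into the columns, leaving the dates as the index.
value1_df = wide.loc[0].unstack(level=0)
value2_df = wide.loc[1].unstack(level=0)

print(value1_df)
print(value2_df)
```

Each resulting frame has dates as the index and IDs as columns, so it can be passed directly to methods that align on labels.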
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
