Splitting a nested dictionary with lists as values (two values each) into two DataFrames

I'm fairly new to Python and Pandas and I'm currently working on a paper for university. I have a nested dictionary:

# Normally this dict contains about 400 IDs and is created from a CSV file, so I just added an example dict.
ExampleDict = {10001: 123123, 10002: 234234, 1003: 345345}

# This is the function I use to create a base dict with zeros as values; the values are changed later, but that is not needed for this example.

import datetime
from collections import defaultdict

import pandas as pd

def createVDict(BaseDict):
    # One entry per calendar day, including both end dates
    date_generated = pd.date_range(datetime.date(2019, 6, 1), datetime.date(2022, 1, 16))
    emptyDict = defaultdict(dict)
    for id in BaseDict:
        for Date in date_generated:
            # Each date maps to a two-element list: [value 1, value 2]
            emptyDict[id][Date.strftime('%Y-%m-%d')] = [0, 0]
    return emptyDict

Dict_to_transform = createVDict(ExampleDict)

I now want to transform this dict into two separate DataFrames, one for each value in the list:

        ID 1    ID 2
Date 1  Value1  Value1
Date 2  Value1  Value1

        ID 1    ID 2
Date 1  Value2  Value2
Date 2  Value2  Value2

From the information I gathered in other posts I think the way it is supposed to work is like this:

  1. Create a DataFrame from the dict: DictDF = pd.DataFrame.from_dict(Dict_to_transform, orient='columns'):

            ID 1              ID 2
    Date 1  [Value1, Value2]  [Value1, Value2]
    Date 2  [Value1, Value2]  [Value1, Value2]

  2. Split the columns using pd.DataFrame(DictDF[ID].to_list(), columns=[str(ID)+'V1', str(ID)+'V2'])

  3. Save those columns as lists (as one should not alter DataFrames, for performance's sake).

  4. Create two DataFrames from the lists created.
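
The steps above can be sketched in a shorter form that skips the per-ID split. This is only a minimal sketch: `nested` is a small hand-written stand-in for Dict_to_transform, and the names `wide`, `value1_df` and `value2_df` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Dict_to_transform: {ID: {date: [value1, value2]}}
nested = {
    10001: {"2019-06-01": [1, 10], "2019-06-02": [2, 20]},
    10002: {"2019-06-01": [3, 30], "2019-06-02": [4, 40]},
}

# IDs become columns, dates become the index, each cell holds a two-element list
wide = pd.DataFrame.from_dict(nested, orient="columns")

# Pick one list position out of every cell, column by column
value1_df = wide.apply(lambda col: col.map(lambda pair: pair[0]))
value2_df = wide.apply(lambda col: col.map(lambda pair: pair[1]))

print(value1_df)
print(value2_df)
```

Both results keep the dates as the index and the IDs as columns, which is the layout shown in the target tables of the question.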

My problem is that I'm not able to combine the different answers I found for these steps. I hope you can help me out. My goal is to compare the two DataFrames (one per value) with two other DataFrames of the same structure using .corrwith(), so that I get the correlation for value 1 and for value 2.
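
For context, the comparison described here might look roughly like the sketch below. The data and the names `value1_df`, `reference_df` and `corr_per_id` are invented for illustration; only `DataFrame.corrwith` itself comes from the question.

```python
import pandas as pd

dates = ["2019-06-01", "2019-06-02", "2019-06-03", "2019-06-04"]

# Hypothetical "value 1" frame and a reference frame with the same index and columns
value1_df = pd.DataFrame(
    {10001: [1.0, 2.0, 3.0, 4.0], 10002: [4.0, 3.0, 2.0, 1.0]}, index=dates
)
reference_df = pd.DataFrame(
    {10001: [2.0, 4.0, 6.0, 8.0], 10002: [1.0, 2.0, 3.0, 4.0]}, index=dates
)

# Column-wise Pearson correlation; rows are aligned on the index before comparing
corr_per_id = value1_df.corrwith(reference_df)
print(corr_per_id)
```

The result is a Series with one correlation per ID column, so the same call would be repeated for the value 2 frame.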



Solution 1:[1]

With pandas, this can be pretty straightforward:

Strategy and comments:

(A third value, Value3, is added in the comments and code to make the pattern clearer.)

  1. Read the nested dict as a dict of DataFrames (one per ID)
  2. Concatenate the results into a single DataFrame along the columns
    df:
               ID1                       ID2             
             Date1        Date2        Date1        Date2
    0  ID1D1Value1  ID1D2Value1  ID2D1Value1  ID2D2Value1
    1  ID1D1Value2  ID1D2Value2  ID2D1Value2  ID2D2Value2
    2  ID1D1Value3  ID1D2Value3  ID2D1Value3  ID2D2Value3
    
  3. stack: the innermost column MultiIndex level (the dates) moves to the innermost row index level
    df:
                     ID1          ID2
    0 Date1  ID1D1Value1  ID2D1Value1
      Date2  ID1D2Value1  ID2D2Value1
    1 Date1  ID1D1Value2  ID2D1Value2
      Date2  ID1D2Value2  ID2D2Value2
    2 Date1  ID1D1Value3  ID2D1Value3
      Date2  ID1D2Value3  ID2D2Value3
    
  4. extract the sub-DataFrames for each first-level index value (take the unique values of that level, since each one repeats once per date)

Code:

Data:

data = {
'ID1': {'Date1': ['ID1D1Value1', 'ID1D1Value2', 'ID1D1Value3'], 'Date2': ['ID1D2Value1', 'ID1D2Value2', 'ID1D2Value3']},
'ID2': {'Date1': ['ID2D1Value1', 'ID2D1Value2', 'ID2D1Value3'], 'Date2': ['ID2D2Value1', 'ID2D2Value2', 'ID2D2Value3']}
}

Processing:


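import pandas as pd

# Steps 1-2: one DataFrame per ID, concatenated side by side under (ID, Date) columns
df = (pd.concat({k: pd.DataFrame(v) for k, v in data.items()}, axis=1)
        .stack()  # Step 3: move the Date column level into the row index
     )

# Step 4: one DataFrame per list position (0, 1, 2)
dfs = [df.loc[i] for i in df.index.get_level_values(0).unique()]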

Result:

>>> dfs[0]
               ID1          ID2
Date1  ID1D1Value1  ID2D1Value1
Date2  ID1D2Value1  ID2D2Value1

>>> dfs[1]
               ID1          ID2
Date1  ID1D1Value2  ID2D1Value2
Date2  ID1D2Value2  ID2D2Value2

>>> dfs[2]
               ID1          ID2
Date1  ID1D1Value3  ID2D1Value3
Date2  ID1D2Value3  ID2D2Value3
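
As a variation on the same idea, the sketch below applies it to data shaped like the question's dict (integer IDs, date strings, two-element lists) and selects a list position before reshaping with `unstack` instead of `stack`. It is not part of the original answer; the sample values and the names `nested`, `wide`, `value1_df` and `value2_df` are assumptions.

```python
import pandas as pd

# Hypothetical data in the shape of the question's dict: {ID: {date: [value1, value2]}}
nested = {
    10001: {"2019-06-01": [1, 10], "2019-06-02": [2, 20]},
    10002: {"2019-06-01": [3, 30], "2019-06-02": [4, 40]},
}

# Rows are list positions (0, 1); columns are an (ID, date) MultiIndex
wide = pd.concat({k: pd.DataFrame(v) for k, v in nested.items()}, axis=1)

# Take one list position as a Series indexed by (ID, date),
# then move the ID level to the columns so dates remain as the index
value1_df = wide.loc[0].unstack(level=0)
value2_df = wide.loc[1].unstack(level=0)

print(value1_df)
print(value2_df)
```

Each resulting frame has dates as rows and IDs as columns, so it could be passed to .corrwith() against another frame with the same layout.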

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1