'Splitting single row to multiple unique rows Pandas Permutations

I have a problem where I have to split one row into 8 combinations of rows.

Example I have 8 columns -- first 6 belonging to each face of coin. and last two any dummy columns. As shown below(this is the old df)

Date, Coin1_Face_1, Coin1_Face_2, Coin2_Face_1, Coin2_Face_2, Coin3_Face_1, Coin3_Face_2, Random1, Random2

1-Jan   H   T   H   T   H   T   X   Y
2-Jan   P   Q   P   Q   P   Q   A   B

I want to reduce number of columns but increase number of rows as shown below(this should be the new_df):

Date, Coin_1, Coin_2, Coin_3, Random1, Random2
1-Jan   H   H   H   X   Y
1-Jan   H   H   T   X   Y
1-Jan   H   T   H   X   Y
1-Jan   H   T   T   X   Y
1-Jan   T   H   H   X   Y
1-Jan   T   H   T   X   Y
1-Jan   T   T   H   X   Y
1-Jan   T   T   T   X   Y
2-Jan   P   P   P   A   B
2-Jan   P   P   Q   A   B
2-Jan   P   Q   P   A   B
2-Jan   P   Q   Q   A   B
2-Jan   Q   P   P   A   B
2-Jan   Q   P   Q   A   B
2-Jan   Q   Q   P   A   B
2-Jan   Q   Q   Q   A   B

Can anyone pls help.



Solution 1:[1]

Try using pd.wide_to_long and I am not sure about your row expansion, you either have to many or to few rows in your expected output. Please explain how to expand.

pd.wide_to_long(df,['Coin1', 'Coin2', 'Coin3'], ['Date', 'Random1', 'Random2'], 'j', '_', '.*').reset_index()

Output:

|    | Date   | Random1   | Random2   | j      | Coin1   | Coin2   | Coin3   |
|---:|:-------|:----------|:----------|:-------|:--------|:--------|:--------|
|  0 | 1-Jan  | X         | Y         | Face_1 | H       | H       | H       |
|  1 | 1-Jan  | X         | Y         | Face_2 | T       | T       | T       |
|  2 | 2-Jan  | A         | B         | Face_1 | P       | P       | P       |
|  3 | 2-Jan  | A         | B         | Face_2 | Q       | Q       | Q       |

Try:

from itertools import combinations

df_m = pd.wide_to_long(df,['Coin1', 'Coin2', 'Coin3'], ['Date', 'Random1', 'Random2'], 'j', '_', '.*')

def f(g):
    g = g.reset_index(level=3, drop=True)
    indx = g.index.drop_duplicates()
    data = list(combinations(g.to_numpy().flatten('F'), 3))
    return (pd.DataFrame(data, 
                         columns=['Coin1', 'Coin2', 'Coin3'], 
                         index =indx.repeat(len(data)))
              .drop_duplicates())
    

df_out = df_m.groupby(level=[0,1,2], as_index=False).apply(f).reset_index().drop('level_0', axis=1)  
print(df_out)

Output:

     Date Random1 Random2 Coin1 Coin2 Coin3
0   1-Jan       X       Y     H     T     H
1   1-Jan       X       Y     H     T     T
2   1-Jan       X       Y     H     H     T
3   1-Jan       X       Y     H     H     H
4   1-Jan       X       Y     T     H     T
5   1-Jan       X       Y     T     H     H
6   1-Jan       X       Y     T     T     H
7   1-Jan       X       Y     T     T     T
8   2-Jan       A       B     P     Q     P
9   2-Jan       A       B     P     Q     Q
10  2-Jan       A       B     P     P     Q
11  2-Jan       A       B     P     P     P
12  2-Jan       A       B     Q     P     Q
13  2-Jan       A       B     Q     P     P
14  2-Jan       A       B     Q     Q     P
15  2-Jan       A       B     Q     Q     Q

Solution 2:[2]

A combination of pivot_longer and complete from pyjanitor can help with the reshaping and row generation ; pivot_longer transforms to long form, while complete introduces missing rows:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    column_names = "Coin*", 
    names_to = ".value", 
    names_pattern = "([A-Za-z]+\d)_*", 
    sort_by_appearance=True)
.complete(
    'Coin1','Coin2','Coin3', 
     by = ['Date', 'Random1', 'Random2'])
)

     Date Random1 Random2 Coin1 Coin2 Coin3
0   1-Jan       X       Y     H     H     H
1   1-Jan       X       Y     H     H     T
2   1-Jan       X       Y     H     T     H
3   1-Jan       X       Y     H     T     T
4   1-Jan       X       Y     T     H     H
5   1-Jan       X       Y     T     H     T
6   1-Jan       X       Y     T     T     H
7   1-Jan       X       Y     T     T     T
8   2-Jan       A       B     P     P     P
9   2-Jan       A       B     P     P     Q
10  2-Jan       A       B     P     Q     P
11  2-Jan       A       B     P     Q     Q
12  2-Jan       A       B     Q     P     P
13  2-Jan       A       B     Q     P     Q
14  2-Jan       A       B     Q     Q     P
15  2-Jan       A       B     Q     Q     Q

The .value in pivot_longer determines which part of the columns stay as headers. The complete function simply combines and introduces new rows based on the by grouping of Date, Random1 and Random2

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 sammywemmy