'Splitting single row to multiple unique rows Pandas Permutations
I have a problem where I have to split one row into 8 combinations of rows.
Example I have 8 columns -- first 6 belonging to each face of coin. and last two any dummy columns. As shown below(this is the old df)
Date, Coin1_Face_1, Coin1_Face_2, Coin2_Face_1, Coin2_Face_2, Coin3_Face_1, Coin3_Face_2, Random1, Random2
1-Jan H T H T H T X Y
2-Jan P Q P Q P Q A B
I want to reduce number of columns but increase number of rows as shown below(this should be the new_df):
Date, Coin_1, Coin_2, Coin_3, Random1, Random2
1-Jan H H H X Y
1-Jan H H T X Y
1-Jan H T H X Y
1-Jan H T T X Y
1-Jan T H H X Y
1-Jan T H T X Y
1-Jan T T H X Y
1-Jan T T T X Y
2-Jan P P P A B
2-Jan P P Q A B
2-Jan P Q P A B
2-Jan P Q Q A B
2-Jan Q P P A B
2-Jan Q P Q A B
2-Jan Q Q P A B
2-Jan Q Q Q A B
Can anyone pls help.
Solution 1:[1]
Try using pd.wide_to_long
and I am not sure about your row expansion, you either have to many or to few rows in your expected output. Please explain how to expand.
pd.wide_to_long(df,['Coin1', 'Coin2', 'Coin3'], ['Date', 'Random1', 'Random2'], 'j', '_', '.*').reset_index()
Output:
| | Date | Random1 | Random2 | j | Coin1 | Coin2 | Coin3 |
|---:|:-------|:----------|:----------|:-------|:--------|:--------|:--------|
| 0 | 1-Jan | X | Y | Face_1 | H | H | H |
| 1 | 1-Jan | X | Y | Face_2 | T | T | T |
| 2 | 2-Jan | A | B | Face_1 | P | P | P |
| 3 | 2-Jan | A | B | Face_2 | Q | Q | Q |
Try:
from itertools import combinations
df_m = pd.wide_to_long(df,['Coin1', 'Coin2', 'Coin3'], ['Date', 'Random1', 'Random2'], 'j', '_', '.*')
def f(g):
g = g.reset_index(level=3, drop=True)
indx = g.index.drop_duplicates()
data = list(combinations(g.to_numpy().flatten('F'), 3))
return (pd.DataFrame(data,
columns=['Coin1', 'Coin2', 'Coin3'],
index =indx.repeat(len(data)))
.drop_duplicates())
df_out = df_m.groupby(level=[0,1,2], as_index=False).apply(f).reset_index().drop('level_0', axis=1)
print(df_out)
Output:
Date Random1 Random2 Coin1 Coin2 Coin3
0 1-Jan X Y H T H
1 1-Jan X Y H T T
2 1-Jan X Y H H T
3 1-Jan X Y H H H
4 1-Jan X Y T H T
5 1-Jan X Y T H H
6 1-Jan X Y T T H
7 1-Jan X Y T T T
8 2-Jan A B P Q P
9 2-Jan A B P Q Q
10 2-Jan A B P P Q
11 2-Jan A B P P P
12 2-Jan A B Q P Q
13 2-Jan A B Q P P
14 2-Jan A B Q Q P
15 2-Jan A B Q Q Q
Solution 2:[2]
A combination of pivot_longer and complete from pyjanitor can help with the reshaping and row generation ; pivot_longer transforms to long form, while complete introduces missing rows:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
column_names = "Coin*",
names_to = ".value",
names_pattern = "([A-Za-z]+\d)_*",
sort_by_appearance=True)
.complete(
'Coin1','Coin2','Coin3',
by = ['Date', 'Random1', 'Random2'])
)
Date Random1 Random2 Coin1 Coin2 Coin3
0 1-Jan X Y H H H
1 1-Jan X Y H H T
2 1-Jan X Y H T H
3 1-Jan X Y H T T
4 1-Jan X Y T H H
5 1-Jan X Y T H T
6 1-Jan X Y T T H
7 1-Jan X Y T T T
8 2-Jan A B P P P
9 2-Jan A B P P Q
10 2-Jan A B P Q P
11 2-Jan A B P Q Q
12 2-Jan A B Q P P
13 2-Jan A B Q P Q
14 2-Jan A B Q Q P
15 2-Jan A B Q Q Q
The .value
in pivot_longer determines which part of the columns stay as headers. The complete function simply combines and introduces new rows based on the by
grouping of Date
, Random1
and Random2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | sammywemmy |