Extract indices of grouped elements in Pandas
The objective is to extract the index labels of randomly selected rows from each group in Pandas.
Specifically, given a df
nval
0 4
1 4
2 0
...
23 0
24 4
...
29 4
30 4
31 0
I would like to extract 5 random indices for each of the elements 0 and 4.
For example, the 5 randomly selected indices for 0 can be 3, 11, 15, 16, 22 and for 4 can be 6, 9, 7, 29, 27.
Currently, the code below achieves this objective:
import numpy as np
import pandas as pd

np.random.seed(0)
dval = [4, 4, 0, 0, 0, 0, 4, 4, 0, 4, 0, 0, 4, 4, 0, 0, 0, 0, 4,
        4, 0, 0, 0, 0, 4, 0, 4, 4, 4, 4, 4, 0]
df = pd.DataFrame(dict(nval=dval))
cgroup = 5  # number of rows to sample per group
df = df.reset_index()  # keep the original row labels in an 'index' column
all_df = []
for idx in [0, 4]:
    # rows whose nval equals the current group value
    x = df[df['nval'] == idx].reset_index(drop=True)
    # pick cgroup distinct row positions at random
    ids = np.random.choice(len(x), size=cgroup, replace=False).tolist()
    all_df.append(x.iloc[ids].reset_index(drop=True))
df = pd.concat(all_df).reset_index(drop=True).sort_values(by=['index'])
sel_index = df[['index']]
which produces
index
0 3
1 6
2 7
3 9
4 11
5 15
6 16
7 22
8 27
9 29
However, I wonder whether there is a more compact way of doing this using pandas or numpy?
Solution 1:[1]
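IIUC, you can use groupby with sample on the original df (as built from dval, before it is reassigned in the question's code):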
pd.DataFrame({'index': df.groupby('nval').sample(5).index.sort_values()})
I'd just keep the result as an index, so it simplifies to
df.groupby('nval').sample(5).index.sort_values()
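For reference, here is a minimal self-contained sketch of the one-liner above, applied to the data from the question. It assumes pandas 1.1 or newer, where `DataFrameGroupBy.sample` is available. The `random_state=0` argument and the `per_group` dictionary are additions for illustration only and are not part of the original answer.

```python
import pandas as pd

dval = [4, 4, 0, 0, 0, 0, 4, 4, 0, 4, 0, 0, 4, 4, 0, 0,
        0, 0, 4, 4, 0, 0, 0, 0, 4, 0, 4, 4, 4, 4, 4, 0]
df = pd.DataFrame({"nval": dval})

# Draw 5 rows from each nval group, without replacement (the default).
# random_state is an assumption added here so the draw is reproducible.
sampled = df.groupby("nval").sample(n=5, random_state=0)

# The sampled rows keep their original index labels, so sorting the
# index gives the selected row labels in ascending order.
sel_index = sampled.index.sort_values()

# Optional (illustrative): the selected labels split by group value.
per_group = {key: sorted(group.index) for key, group in sampled.groupby("nval")}

print(sel_index.tolist())
print(per_group)
```

Because `sample` preserves the original index, no `reset_index` bookkeeping is needed. Dropping `random_state` gives a different draw on each run.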
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | timgeb |
