How to find subset(s) in a dataframe column and return a "Subset of" column?
I just used .groupby and .agg to make my df as follows -
| Name | inclusionId |
|---|---|
| A | 1 , 2 |
| B | 1 , 3 |
| C | 5 , 7 |
| D | 5 , 2 , 9 , 7 , 1 |
| E | 2 , 1 , 9 |
Now I want to check whether these are subsets of each other or not. Need output like below -
| Name | inclusionId | Subset of |
|---|---|---|
| A | 1 , 2 | E |
| B | 1 , 3 | No |
| C | 5 , 7 | D |
| D | 5 , 2 , 9 , 7 , 1 | No |
| E | 2 , 1 , 9 | D |
Please help!
Solution 1:[1]
With pandas you can select:
- all rows and limited columns
- all columns and limited rows
- limited rows and limited columns
You can select columns like this:
dataframe['column']
or
dataframe[['column1', 'column2' ]]
Now, to select rows, index the dataframe with a boolean condition on a column, so that only the rows meeting the condition are kept:
population_500 = housing[housing['population']>500]
Here we select the rows with a population greater than 500.
You can also use dataframe.loc[...] with a list of index labels to select certain rows, for example:
dataframe.loc[[1,5,7]]
You can also select both rows and columns using .loc[]:
dataframe.loc[1:7,['column_1', 'column_2']]
where 1 and 7 are row index labels; a .loc slice is label-based and includes both endpoints.
You can also use .iloc[], which is position-based, to select a subset of rows and columns:
dataframe.iloc[[2,3,6], [3, 5]]
Hope you find this helpful!
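As a runnable sketch of the selections above, assuming a small made-up housing dataframe (the column names and values here are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical data; the column names and values are made up for illustration.
housing = pd.DataFrame(
    {
        "population": [320, 1200, 540, 80, 950],
        "households": [110, 400, 190, 30, 310],
        "median_income": [3.2, 5.1, 4.0, 2.7, 6.3],
    }
)

# Boolean mask: keep only the rows where population is greater than 500.
population_500 = housing[housing["population"] > 500]

# .loc is label-based; the slice 1:3 includes both labels 1 and 3.
by_label = housing.loc[1:3, ["population", "households"]]

# .iloc is position-based: rows at positions 0, 2, 4 and columns at positions 0, 2.
by_position = housing.iloc[[0, 2, 4], [0, 2]]
```

With the default integer index, labels and positions happen to coincide, so the difference between .loc and .iloc only shows up in how the slice endpoints are treated.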
Solution 2:[2]
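A little bit complicated
import numpy as np
s = df.set_index('Name').inclusionId.str.replace(' ', '').str.get_dummies(',')  # one 0/1 column per ID; spaces around commas are dropped first
s = s.dot(s.T)  # s.loc[i, j] = number of IDs shared by rows i and j
diag = np.diag(s).copy()  # diagonal = size of each row's own ID set
np.fill_diagonal(s.values,0)  # zero the diagonal so a row never matches itself
df['new'] = s.eq(diag).T.dot(s.columns+',').str[:-1].values  # row j is a subset of row i when the shared count equals the size of j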
Out[74]:
Name inclusionId new
0 A 1,2 D,E
1 B 1,3
2 C 5,7 D
3 D 5,2,9,7,1
4 E 2,1,9 D
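For comparison, the same check can be sketched with plain Python sets, which may be easier to follow than the matrix product. This is only an illustration: find_supersets and its sep parameter are hypothetical names, not part of pandas, and the "No" placeholder follows the output requested in the question.

```python
def find_supersets(names, id_strings, sep=","):
    """For each name, list the other names whose ID set contains its ID set."""
    names = list(names)
    # Parse e.g. "1 , 2" into {"1", "2"}, stripping whitespace around each ID.
    sets = [
        {part.strip() for part in text.split(sep) if part.strip()}
        for text in id_strings
    ]
    result = []
    for i, current in enumerate(sets):
        supersets = [
            names[j]
            for j, other in enumerate(sets)
            if i != j and current <= other  # <= is the subset test for sets
        ]
        result.append(",".join(supersets) if supersets else "No")
    return result


# Data copied from the question.
names = ["A", "B", "C", "D", "E"]
inclusion_ids = ["1 , 2", "1 , 3", "5 , 7", "5 , 2 , 9 , 7 , 1", "2 , 1 , 9"]
subset_of = find_supersets(names, inclusion_ids)
print(subset_of)  # ['D,E', 'No', 'D', 'No', 'D']
```

Since the function accepts any iterables, the result can be assigned back with something like df['Subset of'] = find_supersets(df['Name'], df['inclusionId']). Note that A comes out as D,E rather than just E, because {1, 2} is contained in both, and rows with identical ID sets would list each other. It compares every pair of rows, so it is fine for small frames but slow for large ones.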
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ahmad Ebrahim |
| Solution 2 | BENY |
