How to find subset(s) in a dataframe column and return <subsetOf - >?

I just used .groupby and .agg to make my df as follows -

  Name inclusionId            
    A   1 , 2                  
    B   1 , 3                  
    C   5 , 7                  
    D   5 , 2 , 9 , 7 , 1     
    E   2 , 1 , 9              

Now I want to check whether each row's ids are a subset of another row's ids. I need output like below -

 Name inclusionId            Subset of -
    A   1 , 2                  E
    B   1 , 3                  No
    C   5 , 7                  D
    D   5 , 2 , 9 , 7 , 1      No
    E   2 , 1 , 9              D

Please help!



Solution 1:[1]

With pandas you can select

  1. all rows and limited columns
  2. all columns and limited rows
  3. limited rows and limited columns

You can select columns like this:

dataframe['column']

or

dataframe[['column1', 'column2' ]]
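A small sketch of the difference between the two forms, assuming a made-up three-column frame (the column names and values are illustrative only): a single label gives back a Series, while a list of labels gives back a DataFrame.

```python
import pandas as pd

# Hypothetical frame for illustration
dataframe = pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]})

single = dataframe["column1"]                # one label -> Series
several = dataframe[["column1", "column2"]]  # list of labels -> DataFrame

print(type(single).__name__, type(several).__name__)
```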

To select rows, index the dataframe with a condition on a column; only the rows that satisfy the condition are kept:

population_500 = housing[housing['population']>500]

Here we select the rows whose population is greater than 500.
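To make the row filter concrete, here is a minimal sketch that assumes a tiny made-up `housing` frame (the `district` column and all values are hypothetical). It splits the one-liner into the boolean mask and the indexing step.

```python
import pandas as pd

# Hypothetical data for illustration only
housing = pd.DataFrame({
    "district": ["a", "b", "c", "d"],
    "population": [120, 800, 501, 500],
})

# The comparison yields a boolean Series aligned with the rows
mask = housing["population"] > 500

# Indexing with the mask keeps only the rows where it is True
population_500 = housing[mask]
print(population_500)
```

The comparison is strict, so the row with a population of exactly 500 is dropped.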

You can also use dataframe.loc[...] with one or more row labels to select specific rows, for example:

dataframe.loc[[1,5,7]]

You can also select rows and columns together using .loc[]:

dataframe.loc[1:7,['column_1', 'column_2']]

where 1 and 7 are row labels (not positions); unlike ordinary Python slicing, a .loc slice includes both endpoints
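The following sketch shows label-based selection on a frame whose index is deliberately not 0, 1, 2, ... so that labels and positions differ. The frame and its values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical frame whose index labels are 1, 3, 5, 7, 9 (not 0..4)
dataframe = pd.DataFrame(
    {
        "column_1": [10, 20, 30, 40, 50],
        "column_2": list("vwxyz"),
        "column_3": [0.1, 0.2, 0.3, 0.4, 0.5],
    },
    index=[1, 3, 5, 7, 9],
)

# A list of labels picks exactly those rows
picked = dataframe.loc[[1, 5, 7]]

# A label slice includes both endpoints, so label 7 is part of the result
sliced = dataframe.loc[1:7, ["column_1", "column_2"]]
print(sliced)
```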

You can also use .iloc[] to select a subset of rows and columns by integer position:

dataframe.iloc[[2,3,6], [3, 5]]
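As a sketch of positional selection, assume a hypothetical 8x6 frame with string row labels, chosen so it is clear that `.iloc` ignores labels and counts from zero.

```python
import numpy as np
import pandas as pd

# Hypothetical 8x6 frame; the string row labels play no role in .iloc
dataframe = pd.DataFrame(
    np.arange(48).reshape(8, 6),
    index=list("abcdefgh"),
    columns=[f"column_{i}" for i in range(6)],
)

# Rows at positions 2, 3, 6 and columns at positions 3, 5 (all zero-based)
subset = dataframe.iloc[[2, 3, 6], [3, 5]]
print(subset)
```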

Hope you find this helpful!

Solution 2:[2]

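A slightly complicated approach. Note that it assumes the ids are separated by a bare comma with no surrounding spaces, as in the output shown: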

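import numpy as np
# One 0/1 indicator column per id, indexed by Name
s = df.set_index('Name').inclusionId.str.get_dummies(',')
# Entry (i, j) is now the number of ids shared by rows i and j
s = s.dot(s.T)
# The diagonal holds each row's own id count; zero it so a row never matches itself
diag = np.diag(s).copy()
np.fill_diagonal(s.values,0)
# Row j is a subset of row i when their overlap equals the id count of j
df['new'] = s.eq(diag).T.dot(s.columns+',').str[:-1].values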
Out[74]: 
  Name inclusionId  new
0    A         1,2  D,E
1    B         1,3     
2    C         5,7    D
3    D   5,2,9,7,1     
4    E       2,1,9    D  
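For comparison, here is a sketch of the same check written with plain Python sets instead of the indicator-matrix trick. It is an alternative, not the answerer's method. It uses the data from the question, including the spaces around the commas; the helper name `supersets_of` and the output column name are made up for the example.

```python
import pandas as pd

# Data from the question; ids are separated by " , " with spaces
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "inclusionId": ["1 , 2", "1 , 3", "5 , 7", "5 , 2 , 9 , 7 , 1", "2 , 1 , 9"],
})

# Parse each string into a set of ids, stripping the whitespace around commas
sets = {
    name: {part.strip() for part in ids.split(",")}
    for name, ids in zip(df["Name"], df["inclusionId"])
}

def supersets_of(name):
    """Names of all other rows whose id set contains this row's id set."""
    own = sets[name]
    return [other for other, ids in sets.items() if other != name and own <= ids]

# Join the matches, or report "No" when a row is contained in no other row
df["Subset of"] = [", ".join(supersets_of(n)) or "No" for n in df["Name"]]
print(df)
```

Like the output above, this reports A as a subset of both D and E, whereas the question's expected output lists only E. The `<=` test is non-strict, so two rows with identical ids would each be listed as a subset of the other. The loop compares every pair of rows, which should be fine for small frames but grows quadratically with the number of rows.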

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Ahmad Ebrahim
Solution 2   BENY