'Dividing values in columns based on their previous marker
I have the following dataframe:
df = {'id': [1,2,3,4],
    '1': ['Green', 'Green', 'Green', 'Green'],
  '2': ['34','67', 'Blue', '77'],
  '3': ['Blue', '45', '99', 'Blue'],
  '4': ['12', None, '101', '56'],
     '5': [None, None, None, '23']}
df = pd.DataFrame(df)
    id  1       2     3     4       5
0   1   Green   34    Blue  12      None
1   2   Green   67    45    None    None
2   3   Green   Blue  99    101     None
3   4   Green   77    Blue  56      23
I would like to creating separate dataframes for GREEN and BLUE with their id number. For example the output of the GREEN DF would be:
  id   number
   1     34
   2     67
   2     45
   4     77
							
						Solution 1:[1]
You can try:
# set id as index
df = df.set_index('id')
# mask the color
mask = df.isin(['Green','Blue'])
# label the cells' color with `where` and `ffill`
# print `df.where(mask).ffill(axis=1)` if you want to know how it looks
out = pd.DataFrame({'color':df.where(mask).ffill(axis=1).stack(),
                    'value':df.mask(mask).stack()
                  }).dropna()
# blue color
out.query('color=="Blue"')
Output:
     color value
id              
1  4  Blue    12
3  3  Blue    99
   4  Blue   101
4  4  Blue    56
   5  Blue    23
Note first level index is the old column, you can drop it if you don't want to.
Solution 2:[2]
Check
df.set_index('id',inplace=True)
out = df.where(df.where(lambda x : x.isin(['Green','Blue'])).ffill(axis=1).eq('Green')).stack().loc[lambda x : x!='Green']
Out[571]: 
id   
1   2    34
2   2    67
    3    45
4   2    77
dtype: object
    					Solution 3:[3]
Here is one approach, by stacking first to Series:
# set id as index stack to Series (removing the NaNs)
s = df.set_index('id').stack()
# which values are not Green/Blue?
m = ~s.isin(['Green', 'Blue'])
# ffill and select the non Green/Blue positions
df2 = pd.DataFrame({'color': s.mask(m).ffill().loc[m],
                    'number': s.loc[m]},
                   ).droplevel(1) # keep level if you want the column labels
output:
    color number
id              
1   Green     34
1    Blue     12
2   Green     67
2   Green     45
3    Blue     99
3    Blue    101
4   Green     77
4    Blue     56
4    Blue     23
    					Solution 4:[4]
I'd create a dictionary of dataframes with the key of colors like this:
dfm =  df.melt('id').sort_values(['id', 'variable'])[['value']]
dfm['group'] = dfm.where(dfm.isin(['Green', 'Blue'])).ffill()['value']
c_dict = dict(tuple(dfm.dropna(how='any').query('value != group').groupby('group')[['value']]))
Output:
#c_dict['Blue']
   value
12    12
10    99
14   101
15    56
19    23
and
#c_dict['Green']
  value
4    34
5    67
9    45
7    77
    					Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | Quang Hoang | 
| Solution 2 | BENY | 
| Solution 3 | mozway | 
| Solution 4 | Scott Boston | 
