Is there a way to loop through a DataFrame and, based on column values, mark a value in multiple new columns in Pandas?
The DataFrame looks something like this:
import pandas as pd
start = [0,2,4,5,1]
end = [3,5,5,5,2]
df = pd.DataFrame({'start': start,'end': end})
Basically, I want to mark a value from start to end across multiple columns. For example, if a row starts at 0 and ends at 3, the new columns col_0 through col_3 should get the value 1 and the remaining columns 0. The result I want looks something like this:
start = [0,2,4,5,1]
end = [3,5,5,5,2]
diff = [3,3,1,0,1]  # end - start, for reference only (not used in the DataFrame below)
col_0 = [1,0,0,0,0]
col_1=[1,0,0,0,1]
col_2 = [1,1,0,0,1]
col_3=[1,1,0,0,0]
col_4=[0,1,1,0,0]
col_5=[0,1,1,1,0]
df = pd.DataFrame({'start': start,'end': end, 'col_0':col_0, 'col_1': col_1, 'col_2': col_2, 'col_3':col_3, 'col_4': col_4, 'col_5': col_5})
start  end  col_0  col_1  col_2  col_3  col_4  col_5
    0    3      1      1      1      1      0      0
    2    5      0      0      1      1      1      1
    4    5      0      0      0      0      1      1
    5    5      0      0      0      0      0      1
    1    2      0      1      1      0      0      0
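Since the question asks about looping, here is a minimal sketch of the plain loop version. It assumes the `col_<n>` naming from the expected output and that positions start at 0. It is easy to follow but touches cells one at a time, so it may be slow on large frames.

```python
import pandas as pd

df = pd.DataFrame({'start': [0, 2, 4, 5, 1], 'end': [3, 5, 5, 5, 2]})

# One 0/1 column per position from 0 up to the largest 'end' value
# (assumption: positions start at 0 and columns are named col_<n>)
n_cols = df['end'].max() + 1
for k in range(n_cols):
    df[f'col_{k}'] = 0

# Loop over rows and mark every position in the inclusive range start..end
for row in df.itertuples():
    for k in range(row.start, row.end + 1):
        df.at[row.Index, f'col_{k}'] = 1

print(df)
```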
Solution 1:[1]
Convert each start-to-end range into a list of column indices, then explode it so there is one (row, column) pair per line. Finally, use NumPy fancy indexing to set those positions to 1:
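import numpy as np
import pandas as pd
# Each row becomes the inclusive range of column indices start..end
range_to_ind = lambda x: range(x['start'], x['end'] + 1)
# One (row label, column index) pair per element; row labels are used as
# positions below, so this assumes df has the default RangeIndex
(i, j) = df.apply(range_to_ind, axis=1).explode().astype(int).reset_index().values.T
# Zero matrix with one column per position 0..max(end), then mark the pairs
a = np.zeros((df.shape[0], df['end'].max() + 1), dtype=int)
a[i, j] = 1
df = df.join(pd.DataFrame(a).add_prefix('col_'))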
Output:
>>> df
   start  end  col_0  col_1  col_2  col_3  col_4  col_5
0      0    3      1      1      1      1      0      0
1      2    5      0      0      1      1      1      1
2      4    5      0      0      0      0      1      1
3      5    5      0      0      0      0      0      1
4      1    2      0      1      1      0      0      0
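A different technique, not the one used in the answer above, is NumPy broadcasting: compare every position against each row's start and end at once. This is a sketch under the same `col_<n>` naming assumption; it avoids `apply`/`explode` and, because the result keeps `df.index`, does not rely on a default RangeIndex.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': [0, 2, 4, 5, 1], 'end': [3, 5, 5, 5, 2]})

# Column positions 0..max(end), shape (n_cols,)
cols = np.arange(df['end'].max() + 1)
starts = df['start'].to_numpy()[:, None]  # shape (n_rows, 1)
ends = df['end'].to_numpy()[:, None]      # shape (n_rows, 1)

# Broadcasting compares every row with every position: shape (n_rows, n_cols)
mask = (cols >= starts) & (cols <= ends)

# Keep the original index so join aligns rows by label
marks = pd.DataFrame(mask.astype(int), index=df.index).add_prefix('col_')
df = df.join(marks)
print(df)
```

The mask needs `n_rows * n_cols` memory, which is usually fine unless `end` values are very large.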
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
