'Pandas / python to autogenerate new values through a range
I have a dataframe read in from some CSV data. I have two numbers in seperate columns, representing a range and would like to do the following;
A) iterate through the data set B) At each new index, calculate the range C) create a total number of new rows based on the sum of that range (doesn't need to be in the existing dataframe) D) And for each new row, insert incremental values through that range, concatanating specified column data, and retaining the original additional details from the other columns.
Example:
| value | range1 | range2 | color |
|---|---|---|---|
| a | 0 | 3 | blue |
| b | 4 | 6 | yellow |
Result:
| output | value | colour |
|---|---|---|
| a1 | a | blue |
| a2 | a | blue |
| a3 | a | blue |
| b4 | b | yellow |
| b5 | b | yellow |
| b6 | b | yellow |
Solution 1:[1]
Here's one way to do it:
import pandas as pd
df = pd.DataFrame({
'value' : ['a', 'b'],
'range1' : [0, 4],
'range2' : [3, 6],
'color' : ['blue', 'yellow']
})
df['output'] = df.apply(lambda x: [x['value'] + str(i + 1) for i in range(x['range1'], x['range2'])], axis=1)
df = df.explode('output', ignore_index=True)[['output', 'value', 'color']]
print(df)
Output
output value color
0 a1 a blue
1 a2 a blue
2 a3 a blue
3 b5 b yellow
4 b6 b yellow
Solution 2:[2]
Apart from that it's not clear why range 0 is not counted,
but here is one way:
df = df.loc[df.index.repeat(df.range2 - df.range1+1)]
df['output'] = df.value + (df.range1 + df.groupby('value').cumcount()).map(str)
output:
>>
value range1 range2 color output
0 a 0 3 blue a0
0 a 0 3 blue a1
0 a 0 3 blue a2
0 a 0 3 blue a3
1 b 4 6 yellow b4
1 b 4 6 yellow b5
1 b 4 6 yellow b6
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | constantstranger |
| Solution 2 |
