'Pandas: Populate rows for same index in a group with different values
Sample df:
In [2004]: df
Out[2004]:
index table_name column_name data_type default max_length
0 0 f_person active integer NaN NaN
1 0 f_person actv integer NaN NaN
2 5 f_person ssn varchar NaN 256.0
3 5 f_person ssn varchar NaN 99.0
4 6 f_person pl varchar 10.0 256.0
5 6 f_person pl bigint NaN 256.0
6 8 f_person prefix varchar NaN 256.0
7 8 f_person prefix integer NaN 256.0
For the same index, I want to add a new column schema and populate different values for each row. Number of rows per group will be always <= 2.
Expected Output:
In [2006]: df
Out[2006]:
index table_name column_name data_type default max_length schema
0 0 f_person active integer NaN NaN s1
1 0 f_person actv integer NaN NaN s2
2 5 f_person ssn varchar NaN 256.0 s1
3 5 f_person ssn varchar NaN 99.0 s2
4 6 f_person pl varchar 10.0 256.0 s1
5 6 f_person pl bigint NaN 256.0 s2
6 8 f_person prefix varchar NaN 256.0 s1
7 8 f_person prefix integer NaN 256.0 s2
I solved it using a for loop, but there must be a better way. Can someone please suggest a more pandaic way?
Solution 1:[1]
Try with groupby.cumcount() which performs a grouped cumulative count and concatenate that with a simple 's':
df['schema'] = 's' + df.groupby('index').cumcount().add(1).astype(str)
df[['index','schema']]
Out[53]:
index schema
0 0 s1
1 0 s2
2 5 s1
3 5 s2
4 6 s1
5 6 s2
6 8 s1
7 8 s2
I have a feeling I might be misuderstanding your question, so apologies, if that.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
