Duplicated IDs in pandas
I have the following dataframes, df1 and df2:

df1:

| ID | Q1 |
|---|---|
| 111 | 2 |
| 111 | 3 |
| 112 | 1 |

df2:

| ID | Q2 |
|---|---|
| 111 | 1 |
| 111 | 5 |
| 112 | 7 |

Since the IDs are duplicated, I want to reinitialize them using the following code:

```python
import pandas as pd

# Sort each frame by ID, then number the rows 0..n-1 in sorted order.
df1.sort_values('ID', inplace=True)
df1['ID_new'] = range(len(df1))
df2.sort_values('ID', inplace=True)
df2['ID_new'] = range(len(df2))
```
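A minimal sketch of the same numbering with an explicitly stable sort, assuming the frames are rebuilt from the tables above. When sorting on a single column, `sort_values` defaults to `kind='quicksort'`, which is not a stable algorithm; passing `kind='stable'` keeps rows with equal IDs in their original relative order, so the numbering of duplicates follows that original order.

```python
import pandas as pd

# Rebuild the two frames from the tables in the question.
df1 = pd.DataFrame({"ID": [111, 111, 112], "Q1": [2, 3, 1]})
df2 = pd.DataFrame({"ID": [111, 111, 112], "Q2": [1, 5, 7]})

# kind="stable" guarantees that rows sharing an ID keep their original
# relative order; the default single-column sort ("quicksort") does not.
df1 = df1.sort_values("ID", kind="stable").reset_index(drop=True)
df2 = df2.sort_values("ID", kind="stable").reset_index(drop=True)

# Number the sorted rows 0..n-1, as in the original code.
df1.insert(0, "ID_new", range(len(df1)))
df2.insert(0, "ID_new", range(len(df2)))

print(df1)
print(df2)
```

This only makes the two numberings line up if both frames list their duplicate IDs in a corresponding order and each ID occurs the same number of times in both; otherwise the global row numbers drift apart.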
in order to get something like this:

df1:

| ID_new | ID | Q1 |
|---|---|---|
| 0 | 111 | 2 |
| 1 | 111 | 3 |
| 2 | 112 | 1 |

df2:

| ID_new | ID | Q2 |
|---|---|---|
| 0 | 111 | 1 |
| 1 | 111 | 5 |
| 2 | 112 | 7 |

The question is: can I be sure that ID_new will be assigned the same way in df1 and df2?
For example, is it possible that ID_new = 1 corresponds to the first row with ID = 111 in df1 but to the second row with ID = 111 in df2?
If so, is there a more robust way to reinitialize the IDs?
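One possibly more robust alternative (not from the original post) is to number rows within each ID using `groupby('ID').cumcount()` and then pair the frames on `ID` plus that counter, rather than on a global row number. The `occurrence` column name is hypothetical, and the sketch still assumes that the original row order within each ID is meaningful.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [111, 111, 112], "Q1": [2, 3, 1]})
df2 = pd.DataFrame({"ID": [111, 111, 112], "Q2": [1, 5, 7]})

# cumcount() numbers the rows inside each ID group 0, 1, 2, ... in their
# current order, so the k-th occurrence of an ID gets the same counter in
# both frames regardless of how many rows the other IDs have.
df1["occurrence"] = df1.groupby("ID").cumcount()
df2["occurrence"] = df2.groupby("ID").cumcount()

# Pair rows on (ID, occurrence) instead of on a global row number.
merged = df1.merge(df2, on=["ID", "occurrence"], how="outer")
print(merged)
```

With `how="outer"`, an ID that occurs more often in one frame than in the other produces rows with `NaN` in the other frame's columns, instead of silently shifting every later pairing.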
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
