Replacing names with IDs in two datasets
I have two datasets: one contains customers' family details and the other contains the classes associated with those customers. For privacy reasons, I would like to replace the customers' names with IDs. Here is an example of the data:
dataset 1 (customers' family relationships)

| Customer | Relative | Age | Note |
|---|---|---|---|
| Amber Bryan | Viola Walter | 22 | none |
| Amber Bryan | Christopher Lyl | 22 | none |
| Viola Walter | Stephan Said | 43 | xxx |
| Sion X. | Martin Grey | 64 | none |

dataset 2 (classes)

| Customer | Class | Age |
|---|---|---|
| Amber Bryan | 1 | 22 |
| Viola Walter | 2 | 43 |
| Christopher Lyl | -2 | 41 |
| Stephan Said | 1 | 42 |
| Sion X. | 0 | 64 |
| Martin Grey | 1 | 34 |

I would like to get the following datasets:

| Customer | Relative | Age | Note |
|---|---|---|---|
| 1 | 2 | 22 | none |
| 1 | 3 | 22 | none |
| 2 | 4 | 43 | xxx |
| 5 | 6 | 64 | none |

and

| Customer | Class | Age |
|---|---|---|
| 1 | 1 | 22 |
| 2 | 2 | 43 |
| 3 | -2 | 41 |
| 4 | 1 | 42 |
| 5 | 0 | 64 |
| 6 | 1 | 34 |

Ideally, the IDs would be assigned following the order in which the customers are listed in dataset 2.
I am thinking of creating an index column for dataset 2, but I do not know how to use it in dataset 1, especially since I need to assign IDs to both the Customer and the Relative columns.
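The index-column idea can work if that column is turned into a name-to-ID lookup. Below is a minimal sketch, assuming the tables are loaded as pandas DataFrames with the hypothetical names df1 (dataset 1) and df2 (dataset 2), and that every name appears only once in dataset 2 (the lookup needs a unique index). The same lookup is applied to both Customer and Relative, so a person gets the same ID wherever they appear.

```python
import pandas as pd

# Sample data copied from the question; df1/df2 are hypothetical variable names.
df1 = pd.DataFrame({
    "Customer": ["Amber Bryan", "Amber Bryan", "Viola Walter", "Sion X."],
    "Relative": ["Viola Walter", "Christopher Lyl", "Stephan Said", "Martin Grey"],
    "Age": [22, 22, 43, 64],
    "Note": ["none", "none", "xxx", "none"],
})
df2 = pd.DataFrame({
    "Customer": ["Amber Bryan", "Viola Walter", "Christopher Lyl",
                 "Stephan Said", "Sion X.", "Martin Grey"],
    "Class": [1, 2, -2, 1, 0, 1],
    "Age": [22, 43, 41, 42, 64, 34],
})

# Step 1: the "index column": number dataset 2's rows 1..n in their current order.
df2 = df2.reset_index(drop=True)
df2["ID"] = df2.index + 1

# Step 2: turn that column into a lookup Series indexed by name (name -> ID).
# Assumes names in dataset 2 are unique; duplicates would make map() fail.
lookup = df2.set_index("Customer")["ID"]

# Step 3: apply the same lookup to every column in dataset 1 that holds names.
for col in ["Customer", "Relative"]:
    df1[col] = df1[col].map(lookup)

# Step 4: in dataset 2 itself, the ID simply replaces the name.
df2["Customer"] = df2["ID"]
df2 = df2.drop(columns="ID")

print(df1)
print(df2)
```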
Solution 1:[1]
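I think this should work: create a dictionary that maps each customer name to a unique ID, numbered in the order the customers appear in dataset 2, and then map that dictionary over every column that holds names, in whichever dataframe you need. Assuming the two datasets are loaded as df1 (dataset 1) and df2 (dataset 2):
customersid = {name: i for i, name in enumerate(df2['Customer'].unique(), start=1)}  # name -> ID, in dataset 2 order
df2['Customer'] = df2['Customer'].map(customersid)  # df2: dataset 2 (classes)
df1['Customer'] = df1['Customer'].map(customersid)  # df1: dataset 1 (family relationships)
df1['Relative'] = df1['Relative'].map(customersid)  # relatives share the same IDs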
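A self-contained, runnable sketch of the dictionary approach using the sample data from the question. The helper names build_id_map and replace_names are hypothetical. It adds one safeguard: Series.map returns NaN for any name that is not in the dictionary, so this version raises an error instead of silently losing a customer or relative.

```python
import pandas as pd


def build_id_map(names: pd.Series) -> dict:
    """Hypothetical helper: map each distinct name to 1, 2, 3, ... in order of first appearance."""
    return {name: i for i, name in enumerate(names.unique(), start=1)}


def replace_names(df: pd.DataFrame, columns: list, id_map: dict) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with the given name columns replaced by IDs.

    Raises ValueError if a value has no ID, because Series.map would
    otherwise turn it into NaN without any warning.
    """
    out = df.copy()
    for col in columns:
        mapped = out[col].map(id_map)
        missing = out.loc[mapped.isna(), col].unique()
        if len(missing) > 0:
            raise ValueError(f"No ID for {col} value(s): {list(missing)}")
        out[col] = mapped.astype(int)
    return out


# Sample data from the question (dataset 2 = classes, dataset 1 = family).
classes = pd.DataFrame({
    "Customer": ["Amber Bryan", "Viola Walter", "Christopher Lyl",
                 "Stephan Said", "Sion X.", "Martin Grey"],
    "Class": [1, 2, -2, 1, 0, 1],
    "Age": [22, 43, 41, 42, 64, 34],
})
family = pd.DataFrame({
    "Customer": ["Amber Bryan", "Amber Bryan", "Viola Walter", "Sion X."],
    "Relative": ["Viola Walter", "Christopher Lyl", "Stephan Said", "Martin Grey"],
    "Age": [22, 22, 43, 64],
    "Note": ["none", "none", "xxx", "none"],
})

# IDs follow the order of dataset 2, as requested in the question.
id_map = build_id_map(classes["Customer"])
classes_anon = replace_names(classes, ["Customer"], id_map)
family_anon = replace_names(family, ["Customer", "Relative"], id_map)

print(family_anon)
print(classes_anon)
```

With this sample data the two results match the target tables in the question. Building the dictionary from dataset 2 alone assumes every relative in dataset 1 is also listed there; if that is not guaranteed, one option is to build it from pd.concat([classes["Customer"], family["Customer"], family["Relative"]]) instead, which keeps dataset 2's names first so they still receive IDs 1 to 6.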
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | luka1156 |
