Replacing names with IDs in two datasets

I have two datasets: one contains customers' family details and the other contains the classes associated with those customers. I would like to replace the customers' names with IDs for privacy reasons. An example of the data is:

dataset 1 (customer's family relationships)

Customer       Relative          Age   Note
Amber Bryan    Viola Walter      22    none
Amber Bryan    Christopher Lyl   22    none
Viola Walter   Stephan Said      43    xxx
Sion X.        Martin Grey       64    none

dataset 2 (classes)

Customer          Class   Age
Amber Bryan       1       22
Viola Walter      2       43
Christopher Lyl   -2      41
Stephan Said      1       42
Sion X.           0       64
Martin Grey       1       34

I would like to get the following datasets:

Customer       Relative  Age   Note
1                 2      22    none
1                 3      22    none
2                 4      43    xxx
5                 6      64    none

and

Customer    Class      Age 
    1         1         22
    2         2         43
    3        -2         41
    4         1         42
    5         0         64
    6         1         34

Ideally, the IDs would be assigned according to the order of the customer list in dataset 2.

I am thinking of creating an index column for dataset 2, but I do not know how to use that information in dataset 1, given that I need to assign IDs to both Customer and Relative.



Solution 1:[1]

I think this should work: create a dictionary that assigns a unique identification number to each customer, and then, in whatever dataframe you need, look up each customer's ID.

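# Map each unique customer name to an ID (1, 2, 3, ...) in order of first appearance
names = df['Customer'].unique()
customersid = dict(zip(names, range(1, len(names) + 1)))

# Series.map looks up every name in the dictionary
df['Customer ID'] = df['Customer'].map(customersid)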
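
As a rough sketch of how the dictionary idea could be applied to both datasets from the question: the frame names `df1` and `df2` and the dictionary name `name_to_id` are assumptions made for illustration, and the data is copied from the question's example. The mapping is built from dataset 2 only, so the IDs follow the order of that list.

```python
import pandas as pd

# Sample data copied from the question (df1/df2 are illustrative names)
df1 = pd.DataFrame({
    "Customer": ["Amber Bryan", "Amber Bryan", "Viola Walter", "Sion X."],
    "Relative": ["Viola Walter", "Christopher Lyl", "Stephan Said", "Martin Grey"],
    "Age": [22, 22, 43, 64],
    "Note": ["none", "none", "xxx", "none"],
})
df2 = pd.DataFrame({
    "Customer": ["Amber Bryan", "Viola Walter", "Christopher Lyl",
                 "Stephan Said", "Sion X.", "Martin Grey"],
    "Class": [1, 2, -2, 1, 0, 1],
    "Age": [22, 43, 41, 42, 64, 34],
})

# Build name -> ID from dataset 2, numbering from 1 in order of first appearance
names = df2["Customer"].unique()
name_to_id = dict(zip(names, range(1, len(names) + 1)))

# Apply the same mapping to every name column in both datasets
df1["Customer"] = df1["Customer"].map(name_to_id)
df1["Relative"] = df1["Relative"].map(name_to_id)
df2["Customer"] = df2["Customer"].map(name_to_id)

print(df1)
print(df2)
```

One thing to keep in mind with this approach: any name in dataset 1 that does not appear in dataset 2 has no entry in the dictionary, so `map` would leave `NaN` in that cell. Checking for missing values after the mapping is a cheap way to catch such names.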

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 luka1156