Errors when hashing data in Python

I'm hoping somebody can assist. When I run the following code in a Jupyter notebook, I get an error:

dummydata["ID_NUMBER"] = dummydata["ID_NUMBER"].to_string()
def clean_dummydata(dummydata,cols):
    for col_name in cols:
        keys = {cats: i for i,cats in str(hash(dummydata[col_name].unique()))}
        dummydata[col_name] = dummydata[col_name].apply(lambda x: keys[x])
        return dummydata
    
cols = ['ID_NUMBER'] 
dummydata = clean_dummydata(dummydata,cols)
dummydata.to_csv('anon_dummydata.csv')

This is the error:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3140/2100616149.py in <module>
      7 
      8 cols = ['ID_NUMBER']
----> 9 dummydata = clean_dummydata(dummydata,cols)
     10 dummydata.to_csv('anon_dummydata.csv')

~\AppData\Local\Temp/ipykernel_3140/2100616149.py in clean_dummydata(dummydata, cols)
      2 def clean_dummydata(dummydata,cols):
      3     for col_name in cols:
----> 4         keys = {cats: i for i,cats in str(hash(dummydata[col_name].unique()))}
      5         dummydata[col_name] = dummydata[col_name].apply(lambda x: keys[x])
      6         return dummydata

TypeError: unhashable type: 'numpy.ndarray'



Solution 1:[1]

Mutable types such as NumPy arrays and lists are not hashable: their contents can change after the hash is computed, which would break any lookup that relies on the hash value.
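
A minimal sketch of that behaviour, assuming only NumPy is installed: the built-in hash() rejects a list and an ndarray, but accepts a tuple built from the same values. The helper name is_hashable is made up for this illustration.

```python
import numpy as np

def is_hashable(obj):
    """Return True if the built-in hash() accepts obj (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

values = ['one', 'two', 'three']
print(is_hashable(values))            # list: mutable
print(is_hashable(np.array(values)))  # ndarray: mutable
print(is_hashable(tuple(values)))     # tuple of strings: immutable
```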

So hash() works only on immutable types such as tuples (provided their elements are hashable too). For a one-dimensional NumPy array, you can convert it to a tuple and hash that instead, for example:

import numpy as np
z = np.array(['one', 'two', 'three'])
tuple_z = tuple(z)
hash_z = hash(tuple_z)  # hash the tuple, not the original array

This should run without raising the TypeError.
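
Applied to the function in the question, one possible sketch (not necessarily what the asker intended) is to build the lookup with enumerate over the unique values rather than hashing the whole array, so that each distinct ID maps to an integer code. It also assumes that .astype(str) was meant instead of .to_string(), which renders the entire column as a single string, and that the return belongs outside the loop so every column is processed. The sample DataFrame is made up.

```python
import pandas as pd

def clean_dummydata(dummydata, cols):
    """Replace each distinct value in the given columns with an integer code."""
    for col_name in cols:
        # Each unique value is hashable on its own, so it can be a dict key
        keys = {cat: i for i, cat in enumerate(dummydata[col_name].unique())}
        dummydata[col_name] = dummydata[col_name].map(keys)
    return dummydata  # outside the loop, so all columns are handled

# Made-up sample data for illustration
dummydata = pd.DataFrame({"ID_NUMBER": [1001, 1002, 1001, 1003]})
dummydata["ID_NUMBER"] = dummydata["ID_NUMBER"].astype(str)  # per-row strings
dummydata = clean_dummydata(dummydata, ["ID_NUMBER"])
print(dummydata["ID_NUMBER"].tolist())
```

The codes follow the order in which values first appear, so they are stable for a given input but are not a cryptographic anonymisation of the IDs.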

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Prats