Errors when hashing data in Python
I'm hoping somebody can help. When I run the following code in a Jupyter notebook, I get an error:
```python
dummydata["ID_NUMBER"] = dummydata["ID_NUMBER"].to_string()
def clean_dummydata(dummydata,cols):
    for col_name in cols:
        keys = {cats: i for i,cats in str(hash(dummydata[col_name].unique()))}
        dummydata[col_name] = dummydata[col_name].apply(lambda x: keys[x])
    return dummydata

cols = ['ID_NUMBER']
dummydata = clean_dummydata(dummydata,cols)
dummydata.to_csv('anon_dummydata.csv')
```
This is the error:
```
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3140/2100616149.py in <module>
      7
      8 cols = ['ID_NUMBER']
----> 9 dummydata = clean_dummydata(dummydata,cols)
     10 dummydata.to_csv('anon_dummydata.csv')

~\AppData\Local\Temp/ipykernel_3140/2100616149.py in clean_dummydata(dummydata, cols)
      2 def clean_dummydata(dummydata,cols):
      3     for col_name in cols:
----> 4         keys = {cats: i for i,cats in str(hash(dummydata[col_name].unique()))}
      5         dummydata[col_name] = dummydata[col_name].apply(lambda x: keys[x])
      6     return dummydata

TypeError: unhashable type: 'numpy.ndarray'
```
Solution 1:[1]
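Mutable types such as NumPy arrays and lists are not hashable. A dict or set files each key under its hash value, so if an object's contents (and therefore its hash) could change after it was stored, later lookups would search in the wrong place and miss it.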
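A small sketch of how a changing hash breaks lookups. `MutableKey` is a hypothetical class (not part of Python or NumPy) that stays mutable but hashes its current contents. Its hash is deliberately simplistic (the `sum` of integer items) so the outcome is easy to follow, and the results noted in the comments are for CPython's built-in `dict`.

```python
class MutableKey:
    """Hypothetical mutable container that (unwisely) hashes its contents."""

    def __init__(self, items):
        self.items = list(items)

    def __hash__(self):
        # Content-based hash: it changes whenever self.items changes.
        return sum(self.items)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.items == other.items


key = MutableKey([1, 2])
lookup = {key: "stored"}
found_before = key in lookup  # True: same hash as when it was stored

key.items.append(4)  # mutate the key after it is already in the dict
found_after = key in lookup  # False: the dict probes using the new hash (7)

# An object equal to the original contents does not find it either,
# because the stored key itself no longer compares equal.
found_original = MutableKey([1, 2]) in lookup  # False

print(found_before, found_after, found_original)  # True False False
```

Built-in mutable containers such as `list` avoid this trap by being unhashable altogether, and NumPy does the same for `ndarray`; that is the `TypeError` raised in the question.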
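So `hash()` needs an immutable type such as a tuple (whose elements must themselves be hashable). Convert the NumPy array to a tuple and hash the tuple instead, for example:
```python
import numpy as np

z = np.array(['one', 'two', 'three'])
tuple_z = tuple(z)      # a tuple is immutable, so it can be hashed
hash_z = hash(tuple_z)  # hash the tuple; hash(z) would raise the same TypeError
```
This runs without the `TypeError`, because a tuple of strings is hashable.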
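Converting to a tuple removes the `TypeError`, but the question's comprehension would still fail: iterating over `str(hash(...))` yields single characters, which cannot be unpacked into `i, cats` (Python raises `ValueError: not enough values to unpack`). Assuming the goal is to replace each distinct ID with an integer code, a minimal sketch keeps the question's `clean_dummydata` name, uses `enumerate` over the unique values, and needs no `hash()` at all. The sample DataFrame is hypothetical.

```python
import pandas as pd


def clean_dummydata(dummydata, cols):
    """Replace each distinct value in `cols` with an integer code 0, 1, 2, ...

    Assumes the goal is relabelling IDs, not cryptographic hashing.
    """
    for col_name in cols:
        # enumerate() yields (position, value) pairs; unique() keeps
        # values in order of first appearance.
        keys = {cats: i for i, cats in enumerate(dummydata[col_name].unique())}
        # map() looks each cell up in the dict of codes.
        dummydata[col_name] = dummydata[col_name].map(keys)
    return dummydata


# Hypothetical sample data standing in for the question's dummydata.
dummydata = pd.DataFrame({"ID_NUMBER": ["A17", "B42", "A17", "C09"]})
dummydata = clean_dummydata(dummydata, ["ID_NUMBER"])
print(dummydata["ID_NUMBER"].tolist())  # [0, 1, 0, 2]
```

`pandas.factorize` produces the same first-seen-order codes in a single call. Note also that `Series.to_string()`, used in the question's first line, renders the whole column as one string that then gets copied into every row, so `dummydata["ID_NUMBER"].astype(str)` is probably what was intended. And Python's built-in `hash()` of strings is randomized per process unless `PYTHONHASHSEED` is set, so it is not a stable way to pseudonymize IDs across runs; `hashlib` gives reproducible digests if a real hash is needed.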
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Prats |
