Unable to convert object data type to int using Python pandas
I have a file named sample.csv that looks like this:
| no | sample_id |
|---|---|
| 30 | 7f6fe071848736985d3eaf751e498407416c3udhfy3hfbshj |
| 23 | 897gfe071848736985d3eaf751e498407416c3udhfy3hfbshj |
| 21 | 34frfe071848736985d3eaf751e498407416c3udhfy3hfbshj |
| 100 | 1090e071848736985d3eaf751e498407416c3udhfy3hfbshj |
When I try to convert the `sample_id` column, which contains 64-character alphanumeric strings, from object to int, I get `ValueError: invalid literal for int() with base 10`.
I tried the following conversions, but each one raised a `ValueError`:
```
import pandas as pd

# Each of these raises ValueError on the hex-like IDs
df['sample_id'].astype(str).astype(int)
df['sample_id'] = pd.to_numeric(df['sample_id'])
df['sample_id'] = df.sample_id.astype(int)
```
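To see why every attempt fails the same way, the sketch below runs the three conversions on a couple of shortened, hex-style IDs (hypothetical stand-ins for the real column). Each one parses the strings as base-10 integers, so the first letter it meets triggers the `ValueError`.

```python
import pandas as pd

# Hypothetical, shortened hex-style IDs standing in for sample.csv
df = pd.DataFrame({"sample_id": ["7f6fe0", "af751e"]})

attempts = {
    "astype(str).astype(int)": lambda s: s.astype(str).astype(int),
    "pd.to_numeric": lambda s: pd.to_numeric(s),
    "astype(int)": lambda s: s.astype(int),
}

failures = {}
for name, convert in attempts.items():
    try:
        convert(df["sample_id"])
    except ValueError as exc:
        # Base-10 parsing stops at the first non-digit character
        failures[name] = str(exc)

for name, message in failures.items():
    print(f"{name}: {message}")
```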
I want the column as int so that I can hash `sample_id` with the following function:
```
import hashlib

def encrypt_id_sha256(sample_id):
    # Note: bytes(n) with an int n creates n zero bytes; it does not encode n
    hashed_sample_id = hashlib.sha256(bytes(int(sample_id))).hexdigest()
    return hashed_sample_id

df["sample_id"] = df["sample_id"].apply(encrypt_id_sha256)
```
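The function above never actually encodes the number: given an `int`, `bytes(n)` returns a buffer of `n` zero bytes, so even once the `int()` call succeeds, a 64-hex-digit value is far too large to serve as a buffer length (Python typically raises `OverflowError`). If an integer really is needed, a minimal sketch, assuming big-endian byte order, is to use `int.to_bytes`; `int_to_bytes` is a hypothetical helper name. This round trip drops leading zero bytes (for example `"00ab"` becomes `b"\xab"`), which is one reason hashing the hex bytes directly is simpler.

```python
import hashlib

def int_to_bytes(n: int) -> bytes:
    # Hypothetical helper: minimal big-endian encoding of a non-negative int
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

# bytes() with an int argument builds that many zero bytes
zero_filled = bytes(3)  # three zero bytes, not the number 3

value = int("7f6fe0", 16)
encoded = int_to_bytes(value)

# For even-length hex without leading "00" bytes, this matches bytes.fromhex
digest = hashlib.sha256(encoded).hexdigest()
print(encoded == bytes.fromhex("7f6fe0"))  # True
```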
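Update: I changed the function to decode the hex string with `bytes.fromhex` and hash the resulting bytes directly, and it worked for me. Thanks, all.

```
from hashlib import sha256
hashed_sample_id = sha256(bytes.fromhex(sample_id)).hexdigest()
```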
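Applied to the whole column, the working approach looks roughly like the sketch below (the IDs are shortened, hypothetical values). One caveat: the sample rows shown earlier contain letters outside `0-9a-f` (such as `u`, `h` and `y`), and `bytes.fromhex` raises `ValueError` on those, so this only works if the real IDs are pure hex. For IDs that are arbitrary text, hashing the UTF-8 encoding of the string is one alternative; it is an assumption about the use case, and it produces different digests from the hex-decoding version.

```python
import hashlib
import pandas as pd

def encrypt_id_sha256(sample_id: str) -> str:
    # Decode the hex text to raw bytes, then hash those bytes.
    # bytes.fromhex raises ValueError on any non-hex character.
    return hashlib.sha256(bytes.fromhex(sample_id)).hexdigest()

def hash_id_text(sample_id: str) -> str:
    # Hypothetical alternative: hash the UTF-8 text itself (works for any string)
    return hashlib.sha256(sample_id.encode("utf-8")).hexdigest()

# Hypothetical, shortened rows; the real IDs are much longer
df = pd.DataFrame({"no": [30, 23], "sample_id": ["7f6fe0", "af751e"]})
df["hashed_id"] = df["sample_id"].apply(encrypt_id_sha256)
df["hashed_text"] = df["sample_id"].apply(hash_id_text)
print(df)
```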
Solution 1:[1]
It looks like the IDs are hexadecimal (base 16), in which case `int()` with its default base of 10 cannot parse them. Pass the base explicitly instead:
```
df["sample_id"] = df["sample_id"].apply(int, base=16)
```
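A quick check of what `apply(int, base=16)` produces, using hypothetical 64-character hex strings (the length of a SHA-256 hex digest): `Series.apply` forwards extra keyword arguments to the function, so each value becomes `int(value, base=16)`. Values this large exceed the int64 and uint64 ranges, so pandas keeps the column as `object` dtype holding Python ints rather than a native integer column.

```python
import pandas as pd

# Hypothetical 64-character hex IDs
ids = pd.Series(["f" * 64, "0" * 63 + "1"])

# Extra keyword arguments are forwarded, so each call is int(value, base=16)
as_ints = ids.apply(int, base=16)

# 256-bit values do not fit in int64/uint64, so the result stays object dtype
print(as_ints.dtype)  # object
print(as_ints[0] == 2**256 - 1)  # True
```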
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Lecdi |
