Unable to convert object data type to int using Python pandas

I have a file named sample.csv that looks like this:

no   sample_id
30   7f6fe071848736985d3eaf751e498407416c3udhfy3hfbshj
23   897gfe071848736985d3eaf751e498407416c3udhfy3hfbshj
21   34frfe071848736985d3eaf751e498407416c3udhfy3hfbshj
100  1090e071848736985d3eaf751e498407416c3udhfy3hfbshj

When I try to convert the sample_id column, which contains 64-character alphanumeric strings, from object to int, I get ValueError: invalid literal for int() with base 10.
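A minimal standard-library sketch of why the conversion fails: int() parses base 10 by default, so any letter a-f in the string raises exactly this ValueError, while passing base 16 parses the same string as hexadecimal. The ID used here is a hypothetical stand-in (the SHA-256 digest of b"example"), not a value from sample.csv.

```python
import hashlib

# Hypothetical 64-character hex ID, generated here for illustration only
# (the SHA-256 digest of b"example"); it is not a value from sample.csv.
sample_id = hashlib.sha256(b"example").hexdigest()

try:
    int(sample_id)  # base 10 by default: the letters a-f are rejected
    base10_error = None
except ValueError as exc:
    base10_error = str(exc)  # "invalid literal for int() with base 10: ..."

value = int(sample_id, 16)  # base 16: every character is a valid digit

print(base10_error)
print(value.bit_length())  # at most 256 bits for 64 hex digits
```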

I tried the following conversions, but each one raised a ValueError:

df['sample_id'].astype(str).astype(int)
df['sample_id'] = pd.to_numeric(df['sample_id'])
df['sample_id'] = df.sample_id.astype(int)

I want to convert the alphanumeric values to int so that I can hash the sample_id column with the function below:

import hashlib

def encrypt_id_sha256(sample_id):
    hashed_sample_id = hashlib.sha256(bytes(int(sample_id))).hexdigest()
    return hashed_sample_id

df["sample_id"] = df["sample_id"].apply(encrypt_id_sha256)
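Even if int() had succeeded, the function above would not hash the ID: bytes() called with an integer n returns a buffer of n zero bytes rather than the integer's binary form, and for a 256-bit number CPython raises OverflowError instead. A minimal sketch of an int-based version, assuming the IDs are 64-digit hex strings, encodes the integer with int.to_bytes at a fixed 32-byte, big-endian width; for such IDs its digest matches hashing bytes.fromhex(sample_id) directly. The helper name hash_id_via_int is hypothetical.

```python
import hashlib

def hash_id_via_int(sample_id: str) -> str:
    """Hypothetical helper: hash a 64-digit hex ID through its integer value."""
    number = int(sample_id, 16)  # parse as base 16, not base 10
    # to_bytes encodes the value itself; 32 bytes = 256 bits, enough for
    # any 64-digit hex string, and leading zero bytes are preserved.
    raw = number.to_bytes(32, "big")
    return hashlib.sha256(raw).hexdigest()

# bytes() with an int argument builds a zero-filled buffer of that length.
print(bytes(3))  # b'\x00\x00\x00'

sample_id = "ab" * 32  # hypothetical 64-digit hex ID
print(hash_id_via_int(sample_id) == hashlib.sha256(bytes.fromhex(sample_id)).hexdigest())
```

Because the width is fixed at 32 bytes, an ID that starts with "00" keeps its leading zero byte, which is what makes the result agree with the fromhex approach.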


Update: I changed the function to hash directly from the hex string, as below, and it worked for me. Thanks, all.

from hashlib import sha256
hashed_sample_id = sha256(bytes.fromhex(sample_id)).hexdigest()
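A self-contained sketch of the updated approach, with an explicit check added as an assumption. bytes.fromhex accepts only pairs of hexadecimal digits (0-9, a-f, A-F), so it raises ValueError for odd-length strings or for characters such as u, h and j, which the placeholder rows in the question contain; the approach relies on the real IDs being pure hex. The helper name hash_sample_id and the validation step are hypothetical, not part of the original answer.

```python
import hashlib
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F

def hash_sample_id(sample_id: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of the bytes a hex ID encodes."""
    sample_id = sample_id.strip()
    if len(sample_id) % 2 or not set(sample_id) <= HEX_DIGITS:
        # bytes.fromhex would fail here too; report the offending value.
        raise ValueError(f"not an even-length hex string: {sample_id!r}")
    return hashlib.sha256(bytes.fromhex(sample_id)).hexdigest()
```

With pandas, it would be applied the same way as in the question, e.g. df["sample_id"] = df["sample_id"].apply(hash_sample_id).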


Solution 1:[1]

It looks like the values are hexadecimal (base 16), in which case converting them with a plain int() call, which assumes base 10, will not work. Instead, do this:

df["sample_id"].apply(int, base=16)
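Series.apply forwards extra keyword arguments such as base=16 to the function, so each element becomes int(value, base=16). A 64-digit hex ID can be as large as 2**256 - 1, far beyond the int64 maximum of 2**63 - 1, so the resulting column should stay object dtype holding arbitrary-precision Python ints rather than becoming int64. The standard-library sketch below uses hypothetical IDs to show the size check.

```python
INT64_MAX = 2**63 - 1  # largest value a numpy/pandas int64 can hold

# Hypothetical 64-digit hex IDs; Series.apply(int, base=16) calls int()
# on each element with base=16, just like this comprehension.
ids = ["f" * 64, "0" * 63 + "1"]
parsed = [int(s, base=16) for s in ids]

for s, n in zip(ids, parsed):
    fits = n <= INT64_MAX
    print(f"{s[:8]}... -> {n.bit_length()} bits, fits in int64: {fits}")
```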

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: Lecdi