Converting multiple Excel columns to appropriate data types
I have an Excel file with 245 columns. Most of them are read in with the object dtype, which is wrong, so I'd like to convert every column to its appropriate data type (string, float, int, or bool) to make preprocessing easier.
I tried taking the first value of each column and using it to convert the whole column, but that isn't working. I have also tried the functions convert_dtypes() and infer_objects(), to no avail. I would do it manually, but there are too many columns and their names are long.
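One likely reason convert_dtypes() and infer_objects() appear to do nothing is the '---' placeholder: as long as a column mixes real numbers with that string, object is the only dtype that can hold it. The sketch below illustrates this on a small made-up frame (the column names and values are assumptions, not taken from the real file), then treats the placeholder as missing before calling convert_dtypes().

```python
import pandas as pd

# Made-up sample: numeric cells mixed with a '---' placeholder string,
# roughly what pd.read_excel might return for such a sheet.
df = pd.DataFrame({
    "amount": [1.5, "---", 3.0],
    "count": [1, 2, "---"],
    "name": ["a", "b", "---"],
})

# infer_objects() leaves mixed columns alone: they really do hold mixed objects.
print(df.infer_objects().dtypes)

# Replace the placeholder with NaN, then let pandas choose nullable dtypes.
cleaned = df.mask(df == "---").convert_dtypes()
print(cleaned.dtypes)
```

If the placeholder is known up front, passing it through the na_values parameter of pd.read_excel should have the same effect at load time, so the columns arrive with usable dtypes in the first place.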
Here I loop through all the columns, get the first value of each, and use it to convert the whole column's data type:
import re

import pandas as pd

for col in df.columns:
    # get the first value of each column
    first_value = df[col].iloc[0]
    # check whether the first value is NaN or the '---' placeholder
    if pd.isna(first_value) or first_value == '---':
        # move to the second row in the column
        df[col] = df[col].shift(1)
        # if the second row is also NaN or '---' then just pass
        if pd.isna(df[col].iloc[0]) or df[col].iloc[0] == '---':
            pass
    # check if the first value is a string
    elif isinstance(first_value, str):
        # check if the first value is a date
        if re.match(r'\d{4}-\d{2}-\d{2}', first_value):
            # convert the column to datetime
            df[col] = pd.to_datetime(df[col])
        else:
            # convert the column to string
            df[col] = df[col].astype(str)
    # check if the first value is a float
    elif isinstance(first_value, float):
        # convert the column to float
        df[col] = df[col].astype(float)
    # check if the first value is an integer
    elif isinstance(first_value, int):
        # convert the column to integer
        df[col] = df[col].astype(int)
    # else do nothing
    else:
        pass
I also tried matching the values with a regex, but the dtypes still won't change. I don't know whether this is an Excel quirk or whether the only way to do it is manually.
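For comparison, here is a hedged sketch of a column-wise alternative to the loop above. Two things in that loop probably work against it: shift(1) moves the values down one row, so row 0 becomes NaN instead of exposing the second value, and astype(int) or astype(float) raises as soon as the column still contains '---'. The sketch decides from every non-missing value rather than the first one. convert_column is a hypothetical helper, the sample frame is made up, and the date branch assumes ISO "YYYY-MM-DD" strings like the regex in the question.

```python
import pandas as pd


def convert_column(series: pd.Series, placeholder: str = "---") -> pd.Series:
    """Hypothetical helper: choose a dtype from every non-missing value."""
    # Treat the placeholder as missing so it cannot block a conversion.
    cleaned = series.mask(series == placeholder)
    non_null = cleaned.dropna()
    if non_null.empty:
        return cleaned  # nothing to infer from

    # Booleans first, because pd.to_numeric can also parse them.
    if pd.api.types.infer_dtype(non_null) == "boolean":
        return cleaned.astype("boolean")

    # Numbers: accept the column only if every non-missing value parses.
    if pd.to_numeric(non_null, errors="coerce").notna().all():
        return pd.to_numeric(cleaned, errors="coerce").convert_dtypes()

    # Dates: assumes ISO "YYYY-MM-DD" strings (an assumption, see above).
    if pd.to_datetime(non_null, errors="coerce", format="%Y-%m-%d").notna().all():
        return pd.to_datetime(cleaned, errors="coerce", format="%Y-%m-%d")

    # Fallback: keep the values as nullable strings.
    return cleaned.astype("string")


# Made-up sample standing in for the real 245-column sheet.
df = pd.DataFrame({
    "price": ["---", 1.5, 2.0],
    "qty": [1, "---", 3],
    "day": ["2023-01-05", "2023-02-10", "---"],
    "label": ["---", "x", "y"],
    "flag": [True, False, "---"],
})

for col in df.columns:
    df[col] = convert_column(df[col])

print(df.dtypes)
```

Because each check uses errors="coerce" and requires all non-missing values to parse, a single stray value sends the column to the string fallback instead of raising. Whether that is the right trade-off depends on how clean the real data is.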
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
