'Pandas loop over CSV files to find a column by name
Could someone give me a tip with Pandas on how I could loop over csv files in a directory, find a columns in the CSV files called Temp where then the values of the columns need to be converted from degree C to degrees F, something like degF = degC * 1.8 + 32
I think I am close but the last bit errors out:
import pandas as pd
import os
import glob
# use glob to get all the csv files
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
# loop over the list of csv files
for f in csv_files:
# read the csv file
df = pd.read_csv(f)
# print the location and filename
file_name = f.split("\\")[-1]
print('File Name Original:', file_name)
# print the content
print('Columns:', df.columns)
# Find Columns with Temp in the Column Name
temp_cols = [col for col in df.columns if 'Temp' in col]
# print the content
print('temp_cols Columns:', temp_cols)
for i in range(len(temp_cols)):
print(df.temp_cols[i].values)
Prints a few lines then errors:
File Name Original: ADMIN FRONT DESK.csv
Columns: Index(['Date', 'Temp', 'RH', 'CO2'], dtype='object')
temp_cols Columns: ['Temp']
Traceback (most recent call last):
File "C:\OneDrive - \fix_temp.py", line 34, in <module>
print(df.temp_cols[i].values)
File "C:\Python39\lib\site-packages\pandas\core\generic.py", line 5465, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'temp_cols'
This will print the name of the dataframe column, how do I modify?
for i in range(len(temp_cols)):
#print(df.temp_cols[i].values)
print(temp_cols[i])
Solution 1:[1]
I'm not completely clear on the question, but this should allow you to find if the 'Temp' column is in a DataFrame:
import pandas as pd
import os
import glob
# use glob to get all the csv files
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
# loop over the list of csv files
for f in csv_files:
# read the csv file
df = pd.read_csv(f)
# print the location and filename
file_name = f.split("\\")[-1]
print('File Name Original:', file_name)
# print the content
print('Columns:', df.columns)
columns = df.columns
if 'Temp' in columns:
df['Temp_F'] = (df['Temp'] * 1.8) + 32
Solution 2:[2]
If your dataset has multiple target columns to modify with the help of @speeder answer, I still needed this:
# Find Columns with Temp in the Column Name
temp_cols = [col for col in df.columns if 'Temp' in col]
# print the content
print('temp_cols Columns:', temp_cols)
for target in temp_cols:
print(f'in the loop fixing the {target} column...')
df[target] = (df[target] * 1.8) + 32
Complete script to loop over multiple CSV files that may have multiple target columns to modify. My target columns is temp or a column that contains the name of temp:
import os
import glob
import pandas as pd
# use glob to get all the csv files
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
# loop over the list of csv files
for f in csv_files:
# read the csv file
df = pd.read_csv(f)
# print the location and filename
file_name = f.split("\\")[-1]
print('File Name Original:', file_name)
# print the content
print('Columns:', df.columns)
# Find Columns with Temp in the Column Name
temp_cols = [col for col in df.columns if 'Temp' in col]
# print the content
print('temp_cols Columns:', temp_cols)
for target in temp_cols:
print(f'in the loop fixing the {target} column...')
df[target] = (df[target] * 1.8) + 32
print(f"File {file_name} done!")
df = df.dropna()
df.to_csv(file_name, index=False)
print(f"File {file_name} saved success!")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | speeder1987 |
| Solution 2 | bbartling |
