Combine CSV files with the same name from different subfolders into one CSV
I have three CSV files for each file name, one in each of three folders. Say there are 20 file names in total, so 20 × 3 = 60 CSV files across the three folders:
Folder A: 1001.CSV, 1002.CSV, 1003.CSV, ...
Folder B: 1001.CSV, 1002.CSV, 1003.CSV
Folder C: 1001.csv, 1002.csv, 1003.csv, ...
I want a single combined CSV file for each of 1001, 1002, 1003, 1004, ..., so 20 CSV files in total.
How can I do this? Since the files are in different folders, glob is not working (or I don't know how to use it).
Solution 1:[1]
I made the following assumptions:
- all the subfolders will be rooted at some known directory "parentdir"
- each subfolder contains only relevant csv files
- the csv files do not contain any header/footer lines
- each record in the csv files is separated by a newline
- all of the records in each file are relevant
This should produce a "concat.csv" file in each subfolder containing the contents of all the other files in that folder. I used a snippet of code from another Stack Overflow answer to do the actual concatenation.
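import os
import fileinput

rootdir = 'C:\\Users\\myname\\Desktop\\parentdir'

for child in os.listdir(rootdir):
    path = os.path.join(rootdir, child)
    if not os.path.isdir(path):
        continue  # skip stray files sitting next to the subfolders
    # full paths of the inputs; leave out a concat.csv from an earlier run
    filenames = [os.path.join(path, f) for f in sorted(os.listdir(path))
                 if f != 'concat.csv']
    if not filenames:
        continue  # fileinput would fall back to stdin on an empty list
    with open(os.path.join(path, 'concat.csv'), 'w') as fout, fileinput.input(filenames) as fin:
        for line in fin:
            # lines already carry their newline; add one only if a file lacks a final newline
            fout.write(line if line.endswith('\n') else line + '\n')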
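The loop above merges the files within each subfolder. If the goal is instead one output per file name (1001, 1002, ...) collected from every subfolder, as the question describes, a minimal sketch under the same assumptions (no header lines, one record per line) might look like the following. The function name `combine_by_name` and the separate output folder are hypothetical, and names are matched case-insensitively so that `1001.CSV` and `1001.csv` end up in the same group.

```python
import os
from collections import defaultdict


def combine_by_name(rootdir, outdir):
    """Concatenate same-named CSV files found in the subfolders of rootdir.

    Assumes headerless files. Returns the list of output files written.
    """
    groups = defaultdict(list)  # lower-cased file name -> list of source paths
    for child in sorted(os.listdir(rootdir)):
        folder = os.path.join(rootdir, child)
        # skip plain files and the output folder itself
        if not os.path.isdir(folder) or os.path.abspath(folder) == os.path.abspath(outdir):
            continue
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith('.csv'):
                groups[name.lower()].append(os.path.join(folder, name))

    os.makedirs(outdir, exist_ok=True)
    written = []
    for name, paths in sorted(groups.items()):
        target = os.path.join(outdir, name)
        with open(target, 'w') as fout:
            for p in paths:
                with open(p) as fin:
                    for line in fin:
                        # add a newline only when a file's last line lacks one
                        fout.write(line if line.endswith('\n') else line + '\n')
        written.append(target)
    return written
```

Calling `combine_by_name(rootdir, os.path.join(rootdir, 'combined'))` would then write `combined/1001.csv`, `combined/1002.csv`, and so on, each holding the rows from folders A, B and C in that order.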
Solution 2:[2]
import os
import shutil
import glob
import pandas as pd

path = '/mypath/'

# rename files so that same-named files in different folders get unique names
count = 1
for root, dirs, files in os.walk(path):
    for i in files:
        if i == 'whatever.csv':
            os.rename(os.path.join(root, i), os.path.join(root, "whatever" + str(count) + ".csv"))
            count += 1

# delete unwanted files
for dirname, dirs, files in os.walk(path):
    for file in files:
        if file.startswith('dontwant'):
            os.remove(os.path.join(dirname, file))

# copy files to dir
for root, dirs, files in os.walk(path):  # replace path with your starting directory
    if os.path.abspath(root) == os.path.abspath(path):
        continue  # already in the destination; copying a file onto itself raises SameFileError
    for file in files:
        if file.endswith('.csv'):
            shutil.copy2(os.path.join(root, file), path)  # change path to your destination dir

# combine files
os.chdir(path)
output_name = "combined_csv.csv"
# skip the output of an earlier run so it is not merged into itself
all_filenames = [f for f in glob.glob('*.csv') if f != output_name]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv(output_name, index=False, encoding='utf-8-sig')
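Copying every file into one directory only works when the names are unique, which is why the script renames first. For the layout in the question, where the same name recurs in each folder, an alternative is to group the paths by file name and write one output per group. The sketch below does this with the standard library rather than pandas. It assumes every file starts with the same header row (which is also what `pd.read_csv` assumes by default) and keeps that header once per output. `combine_with_header` and its arguments are hypothetical names.

```python
import csv
from collections import defaultdict
from pathlib import Path


def combine_with_header(root, outdir):
    """Merge same-named CSVs under root, keeping only the first header row."""
    root, outdir = Path(root).resolve(), Path(outdir).resolve()
    groups = defaultdict(list)  # lower-cased file name -> source paths
    for p in sorted(root.rglob('*')):
        # ignore anything already inside the output folder
        if p.is_file() and p.suffix.lower() == '.csv' and outdir not in p.parents:
            groups[p.name.lower()].append(p)

    outdir.mkdir(parents=True, exist_ok=True)
    for name, paths in groups.items():
        with open(outdir / name, 'w', newline='') as fout:
            writer = csv.writer(fout)
            header_written = False
            for p in paths:
                with open(p, newline='') as fin:
                    rows = csv.reader(fin)
                    header = next(rows, None)  # first row of each file is its header
                    if header is not None and not header_written:
                        writer.writerow(header)
                        header_written = True
                    writer.writerows(rows)  # remaining rows are data
    return sorted(groups)
```

With folders A, B and C under `/mypath/`, `combine_with_header('/mypath/', '/mypath/combined')` would produce one file per distinct name, with no renaming or copying of the inputs.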
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user11321561 |
| Solution 2 | David A |
