How to create a single data frame by combining multiple CSV files with the exact same column names side by side
I have 3 CSV files with exactly the same columns, and I want to put them side by side in a pandas DataFrame. I also want the column headers to include the file name so I can identify which CSV each column comes from.
For instance, if the file name is "file1.csv" and the headers in the file are Item, Amount and Remarks, I would like the end result to look like Item_file1, Amount_file1, Remarks_file1. The same applies to CSV files 2 and 3.
By the way, the unique identifier for each file is the index and Item.
Solution 1:[1]
Well, this is not a good idea; it is better to have a column that records which file each row came from. If you're using a database, you can quite simply use the CSV files inside an SQL query.
However, I don't know what the purpose of this is, so here are steps that can do it.
A CSV is essentially a text file. If you use Linux or can use WSL2, then it's simple. Otherwise, just try these steps:
The first row of a CSV contains the column names, so you can grab it and split its content by the semicolon (;) to get the header names, then store them in an array. Do that for all the other files and add their column names to the same array. You can append the file name to each string. Then join the strings back together with ; and save the result as the first line of the resulting CSV.
Now take the rest of each file (everything except the first line) and add it to the new file. Give each file a number (e.g. 0, 1, 2) and prepend each row of the file with a number of ; equal to the number of columns in that file multiplied by the file number (the first multiplies by 0, the second by 1, etc.).
File 1:
a;b;c
1;2;3
File 2:
a;b;c
4;5;6
File 3:
a;b;c
7;8;9
With these steps you'll get:
a0;b0;c0;a1;b1;c1;a2;b2;c2;
1;2;3;4;5;6;7;8;9
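The steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the answerer's own code: it assumes ; as the delimiter, that every file has the same number of rows in the same order, and it uses in-memory strings in place of real files. The function name combine_side_by_side is hypothetical. It reproduces the example result shown above, joining matching rows side by side.

```python
import csv
import io


def combine_side_by_side(texts, delimiter=";"):
    """Join CSV texts column-wise, suffixing each header with its file number."""
    # Parse every CSV text into a list of rows (first row is the header).
    tables = [list(csv.reader(io.StringIO(t), delimiter=delimiter)) for t in texts]

    # Build the combined header: a0, b0, c0, a1, b1, c1, ...
    header = []
    for number, table in enumerate(tables):
        header.extend(f"{name}{number}" for name in table[0])

    # Pair up the n-th data row of every file and flatten them into one row.
    rows = []
    for parts in zip(*(table[1:] for table in tables)):
        rows.append([cell for part in parts for cell in part])

    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Stand-ins for the three example files.
files = ["a;b;c\n1;2;3\n", "a;b;c\n4;5;6\n", "a;b;c\n7;8;9\n"]
combined = combine_side_by_side(files)
print(combined)
```

Because rows are matched purely by position, this only makes sense when the files list the same items in the same order.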
Solution 2:[2]
So far I have only got this, but it's giving me an error. I do not know how to put the 3 CSV files side by side with all the columns present.
import glob
import pandas as pd

path = r'/Users/ameer/Desktop/untitled folder/'  # use your path
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

# This line raises the error: pd.merge takes two DataFrames (left, right), not a list.
frame = pd.merge(li, on='Description', how='right')
frame
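The attempt above fails because pd.merge combines exactly two DataFrames and cannot take a list. One possible way to get the layout asked for in the question is to suffix each frame's columns with the file name using DataFrame.add_suffix and then place the frames next to each other with pd.concat(axis=1). The following is a minimal sketch under assumptions: the rows of all files line up on the same index, and the file names and contents below are hypothetical stand-ins held in memory so the example runs on its own.

```python
import io
from pathlib import Path

import pandas as pd

# Stand-ins for the files on disk (hypothetical names and contents);
# with real files, pass each path to pd.read_csv instead of a StringIO.
sources = {
    "file1.csv": "Item,Amount,Remarks\napple,10,ok\npear,20,late\n",
    "file2.csv": "Item,Amount,Remarks\napple,11,ok\npear,21,ok\n",
    "file3.csv": "Item,Amount,Remarks\napple,12,new\npear,22,ok\n",
}

frames = []
for filename, text in sources.items():
    df = pd.read_csv(io.StringIO(text))
    # Path("file1.csv").stem is "file1", so Item becomes Item_file1, and so on.
    frames.append(df.add_suffix(f"_{Path(filename).stem}"))

# axis=1 places the frames next to each other, aligned on the row index.
frame = pd.concat(frames, axis=1)
print(frame.columns.tolist())
```

With real files, the loop could iterate over the paths returned by glob.glob and call pd.read_csv(filename) directly; Path(filename).stem also works on full paths. If the rows are not in the same order in every file, setting Item as the index before concatenating would align them by item rather than by position.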
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sjiamnocna |
| Solution 2 | cruise11 |
