Issues when concatenating two data frames

For this project I need to combine the .dat files in a folder containing housing price data. I have tried several methods but keep running into the same issue. My approach is a for loop that cleans each file by extracting the rows marked B (the only ones I need), puts them into a data frame, concatenates that with a larger data frame, and then clears the data frame holding the single file.

Code Below

import zipfile, os
import pandas as pd
from tabulate import tabulate
import glob
import numpy as np

with zipfile.ZipFile("20220314.zip") as zipfiles:
    # the first entry is the zipfile name
    # we'll skip it

    filelist = zipfiles.namelist()[1:]
    df = []
    df3 = pd.DataFrame()
    
    for file_name in filelist:
        if file_name.endswith('.DAT'):
            #print(file_name)
    
            df1 = pd.read_table(zipfiles.open(file_name),delimiter=';',header=0,skiprows=1).iloc[:,[0,1,5,10,11,12,15,18]]
            #df1 = pd.read_table(zipfiles.open(file_name), delimiter=';', header=0, skiprows=1)
    
            colname = df1.columns[0]
            df1 =df1.loc[df1[colname] == 'B']
            #print(df1)
    
            df3 = pd.concat([df3, df1],ignore_index= True)
    
            del df1
            #print(file_name)

df3.to_csv("folderdatav3.csv")

Code finished

When I write the result to a CSV, the columns shift to the right, as you can see. In the attached CSV, row 75 is where the new file starts. How can I merge all the files in this folder into one dataframe (or another data structure you would recommend) without the data shifting? I am very inexperienced with Python, so if there are a lot of issues with my code, please don't be too hard on me.

Thank you

I also tried combining all the files first and cleaning afterwards; beyond that there wasn't much I could try given my inexperience.
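One possible cause of the shift, sketched below under assumptions: pd.concat aligns frames by column label, not by position. If the .DAT files have no real header row, then header=0 together with skiprows=1 turns each file's first data row into its column labels, so every file ends up with different labels and its values land in new columns. The file contents and the helper read_b_rows here are made up for illustration; they are not the real data or part of the original code.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two .DAT files (not the real data):
# one metadata line to skip, then semicolon-delimited rows with no header row.
file_a = "meta\nB;1;10\nA;2;20\nB;3;30\n"
file_b = "meta\nB;4;40\nB;5;50\n"

def read_b_rows(text, header):
    """Read one file and keep only rows whose first column is 'B'."""
    df = pd.read_csv(io.StringIO(text), sep=";", header=header, skiprows=1)
    return df[df.iloc[:, 0] == "B"]

# header=0 turns each file's first data row into column labels, so the
# labels differ per file and concat places the values in separate columns.
shifted = pd.concat(
    [read_b_rows(t, header=0) for t in (file_a, file_b)], ignore_index=True
)

# header=None gives every file the same positional labels (0, 1, 2),
# so concat stacks the rows in the same columns.
aligned = pd.concat(
    [read_b_rows(t, header=None) for t in (file_a, file_b)], ignore_index=True
)

print(shifted.shape)
print(aligned.shape)
```

If the real files do have a proper header row, the labels should already match and the cause lies elsewhere. Collecting the per-file frames in a list and calling pd.concat once at the end also avoids repeatedly copying a growing data frame inside the loop.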



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source