How to solve delimiter conflicts
I have a large .txt file that is delimited by ";". Unfortunately, some of my values also contain ";". In those cases the character is not a delimiter, but pandas still treats it as one. Because of this, I have difficulty reading the .txt files into pandas, since some lines end up with more columns than others. Background: I am trying to combine several .txt files into one DataFrame and get the following error: ParserError: Error tokenizing data. C error: Expected 21 fields in line 443, saw 22.
When I checked line 443, I saw that it did indeed contain one extra ";" because the character was part of one of the values.
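Checking one line by hand does not scale to a large file, so a small helper can list every row whose field count differs from the header's. This is only a sketch: the name find_bad_lines is hypothetical, it assumes the first non-blank line is the header, and it counts raw delimiters without any notion of quoting.

```python
import io


def find_bad_lines(lines, delimiter=";"):
    """Return (line_number, field_count) for rows whose width differs from the header's."""
    bad = []
    expected = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = line.rstrip("\r\n").count(delimiter) + 1
        if expected is None:
            expected = count  # assumption: the first non-blank line is the header
        elif count != expected:
            bad.append((number, count))
    return bad


# In-memory stand-in for a file; an open file object can be passed the same way.
sample = io.StringIO("1;2;3;4\n123;123;123;123\n123;123;12;3;123\n1;1;1;1\n")
print(find_bad_lines(sample))  # [(3, 5)]
```

Running this over each input file shows up front how many rows are affected and how many extra fields each one carries.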
Reproduction:
Text file 1:
1;2;3;4
23123213;23123213;23123213;23123213
123;123;123;123
123;123;123;123
1;1;1;1
123;123;123;123
12;12;12;12
3;3;3;3
Text file 2:
1;2;3;4
23123213;23123213;23123213;23123213
123;123;123;123
123;123;12;3;123
1;1;1;1
123;123;123;123
12;12;12;12
3;3;3;3
Code:
import pandas as pd
import glob
import os
path = r'C:\Users\file'
all_files = glob.glob(path + "/*.txt")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, delimiter=';')
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
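One possible way to repair the over-long rows, sketched under a strong assumption: it is known in advance which column may contain a stray ";". In "Text file 2" that would be the third column (index 2), where "12;3" is a single value. The helper name merge_extra_fields and its parameters are hypothetical, not part of pandas.

```python
from functools import partial


def merge_extra_fields(fields, expected, merge_at, delimiter=";"):
    """Glue surplus fields back into the column at index merge_at.

    Assumption: every surplus delimiter belongs to that one column.
    """
    extra = len(fields) - expected
    if extra <= 0:
        return fields  # nothing to repair
    merged = delimiter.join(fields[merge_at:merge_at + extra + 1])
    return fields[:merge_at] + [merged] + fields[merge_at + extra + 1:]


# The offending row from "Text file 2", split the way a parser would split it.
row = "123;123;12;3;123".split(";")
repair = partial(merge_extra_fields, expected=4, merge_at=2)
print(repair(row))  # ['123', '123', '12;3', '123']
```

In pandas 1.4 and later, read_csv accepts a callable for on_bad_lines when engine="python" is set; the callable receives the split fields of each bad line as a list of strings, so a function like repair above could be passed as on_bad_lines=repair. Passing on_bad_lines="skip" instead drops such rows. If the affected column is not known, a rule like this cannot be applied safely, and the more robust fix is to have the files exported with quoted values.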
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
