Delimiter of read csv is in text field
I received data extracted from a server. The problem is that the extract uses ";" as the delimiter in the CSV file.
I read the folder with the following command:
import glob
import pandas as pd
files = glob.glob(r"path/*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs, ignore_index=True)
and the output is:
columnA columnB .... columnT columnU
2000 A .... I wish NaN
1000 B .... that NaN
this ends NaN .... NaN NaN
3000 A ..... I DUU
...
The text in row 3 belongs to columnT in the second row. So far I have only been able to delete all the weird rows like that one, but then I lose that information.
df2.dropna(subset=['columnB'], how='all', inplace=True)
How can I read the files correctly? The problem is that the text in the field columnT also uses ";" as a normal character.
The original text (in the CSV) is:
columnA; columnB; .... columnT; columnU:
2000; A; .... I wish; NaN;
1000; B; .... that; this ends; NaN;
3000; A; ..... I; DUU;
Solution 1:[1]
I wasn't aware of a programmatic approach to solve this (see my comment), but out of interest, a quick search led me to "Escaping quotes and delimiters in CSV files with Excel". Perhaps you could try the same: either manually or programmatically, replace all single quotes with double quotes, and try your code again.
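If the extract cannot be regenerated with quoted fields, one possible programmatic workaround is to repair each line before pandas sees it. The sketch below is not from the original answer and rests on two assumptions: only one free-text column (columnT here) can contain stray ";" characters, and the header row has the correct number of columns. The helper names `repair_line` and `read_messy_csv` are hypothetical. Surplus fields are merged back into the text column, the rows are rewritten with proper quoting, and the result is then read with `pd.read_csv`.

```python
import csv
import io

import pandas as pd


def repair_line(fields, n_cols, text_idx, sep=";"):
    """Merge surplus fields back into the free-text column at text_idx."""
    extra = len(fields) - n_cols
    if extra <= 0:
        return fields  # row already has the expected width (or is short)
    merged = sep.join(fields[text_idx:text_idx + extra + 1])
    return fields[:text_idx] + [merged] + fields[text_idx + extra + 1:]


def read_messy_csv(lines, text_col, sep=";"):
    """Read delimiter-separated lines where text_col may contain the delimiter."""
    rows = [line.rstrip("\r\n").split(sep) for line in lines if line.strip()]
    header = [name.strip() for name in rows[0]]
    text_idx = header.index(text_col)
    fixed = [repair_line(r, len(header), text_idx, sep) for r in rows[1:]]

    # Rewrite the repaired rows with quoting, so the text field is escaped
    # and pandas can parse the data in the usual way.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(fixed)
    buf.seek(0)
    return pd.read_csv(buf, sep=sep)


# Made-up sample data shaped like the rows in the question.
sample = [
    "columnA;columnB;columnT;columnU\n",
    "2000;A;I wish;X\n",
    "1000;B;that; this ends;Y\n",
    "3000;A;I;DUU\n",
]
df = read_messy_csv(sample, text_col="columnT")
print(df.loc[1, "columnT"])  # that; this ends
```

With the files from the question, each file found by `glob` could be opened and passed to `read_messy_csv` in place of the direct `pd.read_csv` call, and the results concatenated as before. If more than one column can contain the delimiter, this approach cannot tell which column the extra fields belong to, and the extract would need to be quoted at the source instead.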
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
