How to get pandas to return the row index on which a CSV read error occurs
I have a CSV: '1\n2\na'. If I read it with something like pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'float'}), specifying that the first column has a type of float, how can I get the row index at which the error occurred?
Pandas raises ValueError, but the message only reports which input value failed (in this case, a), not the row where it occurred.
(My actual data has billions of rows and 350 columns; this is obviously a simplification of the actual problem. The actual problem is that somewhere in these billions of rows and hundreds of columns, something has the word Middlesex rather than a number.)
Solution 1:[1]
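Just a thought: why not get help from a regex? Read the column as strings, then extract only the digits; any cell that contains no digits becomes NaN.
You will still have manual work on your hands to fill in the missing records, though: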
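import io
import pandas as pd
# Read the column as strings so that parsing cannot fail
df = pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'str'})
# Keep only the digits; cells without any digits become NaN
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
print(df)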
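The snippet above cleans the column but does not report where the bad value was. One possible way to get the row index is sketched below; it is not part of the original answer. It reads every column as strings in chunks, converts with pd.to_numeric(errors="coerce"), and reports cells that held text but failed to convert. The function name find_bad_rows and the chunk size are hypothetical choices for illustration, and the approach assumes every column is supposed to be numeric.

```python
import io
import pandas as pd

def find_bad_rows(source, names, chunksize=100_000):
    """Yield (row_index, column, raw_value) for each non-numeric cell.

    Hypothetical helper: assumes all columns in `names` should be numeric.
    """
    # Read everything as strings so parsing never fails, one chunk at a time.
    reader = pd.read_csv(source, names=names, dtype=str, chunksize=chunksize)
    for chunk in reader:
        for col in names:
            raw = chunk[col]
            numeric = pd.to_numeric(raw, errors="coerce")
            # A cell is bad if it held text but could not be converted.
            bad = raw[numeric.isna() & raw.notna()]
            for idx, value in bad.items():
                yield int(idx), col, value

# Small chunksize only to show that the index keeps counting across chunks.
bad_rows = list(find_bad_rows(io.StringIO("1\n2\na"), names=["A"], chunksize=2))
print(bad_rows)
```

The reported index is zero-based and counts data rows only, since names= implies there is no header line. Reading in chunks keeps memory bounded, which matters for a file with billions of rows; empty cells are treated as missing rather than as errors.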
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | High-Octane |
