How to get pandas to return the row index on which a CSV read error occurs

I have a CSV: '1\n2\na'. If I read it with something like pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'float'}), specifying that the first column has a type of float, how can I get the row index at which the error occurred?

Pandas raises a ValueError, but the message only identifies the input that failed to convert (in this case, a), not the row on which it occurred.
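A minimal reproduction of the failure, assuming a recent pandas version with the default C parser (the exact wording of the message may differ between versions):

```python
import io

import pandas as pd

message = None
try:
    pd.read_csv(io.StringIO("1\n2\na"), names=["A"], dtype={"A": "float"})
except ValueError as exc:
    # The message names the offending value but not the row it came from.
    message = str(exc)

print(message)
```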

(My actual data has billions of rows and 350 columns, so this is obviously a simplification. The actual problem is that somewhere in these billions of rows and hundreds of columns, a cell contains the word Middlesex rather than a number.)



Solution 1:[1]

Just a thought:

Why not get help from regex?

You'll still have manual labour on your hands to fill in the missing records, though:

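import io

import pandas as pd

# Read the column as text so that non-numeric values do not raise.
df = pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'str'})
# Keep only the first run of digits; values without any digits become NaN.
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
print(df)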
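To report where the bad values are, rather than only blanking them out, one possible variation on the same idea is to read every column as text and then test each cell with pd.to_numeric(errors="coerce"). The sketch below is an assumption-laden illustration, not part of the original answer: find_bad_rows is a hypothetical helper name, the chunk size is arbitrary, and it assumes the file has no header row, so the reported index is the zero-based row position in the file.

```python
import io

import pandas as pd


def find_bad_rows(source, column_names, chunksize=100_000):
    """Yield (row_index, column, raw_value) for each non-numeric cell.

    Hypothetical helper for illustration; assumes a headerless CSV.
    """
    # Read everything as text so parsing never fails, one chunk at a time
    # to keep memory use bounded on very large files.
    reader = pd.read_csv(
        source, names=column_names, dtype=str, chunksize=chunksize
    )
    for chunk in reader:
        for column in column_names:
            # errors="coerce" turns unparseable values into NaN.
            numeric = pd.to_numeric(chunk[column], errors="coerce")
            # A cell is bad if it held text that could not be converted.
            bad = chunk[column].notna() & numeric.isna()
            for row_index, raw_value in chunk.loc[bad, column].items():
                yield row_index, column, raw_value


bad_rows = list(find_bad_rows(io.StringIO("1\n2\na"), ["A"], chunksize=2))
print(bad_rows)
```

Each chunk keeps the running row index of the whole file, so the reported position refers to the file and not to the chunk. Empty cells are read as missing values and are not reported as errors.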

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 High-Octane