How to record bad lines skipped by pandas
I'm reading a CSV file with pandas using error_bad_lines=False. A warning is printed when a bad line is encountered. However, I want to keep a record of all the bad line numbers to feed into another program. Is there an easy way of doing that?
I thought about iterating over the file with chunksize=1 and catching the CParserError that ought to be thrown for each bad line encountered. When I do this, though, no CParserError is thrown for bad lines, so I can't catch them.
Solution 1:[1]
The warnings are written to the standard error stream. You can capture them in a file by temporarily redirecting sys.stderr while the CSV is read.
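import sys
import pandas as pd

with open('bad_lines.txt', 'w') as fp:
    old_stderr = sys.stderr
    sys.stderr = fp  # warnings about skipped lines now go to the file
    try:
        df = pd.read_csv('my_data.csv', error_bad_lines=False)
    finally:
        sys.stderr = old_stderr  # restore the original stderr afterwards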
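Since the goal is a list of line numbers rather than raw warning text, the captured output still needs to be parsed. The following is a minimal sketch, assuming the warnings have the form "Skipping line N: expected X fields, saw Y"; the exact wording may differ between pandas versions, so check it against your own bad_lines.txt. The helper name extract_bad_line_numbers and the sample string are hypothetical, used only for illustration.

```python
import re

# Assumed warning format, e.g. "Skipping line 3: expected 2 fields, saw 3".
# Adjust the pattern if your pandas version words the message differently.
SKIP_PATTERN = re.compile(r"Skipping line (\d+)")


def extract_bad_line_numbers(captured_text):
    """Return the line numbers mentioned in captured 'Skipping line' warnings."""
    return [int(n) for n in SKIP_PATTERN.findall(captured_text)]


# Hypothetical sample of what the redirected stderr might contain.
sample = (
    "b'Skipping line 3: expected 2 fields, saw 3\\n"
    "Skipping line 7: expected 2 fields, saw 4\\n'"
)
print(extract_bad_line_numbers(sample))
```

To apply it to the real output, read the contents of bad_lines.txt into a string and pass that string to extract_bad_line_numbers.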
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | James |
