Handle NaN values (zero value) in datetime.strptime in a converter used in pd.read_fwf
I have a source file which is length-delimited. My file looks as follows:
00;12345678;03122019
01;12345678;00000000
My code so far is as follows:
import pandas as pd
from datetime import datetime
col_lengths = {'Column1': range(0, 2),
               'Column2': range(3, 11),
               'Datecolumn': range(12, 20),
               }
col_lengths = {k: set(v) for k, v in col_lengths.items()}
pd.read_fwf(r'D:\test.txt',
            colspecs=[(min(x), max(x)+1) for x in col_lengths.values()],
            header=None,
            names=col_lengths.keys(),
            converters={'Column1': lambda x: str(x),
                        'Column2': lambda x: str(x),
                        'Datecolumn': lambda s: datetime.strptime(s, '%d%m%Y'),
                        }
            )
The problem is that the second line of my file raises an error:
ValueError: time data '00000000' does not match format '%d%m%Y'
What is a good way to handle this? I have no experience with it so far. The only alternative I can think of is to read the column as a string and convert it later in the pandas DataFrame. If there is an elegant way to handle it directly in the converter, I would prefer that. Can this be done on the fly in read_fwf, and is that a good idea?
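Both options mentioned above can be sketched as follows. This is a minimal illustration, not a definitive answer: the helper name `parse_date` is hypothetical, the sample data is read from an in-memory string instead of `D:\test.txt`, and it assumes that a placeholder such as `00000000` should become a missing value (`NaT`).

```python
import io
from datetime import datetime

import pandas as pd

# Sample data mirroring the two lines from the question.
data = "00;12345678;03122019\n01;12345678;00000000\n"

colspecs = [(0, 2), (3, 11), (12, 20)]
names = ["Column1", "Column2", "Datecolumn"]


def parse_date(s):
    """Hypothetical helper: parse DDMMYYYY, returning NaT when parsing fails."""
    try:
        return datetime.strptime(s, "%d%m%Y")
    except (ValueError, TypeError):
        return pd.NaT


# Option 1: handle the placeholder on the fly inside the converter.
df = pd.read_fwf(
    io.StringIO(data),
    colspecs=colspecs,
    header=None,
    names=names,
    converters={"Column1": str, "Column2": str, "Datecolumn": parse_date},
)

# Option 2: read everything as strings, then convert the column afterwards.
# errors="coerce" turns values that do not match the format into NaT.
df2 = pd.read_fwf(
    io.StringIO(data),
    colspecs=colspecs,
    header=None,
    names=names,
    dtype=str,
)
df2["Datecolumn"] = pd.to_datetime(
    df2["Datecolumn"], format="%d%m%Y", errors="coerce"
)
```

The second option is vectorized and leaves the column with a datetime dtype. The first keeps the logic in one place, but it calls a Python function once per row. Note that both variants silently turn any unparseable value into `NaT`, not only `00000000`. If other malformed dates should still raise an error, compare against the placeholder explicitly instead.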
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
