How to find rows in a Pandas DataFrame where the string in a specific column doesn't match a pattern?
I have the following Pandas DataFrame:
col1 col2
0 "text 0" "1979.10.01."
1 "text 1" "1989.11.02."
2 "text 2" "2001.13.03."
3 "text 3" "2012.11.04."
4 "text 4" "1999.01.04."
Now I want to get the rows where col2 doesn't match the YYYY.MM.DD. pattern.
Bonus question: how can I additionally check not just the pattern, but also the validity of the date (in row 2, for example, the 13th month is invalid)?
Solution 1:[1]
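You can test for the numeric pattern (4 digits, 2 digits, 2 digits, each followed by a dot) with Series.str.contains. Use a raw string for the regex and anchor it with ^ and $ so that the whole value has to match, not just a part of it:
df1 = df[~df['col2'].str.contains(r'^\d{4}\.\d{2}\.\d{2}\.$')]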
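A self-contained sketch of the pattern check follows. It uses Series.str.fullmatch instead of an anchored str.contains, which assumes pandas 1.1 or later. The row "text 5" is a hypothetical addition, not part of the question's data, included only so that the filter has something to return (every row in the question already matches the pattern).

```python
import pandas as pd

# Data from the question, plus one hypothetical malformed row ("text 5").
df = pd.DataFrame({
    "col1": ["text 0", "text 1", "text 2", "text 3", "text 4", "text 5"],
    "col2": ["1979.10.01.", "1989.11.02.", "2001.13.03.",
             "2012.11.04.", "1999.01.04.", "99-01-04"],
})

# True where the entire string has the shape YYYY.MM.DD.
# na=False treats missing values as non-matching.
matches = df["col2"].str.fullmatch(r"\d{4}\.\d{2}\.\d{2}\.", na=False)

# Keep only the rows that do NOT match the pattern.
bad_format = df[~matches]
print(bad_format)
```

Note that "2001.13.03." passes this check: the regex only looks at the shape of the string, not at whether the month or day exists.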
For the second question, you can try converting the column to datetime with the format %Y.%m.%d. and errors='coerce'. Values that fail to parse become missing values (NaT), so you can then filter on those:
df2 = df[pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce').isna()]
print (df2)
col1 col2
2 text 2 2001.13.03.
Details:
print (pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce'))
0 1979-10-01
1 1989-11-02
2 NaT
3 2012-11-04
4 1999-01-04
Name: col2, dtype: datetime64[ns]
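The two checks can also be combined to tell malformed strings apart from well-formed strings that are not real dates. This is only a sketch under the same assumptions as the answer's code (string values in col2, format %Y.%m.%d.); the variable names are illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "col1": ["text 0", "text 1", "text 2", "text 3", "text 4"],
    "col2": ["1979.10.01.", "1989.11.02.", "2001.13.03.",
             "2012.11.04.", "1999.01.04."],
})

# Shape check: the whole string must look like YYYY.MM.DD.
pattern_ok = df["col2"].str.fullmatch(r"\d{4}\.\d{2}\.\d{2}\.", na=False)

# Validity check: unparseable values become NaT because of errors='coerce'.
parsed = pd.to_datetime(df["col2"], format="%Y.%m.%d.", errors="coerce")
date_ok = parsed.notna()

# Rows that look right but are not valid calendar dates.
invalid_dates = df[pattern_ok & ~date_ok]
print(invalid_dates)
```

With the question's data, every row has the right shape, and only row 2 (month 13) fails the date check.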
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
