Regex to find date format
I have a dataframe with column Date. It contains values like 2022-01-03.
I want to create a new df with only those rows that fulfil the conditions:
- starting with 20
- "-" signs between the four-digit year and the two-digit month, and between the two-digit month and the two-digit day
The below doesn't work:
df2 = my_df.where('Date like "20\d\d-\d\d-\d\d"')
Any ideas?
Solution 1:[1]
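SQL like only understands the wildcards % and _, not regex tokens such as \d. You may use rlike() here, which takes a regular expression:
from pyspark.sql.functions import col
df2 = my_df.filter(col("Date").rlike("20[0-9]{2}-[0-9]{2}-[0-9]{2}"))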
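One caveat: rlike() succeeds when the pattern is found anywhere in the string, so the unanchored pattern also accepts values with extra characters before or after the date. The sketch below illustrates the difference using Python's `re.search` as a stand-in for that find-anywhere behaviour. The sample values, the `matches` helper, and the anchored variant are assumptions added for illustration, not part of the original answer.

```python
import re

# Pattern as given in the answer (no anchors).
UNANCHORED = r"20[0-9]{2}-[0-9]{2}-[0-9]{2}"
# Hypothetical tightened variant: ^ and $ force the whole value to be a date.
ANCHORED = r"^20[0-9]{2}-[0-9]{2}-[0-9]{2}$"


def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern is found anywhere in value."""
    return re.search(pattern, value) is not None


# Made-up sample values standing in for the Date column.
samples = ["2022-01-03", "1999-12-31", "x2022-01-03", "2022-01-03 10:00"]

unanchored_hits = [s for s in samples if matches(UNANCHORED, s)]
anchored_hits = [s for s in samples if matches(ANCHORED, s)]

print(unanchored_hits)
print(anchored_hits)
```

If only exact yyyy-mm-dd values starting with 20 should pass, the anchored pattern can be given to rlike() in the same way, e.g. `col("Date").rlike("^20[0-9]{2}-[0-9]{2}-[0-9]{2}$")`. Neither pattern checks that the month and day are valid calendar values.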
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
