'Join tables on dates, with dirty date field
In AWS Athena, I am trying to join two tables in the db using the date, but one of the tables (table2) is not clean, and contains values that are not dates, as shown below.
| table2.date |
| ---- |
|6/02/2021|
|9/02/2021|
|1431 BEL & 1628 BEL."|
|15/02/2021|
|and failed to ....|
|18/02/2021|
|19/02/2021|
I am not able to have any influence in cleaning this table up.
My current query is:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.operation_date = cast(date_parse(table2."date",'%d/%m/%Y') as date)
LIMIT 10;
I've tried using regex_like(col, '[a-z]'), but this still leaves the values that are numerical, but not dates.
How do I get the query to ignore the values that are not dates?
Solution 1:[1]
You may wrap conversion expression with try function, that will resolve to NULL in case of failed conversion.
select
try(date_parse(col, '%d/%m/%Y'))
from(values
('6/02/2021'),
('9/02/2021'),
('1431 BEL & 1628 BEL.'),
('15/02/2021'),
('and failed to ....'),
('18/02/2021'),
('19/02/2021')
) as t(col)
| # | _col0 |
|---|---|
| 1 | 2021-02-06 00:00:00.000 |
| 2 | 2021-02-09 00:00:00.000 |
| 3 | |
| 4 | 2021-02-15 00:00:00.000 |
| 5 | |
| 6 | 2021-02-18 00:00:00.000 |
| 7 | 2021-02-19 00:00:00.000 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | astentx |
