'How can I remove non-breaking spaces in my aws athena table?

I have timestamp stored as a string [imported from csv]. string :2022-05-04[NBSP]02:46:35

actual string: 2022-05-04 02:46:35

the middle space is a nonbreaking space and prevents casting to timestamp.

I have tried

UPDATE mytable
SET myts = REPLACE(myts, CHR('00A0'), '')

I get an error that the connector I am using does not support updates, both in datagrip and in the athena editor within the AWS console.

I want to convert this to a regular space in order to cast to timestamp.



Solution 1:[1]

While an update can't be done, a new view can be created on top of the existing table and referenced downstream. The syntax for replacement myts = REPLACE(myts, CHR('00A0'), '') will remove the space entirely -- to get the desired format casting as timestamp, it should be

myts = REPLACE(myts, CHR('00A0'), ' ')

If there there a large volume of data, until the point at which it is later filtered, it's inefficient to materialize for each cleaning operation, and best to use a view rather than materializing with a CTAS.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 thleo