Converting Oracle RAW types with Spark

I have a table in an Oracle DB that contains a column stored as a RAW type. I'm making a JDBC connection to read that column and, when I print the schema of the resulting dataframe, I see that I have a column with a binary data type. That is what I expected to happen.
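For context, the read looks roughly like this; the connection URL, driver, credentials and table name below are placeholders rather than my real values:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-raw").getOrCreate()

// Requires the Oracle JDBC driver jar on the classpath.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//db-host:1521/SERVICE")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "MY_SCHEMA.MY_TABLE")
  .option("user", "MY_USER")
  .option("password", "MY_PASSWORD")
  .load()

df.printSchema()  // the RAW column is reported here as `binary`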

The thing is, I need to be able to read that column as a String, so I thought a simple data type conversion would solve it.

df.select("COLUMN").withColumn("COL_AS_STRING", col("COLUMN").cast(StringType)).show

But what I got was a bunch of random characters. Since I'm dealing with a RAW type, it's possible that a string representation of this data simply doesn't exist, so, just to be safe, I ran a simple select to get the first rows from the source (using sqoop-eval), and somehow sqoop can display this column as a string.
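(As a side note, a quick driver-side hex dump like the one below is one way to see exactly which bytes that column holds and to compare them with whatever sqoop-eval prints; it assumes the df from above and non-null values.)

df.select("COLUMN")
  .limit(5)
  .collect()
  .foreach { row =>
    val bytes = row.getAs[Array[Byte]]("COLUMN")
    println(bytes.map(b => f"$b%02X").mkString)  // two hex digits per byte
  }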

I then thought that this could be an encoding problem so I tried this:

df.selectExpr("decode(COLUMN,'utf-8')").show

I tried utf-8 and a bunch of other encodings, but again all I got was random characters.
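My current suspicion is that sqoop-eval is simply printing the hexadecimal representation of the RAW bytes, since that is how Oracle clients usually render RAW values. If that's the case, would the Spark equivalent be hex (or base64) rather than a cast or a charset decode? Something like:

import org.apache.spark.sql.functions.{base64, col, hex}

df.select(
    col("COLUMN"),
    hex(col("COLUMN")).as("COL_AS_HEX"),    // hexadecimal digits, two per byte
    base64(col("COLUMN")).as("COL_AS_B64")  // Base64, in case that's more useful
  )
  .show(false)

If that output matches what sqoop-eval shows, then presumably the column never contained character data in the first place, which would explain why every charset I tried produced garbage.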

Does anyone know how I can do this data type conversion?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
