Column value not properly passed to Hive UDF (Spark/Scala)
I have created a Hive UDF like the one below:
package testpkg

import org.apache.hadoop.hive.ql.exec.UDF

class customUdf extends UDF {
  // Appends a constant suffix to the input value
  def evaluate(col: String): String = {
    col + "abc"
  }
}
I then registered the UDF in the SparkSession with:
sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""");
When I query the Hive table with the query below from Scala code, it does not progress, but it does not throw an error either:
SELECT testUDF(value) FROM t;
However, when I pass a string literal like below from the Scala code, it works:
SELECT testUDF('str1') FROM t;
I am running the queries via sparksession. I tried with GenericUDF as well, but I still face the same issue. This happens only when I pass a Hive column. What could be the reason?
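Putting the pieces together, the flow amounts to roughly the following (a minimal sketch, assuming sparksession is a Hive-enabled SparkSession and t is an existing Hive table; the app name is a placeholder):

import org.apache.spark.sql.SparkSession

val sparksession = SparkSession.builder()
  .appName("hive-udf-test")   // hypothetical app name
  .enableHiveSupport()        // needed for Hive tables and CREATE TEMPORARY FUNCTION
  .getOrCreate()

sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""")
sparksession.sql("SELECT testUDF(value) FROM t").show()   // hangs; the literal variant works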
Solution 1:[1]
Try referencing your jar from HDFS:
create function testUDF as 'testpkg.customUdf' using jar 'hdfs:///jars/customUdf.jar';
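One plausible reason this helps (an assumption, not something stated in the answer): with a driver-local jar, literal arguments can be evaluated on the driver where the class resolves, while column arguments are evaluated on executors that cannot load it; a distributed path such as HDFS makes the jar loadable everywhere. From Spark, the statement would be issued through the same session (the path and package name are placeholders for your own):

sparksession.sql("""CREATE FUNCTION testUDF AS 'testpkg.customUdf' USING JAR 'hdfs:///jars/customUdf.jar'""")
sparksession.sql("SELECT testUDF(value) FROM t").show()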
Solution 2:[2]
I am not sure about the implementation of UDFs in Scala, but when I faced a similar issue in Java, I noticed a difference: if you plug in a literal
select udf("some literal value")
then it is received by the UDF as a String. But when you select from a Hive table
select udf(some_column) from some_table
you may get what is called a LazyString, for which you would need to use getObject to retrieve the actual value. I am not sure whether Scala handles these lazy values automatically.
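One way to make the UDF insensitive to how Hive wraps the value is the GenericUDF API, where an ObjectInspector converts whatever physically arrives (String, Text, or a lazy object such as LazyString) into a plain Java object before you use it. A minimal sketch in Scala, assuming a single string argument (the class name is hypothetical; the suffix logic mirrors the question):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class customGenericUdf extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("customGenericUdf expects exactly one argument")
    // The inspector knows the argument's physical representation (lazy or not)
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    // getPrimitiveJavaObject unwraps lazy values (e.g. LazyString) to a plain object
    val value = inputOI.getPrimitiveJavaObject(raw)
    if (value == null) null else value.toString + "abc"
  }

  override def getDisplayString(children: Array[String]): String =
    s"customGenericUdf(${children.mkString(", ")})"
}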
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Matt Andruff |
| Solution 2 | Igor N. |
