Column value not properly passed to Hive UDF in Spark Scala

I have created a Hive UDF like below:

package testpkg

import org.apache.hadoop.hive.ql.exec.UDF

class customUdf extends UDF {
  def evaluate(col: String): String = {
    col + "abc"
  }
}

I then registered the UDF in the SparkSession with:

sparksession.sql("""CREATE TEMPORARY FUNCTION testUDF AS 'testpkg.customUdf'""");
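For context, this assumes a SparkSession built with Hive support enabled; a minimal sketch (the app name is illustrative):

import org.apache.spark.sql.SparkSession

// Hive support is required for CREATE TEMPORARY FUNCTION to resolve Hive UDF classes.
val sparksession = SparkSession.builder()
  .appName("udf-test")
  .enableHiveSupport()
  .getOrCreate()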

When I query the Hive table with the query below from Scala code, it does not progress and does not throw an error either:

SELECT testUDF(value) FROM t;

However, when I pass a string literal like below from the Scala code, it works:

SELECT testUDF('str1') FROM t;

I am running the queries via the SparkSession. I also tried with GenericUDF, but still face the same issue. It happens only when I pass a Hive column. What could be the reason?



Solution 1:[1]

Try referencing your JAR from HDFS:

create function testUDF as 'testpkg.customUdf' using jar 'hdfs:///jars/customUdf.jar';
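If you are issuing statements through Spark rather than the Hive CLI, the same registration can be run from the session; a sketch, where the HDFS path is the answer's example and should be replaced with your own JAR location:

// Register the UDF with an explicit JAR so executors can load the class.
sparksession.sql(
  """CREATE FUNCTION testUDF AS 'testpkg.customUdf'
    |USING JAR 'hdfs:///jars/customUdf.jar'""".stripMargin)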

Solution 2:[2]

I am not sure about the implementation of UDFs in Scala, but when I faced a similar issue in Java, I noticed a difference: if you plug in a literal

select udf("some literal value")

then it is received by the UDF as a String. But when you select from a Hive table,

select udf(some_column) from some_table

you may get what's called a LazyString, for which you would need to use getObject to retrieve the actual value. I am not sure whether Scala handles these lazy values automatically.
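Along these lines, one standard way to materialize such lazy values is a GenericUDF that goes through the argument's ObjectInspector, letting getPrimitiveJavaObject do the unwrapping instead of casting directly to String. A minimal sketch under that assumption (the class name and display string are illustrative, not from the original post):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class customGenericUdf extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("testUDF expects exactly one argument")
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    // The UDF returns a plain Java String.
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    // May be a lazy wrapper (e.g. LazyString) rather than a java.lang.String.
    val raw = arguments(0).get()
    if (raw == null) null
    // Let the ObjectInspector materialize the lazy value into a concrete object.
    else inputOI.getPrimitiveJavaObject(raw).toString + "abc"
  }

  override def getDisplayString(children: Array[String]): String =
    "testUDF(" + children.mkString(", ") + ")"
}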

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Sources
[1] Matt Andruff
[2] Igor N.