Spark UDF error AttributeError: 'NoneType' object has no attribute '_jvm'

I found a similar question, but none of its answers explained how to fix the issue.

I want to write a UDF that extracts specific words from a column. I want to create a column named new_column by applying my UDF to old_column:

from pyspark.sql.functions import col, regexp_extract, udf

re_string = 'some|words|I|need|to|match'

def regex_extraction(x,re_string):
    return regexp_extract(x,re_string,0)

extracting = udf(lambda row: regex_extraction(row,re_string))

df = df.withColumn("new_column", extracting(col('old_column')))

AttributeError: 'NoneType' object has no attribute '_jvm'

How can I fix my function? I have many columns and want to loop through a list of them, applying my UDF to each one.



Solution 1:[1]

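You don't need a UDF. A UDF is only required when something can't be done with PySpark's built-in functions and you need plain Python functions or libraries instead. In your case, an ordinary Python function that accepts a column and returns a column is enough; no UDF is needed.

The error itself comes from calling regexp_extract inside the UDF. The functions in pyspark.sql.functions build JVM column expressions through the driver's active SparkContext. The UDF body, however, runs in a Python worker on the executors, where SparkContext._active_spark_context is None, so accessing its _jvm attribute fails with exactly this AttributeError.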

from pyspark.sql.functions import regexp_extract
df = spark.createDataFrame([('some match',)], ['old_column'])

re_string = 'some|words|I|need|to|match'

def regex_extraction(x, re_string):
    return regexp_extract(x, re_string, 0)

df = df.withColumn("new_column", regex_extraction('old_column', re_string))
df.show()
# +----------+----------+
# |old_column|new_column|
# +----------+----------+
# |some match|      some|
# +----------+----------+
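If a Python UDF really were needed, its body would have to use plain Python rather than pyspark.sql.functions. The sketch below does that with the standard re module. The names extract_first_match and make_extract_udf are hypothetical, and it assumes Python's regex syntax behaves like Java's for your pattern (true for a simple alternation like re_string, but the two dialects differ in places). It mimics regexp_extract by returning an empty string when nothing matches and null for null input.

```python
import re


def extract_first_match(value, pattern):
    """Plain-Python counterpart of regexp_extract(col, pattern, 0) for one value."""
    if value is None:  # regexp_extract yields null for null input
        return None
    match = re.search(pattern, value)
    # regexp_extract returns an empty string when there is no match
    return match.group(0) if match else ""


def make_extract_udf(pattern):
    """Wrap extract_first_match in a Spark UDF (hypothetical helper).

    PySpark is imported lazily so the pure-Python function above can be
    used and tested without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    return udf(lambda value: extract_first_match(value, pattern), StringType())


re_string = 'some|words|I|need|to|match'
print(extract_first_match('some match', re_string))  # some
```

With a DataFrame df this would be applied as df.withColumn("new_column", make_extract_udf(re_string)(col('old_column'))). The built-in regexp_extract is still preferable, because a Python UDF forces every row to be serialized between the JVM and a Python worker.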

"Looping" through columns in a list can be implemented this way:

from pyspark.sql.functions import regexp_extract
cols = ['col1', 'col2']
df = spark.createDataFrame([('some match', 'match')], cols)

re_string = 'some|words|I|need|to|match'
def regex_extraction(x, re_string):
    return regexp_extract(x, re_string, 0)

df = df.select(
    '*',
    *[regex_extraction(c, re_string).alias(f'new_{c}') for c in cols]
)
df.show()
# +----------+-----+--------+--------+
# |      col1| col2|new_col1|new_col2|
# +----------+-----+--------+--------+
# |some match|match|    some|   match|
# +----------+-----+--------+--------+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1