How to resolve the collect() action error in PySpark 3.0
I'm trying to run the code below to generate a pattern, but I'm getting an error on this line:

```python
data_list = df.select(column_name).rdd.map(lambda row: row[Row_number]).collect()
```
Below is the code:

```python
import re

from pyspark.sql import Row


def pattern_generation(df, category, Row_number, column_name):
    # Pull the column's values back to the driver as a plain Python list
    data_list = df.select(column_name).rdd \
        .map(lambda row: row[Row_number]).collect()
    pattern_list = []
    for i in data_list:
        if category == "Alpha":
            # Mask letters with "A"
            pattern_list.append(re.sub("[A-Za-z]", "A", i))
        elif category == "Numeric":
            # Mask digits with "9"
            pattern_list.append(re.sub(r"\d", "9", i))
        else:
            # Mask both digits and letters
            pattern_list.append(re.sub("[A-Za-z]", "A", re.sub(r"\d", "9", i)))
    print(pattern_list)
    pattern_df = sqlContext.createDataFrame(
        list(map(lambda x: Row(pattern=x), pattern_list)))
    pattern_freq = pattern_frequency(pattern_df)
    return pattern_freq
```
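For reference, the same masking can be pushed down to Spark itself with `pyspark.sql.functions.regexp_replace`, which avoids shipping rows through `rdd.map`/`collect` to Python workers at all. A minimal sketch, keeping the original argument names (`pattern_generation_sql` is a hypothetical name, and the `pattern_frequency` step is omitted):

```python
from pyspark.sql import functions as F


def pattern_generation_sql(df, category, column_name):
    # Same masking rules as pattern_generation, expressed as Spark SQL
    # expressions: nothing is collected to the driver and no Python
    # worker round-trip is needed.
    col = F.col(column_name)
    if category == "Alpha":
        masked = F.regexp_replace(col, "[A-Za-z]", "A")
    elif category == "Numeric":
        masked = F.regexp_replace(col, r"\d", "9")
    else:
        masked = F.regexp_replace(F.regexp_replace(col, r"\d", "9"),
                                  "[A-Za-z]", "A")
    return df.select(masked.alias("pattern"))
```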
Error output:

```
22/03/26 17:25:28 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
org.apache.spark.SparkException: Python worker failed to connect back.
22/03/26 17:25:28 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
Traceback (most recent call last):
```
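"Python worker failed to connect back" is usually an environment problem rather than a bug in the transformation itself: the executors cannot launch or reach a Python interpreter, which is common on Windows when `PYSPARK_PYTHON` is not set. A minimal sketch of the usual workaround, assuming a local run where the driver and the workers can share the same interpreter; it has to run before the SparkContext/SparkSession is created:

```python
import os
import sys

# Point both the driver and the Python workers at the interpreter that is
# running this script. Must be set before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```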