PySpark: Erroneous SparkUpgradeException During Linear Regression

To run linear regression in PySpark, I convert the feature columns of my data into a single dense-vector column and then use it to fit the regression model, as shown below:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Assemble the numeric feature columns into a single vector column
assembler = VectorAssembler(inputCols=feature_cols, outputCol="Features")
vtrain = assembler.transform(train).select("Features", y)

# Fit an ordinary least-squares model via the normal equation
lin_reg = LinearRegression(solver="normal",
                           featuresCol="Features",
                           labelCol=y)
model = lin_reg.fit(vtrain)

This has been working for a while but just recently started giving me the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1059.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1059.0 (TID 1877) (10.139.64.10 executor 0): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'MMM dd, yyyy hh:mm:ss aa' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

This is confusing because all of the columns in "train" are either integer or double, and vtrain is just the same data in vectorized form; there is no datetime parsing anywhere in this code. I tried setting spark.sql.legacy.timeParserPolicy to LEGACY, but the same error occurred.
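For reference, this is roughly how I checked the column types and applied the legacy setting (a sketch against my active session; `train` is the DataFrame from above, and this is a session config fragment rather than a standalone script):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Confirm that every column is numeric -- no date/timestamp types appear
train.printSchema()

# Restore the pre-3.0 datetime parser behavior on the active session.
# Spark evaluates lazily, so this has to be set before the action
# (here, .fit()) that actually triggers the offending parse.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
```

Since Spark jobs are lazy, I wonder whether the pattern in the error could come from a transformation earlier in train's lineage rather than from the regression itself, but I haven't found one.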

Does anyone know why this might be?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
