Column Renaming in a PySpark DataFrame

I have column names with special characters. I renamed the columns, but when I try to save the DataFrame, the save fails, saying the columns contain special characters. When I run printSchema() on the DataFrame, the column names show no special characters. Here is the code I tried:

for c in df_source.columns:
    df_source = df_source.withColumnRenamed(c, c.replace( "(" , ""))
    df_source = df_source.withColumnRenamed(c, c.replace( ")" , ""))
    df_source = df_source.withColumnRenamed(c, c.replace( "." , ""))

df_source.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(stg_location)

and I get the following error:

Caused by: org.apache.spark.sql.AnalysisException: Attribute name "Number_of_data_samples_(input)" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.

One more thing I noticed: both df_source.show() and display(df_source) raise the same error, while printSchema() shows column names without any special characters.

Can someone help me find a solution for this?
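For reference, here is a one-pass variant of the loop above, as a minimal sketch (the clean_name helper is just an illustrative name, not from the original code). One thing worth noting: once the first replace actually changes a name, the later withColumnRenamed(c, ...) calls refer to a column name that no longer exists, and renaming a missing column is a silent no-op in Spark, so only the first effective rename is applied.

```python
def clean_name(name: str) -> str:
    # Drop the characters Spark's Parquet writer rejects in this example:
    # "(", ")", and ".".
    for ch in "().":
        name = name.replace(ch, "")
    return name

# With a SparkSession and df_source available (not created here), the
# renames collapse into a single call per column:
# for c in df_source.columns:
#     df_source = df_source.withColumnRenamed(c, clean_name(c))
```

This avoids the issue of the second and third withColumnRenamed calls referencing the already-renamed column name.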



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
