Rename columns in a PySpark DataFrame by adding a string prefix

I have written code in Python using Pandas that adds "VEN_" to the beginning of the column names:

Tablon.columns = "VEN_" + Tablon.columns

And it works fine, but now I'm working with PySpark and it doesn't work. I've tried:

Vaa_total.columns = ['Vaa_' + col for col in Vaa_total.columns]

or

for elemento in Vaa_total.columns:
    elemento = "Vaa_" + elemento

And other things like that, but nothing works.

I don't want to replace the column names; I just want to keep them and add a string to the beginning.



Solution 1:[1]

Try something like this:

for elemento in Vaa_total.columns:
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)

Solution 2:[2]

I linked a similar topic in a comment. Here's an example adapted from that topic to your task (note the import of col and the iteration over dataframe.columns):

from pyspark.sql.functions import col

dataframe.select([col(col_name).alias('VAA_' + col_name) for col_name in dataframe.columns])

Solution 3:[3]

The standard way of writing it, since withColumnRenamed renames one column at a time and returns a new DataFrame:

renamed_df = df
for col_name in df.columns:
    renamed_df = renamed_df.withColumnRenamed(col_name, "insert_text" + col_name)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ags29
Solution 2 vvg
Solution 3 sargupta