Rename columns in a PySpark DataFrame by adding a string prefix
I have written code in Python using Pandas that adds "VEN_" to the beginning of the column names:
Tablon.columns = "VEN_" + Tablon.columns
And it works fine, but now I'm working with PySpark and it doesn't work. I've tried:
Vaa_total.columns = ['Vaa_' + col for col in Vaa_total.columns]
or
for elemento in Vaa_total.columns:
    elemento = "Vaa_" + elemento
I've tried other variations like that, but none of them work.
I don't want to replace the column names; I just want to keep them and add a string to the beginning.
Solution 1:[1]
Try something like this:
for elemento in Vaa_total.columns:
    Vaa_total = Vaa_total.withColumnRenamed(elemento, "Vaa_" + elemento)
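Note that withColumnRenamed returns a new DataFrame instead of modifying the existing one, which is why the loop reassigns Vaa_total on every pass. A minimal sketch of the same loop wrapped in a function, assuming every column should get the same prefix (add_prefix_loop is a hypothetical helper name, not part of PySpark):

```python
def add_prefix_loop(df, prefix):
    """Return a DataFrame whose column names all start with `prefix`.

    `df` is assumed to be a pyspark.sql.DataFrame; only its `columns`
    attribute and `withColumnRenamed` method are used.
    """
    # df.columns is evaluated once here, before any renaming happens.
    for name in df.columns:
        # withColumnRenamed returns a new DataFrame, so keep the result.
        df = df.withColumnRenamed(name, prefix + name)
    return df
```

It would be called as Vaa_total = add_prefix_loop(Vaa_total, "Vaa_"); the original DataFrame is left untouched.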
Solution 2:[2]
I linked a similar topic in a comment. Here's an example adapted from that topic to your task:
from pyspark.sql.functions import col
dataframe.select([col(col_name).alias('VAA_' + col_name) for col_name in dataframe.columns])
Solution 3:[3]
A compact way to write it, applying withColumnRenamed to each column in turn:
from functools import reduce
renamed_df = reduce(lambda acc, col_name: acc.withColumnRenamed(col_name, "insert_text" + col_name), df.columns, df)
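Another option, not taken from the answers above, is DataFrame.toDF, which accepts a full list of new column names and renames them all in one call. A rough sketch, assuming the same prefix applies to every column (add_prefix is a hypothetical helper name):

```python
def add_prefix(df, prefix):
    """Return a DataFrame with `prefix` prepended to every column name.

    `df` is assumed to be a pyspark.sql.DataFrame; toDF(*names) returns
    a new DataFrame with the given names, in the same column order.
    """
    new_names = [prefix + name for name in df.columns]
    return df.toDF(*new_names)
```

Usage would look like Vaa_total = add_prefix(Vaa_total, "Vaa_"). This is the closest equivalent to the Pandas line Tablon.columns = "VEN_" + Tablon.columns, except that the result has to be assigned back because Spark DataFrames are immutable.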
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ags29 |
| Solution 2 | vvg |
| Solution 3 | sargupta |
