Python function to iterate each unique column and transform using pyspark

I'm building the following global function in PySpark to go through each column in my CSV, which are in different formats, and convert them all to one consistent format separated by "_". I am new to Python, and I am getting:

TypeError: Column is not iterable

employeesDF is a DataFrame read from a CSV file on my local system.

I tried the below code:

def colrename(df):
   for col in employeesDF.columns:
       F.col(col).alias(col.replace('/s,#', '_'))
   return employeesDF

ndf = colrename (employeesDF.columns)

python-3.x pyspark

Solution 1:^[1]

This will work. It builds each new name with re.sub and passes the full list to toDF:

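import re

def colrename(column):
    # Replace each whitespace character or '#' with an underscore
    return re.sub(r'\s|#', '_', column)

# toDF takes the new names positionally, in the existing column order
df2 = df2.toDF(*(colrename(c) for c in df2.columns))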
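
For context on why the attempt in the question leaves the names unchanged: str.replace matches its first argument literally, so '/s,#' is searched for as a four-character string rather than as a pattern, and the Column returned by alias is never assigned back to the DataFrame. Below is a minimal sketch of the difference in plain Python, so it runs without a Spark session. The column names are made up for illustration and are not taken from the asker's CSV.

```python
import re

def colrename(column):
    # Replace every whitespace character or '#' with an underscore.
    return re.sub(r'\s|#', '_', column)

# Hypothetical header names, for illustration only.
sample_columns = ["emp id", "emp#name", "salary"]

# str.replace is literal: the text '/s,#' never occurs, so nothing changes.
literal = [c.replace('/s,#', '_') for c in sample_columns]

# re.sub treats the pattern as a regular expression.
cleaned = [colrename(c) for c in sample_columns]

print(literal)
print(cleaned)
```

A list like cleaned is what toDF receives, unpacked, in the answer above.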

Solution 2:^[2]

In case anyone is interested, I used the code below to do it. I hope this is useful. Thanks.

from pyspark.sql import SparkSession
import re

spark = SparkSession.builder.master("local").appName("test").getOrCreate()

# Read the CSV with a header row and let Spark infer the column types
df = spark.read.format('csv')\
    .option('header', True)\
    .option('inferSchema', True)\
    .load('C:\\bigdata\\datasets\\employee10000_records.csv')

def colrename(df):
    # Rename one column at a time, replacing every character that is
    # not an ASCII letter or digit with an underscore
    for name in df.schema.names:
        df = df.withColumnRenamed(name, re.sub(r'[^A-Za-z0-9]', '_', name))
    return df

colrename(df).show()
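
The two answers use different patterns, so they can produce different names for the same header. Here is a small sketch of where they diverge, again in plain Python and with hypothetical header names picked only to show the difference:

```python
import re

# Hypothetical header names, chosen only to show where the patterns differ.
sample_columns = ["emp-id", "dept.name", "emp name#"]

# Pattern from Solution 1: whitespace or '#'.
narrow = [re.sub(r'\s|#', '_', c) for c in sample_columns]

# Pattern from Solution 2: anything that is not an ASCII letter or digit.
broad = [re.sub(r'[^A-Za-z0-9]', '_', c) for c in sample_columns]

for original, a, b in zip(sample_columns, narrow, broad):
    print(f"{original!r}: {a!r} vs {b!r}")
```

Which pattern fits depends on what the CSV headers actually contain. The broader one also rewrites hyphens and dots, while the narrower one leaves them alone.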

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Sudhin
Solution 2	SherKhan
