How to remove commas in a column within a PySpark DataFrame

Hi all, thanks for taking the time to help me with this.

Right now I have loaded a CSV into Spark, and the type of the resulting object is pyspark.sql.dataframe.DataFrame.

I have a column of numbers (which are strings in this case, though). They are values like 6,000, and I just want to remove all the commas from them. I have tried df.select("col").replace(',', '') and df.withColumn('col', regexp_replace('col', ',', '')), but I keep getting the error "DataFrame object does not support item assignment".

Any ideas? I'm fairly new to Spark.



Solution 1:[1]

You should be stripping the commas and then casting. A direct cast alone won't work here: casting a string like "6,000" to an integer yields null, because Spark cannot parse the comma. Remove the commas first, then cast the cleaned string:

from pyspark.sql.functions import regexp_replace
from pyspark.sql.types import IntegerType

# strip the commas, then cast the cleaned string to an integer
df = df.withColumn("col", regexp_replace("col", ",", "").cast(IntegerType()))
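Spark's regexp_replace applies an ordinary regular expression to each value, so the pattern can be sanity-checked locally with Python's re module before running it on a cluster. A minimal sketch (the sample values here are made up for illustration):

```python
import re

# the same pattern regexp_replace would apply: delete every comma,
# then parse the cleaned string as an integer
values = ["6,000", "1,234,567", "42"]
cleaned = [int(re.sub(",", "", v)) for v in values]
print(cleaned)  # [6000, 1234567, 42]
```

Note that a literal comma needs no escaping; it is not a regex metacharacter, so the same pattern string works in both re.sub and regexp_replace.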

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Vivek Puurkayastha