Rename PySpark dataframe columns based on a when condition
I'm using pyspark and have a spark dataframe created as:
df = spark.createDataFrame(
    [(1, None), (2, 3), (4, None)],
    ["A", "B"],
)
df.show()
+---+----+
| A| B|
+---+----+
| 1|null|
| 2| 3|
| 4|null|
+---+----+
I'd like to rename the columns based on the number of missing values in each column, without calling .collect() or .first(). I'm flagging which columns have fewer than a certain number of non-null values (F.count counts non-null entries) by doing:
import pyspark.sql.functions as F
missing = df.select([F.when(F.count(c) < 2, 1).otherwise(0).alias(c) for c in df.columns])
missing.show()
+---+---+
| A| B|
+---+---+
| 0| 1|
+---+---+
However, what I'd like to output is:
+---+--------+
|  A|delete_B|
+---+--------+
|  0|       1|
+---+--------+
How can I edit the dataframe column names to prepend "delete_" to a column's name if it has more than a certain count of nulls?
Thanks a lot.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow