How to fill all columns when I use the filter function?
I have many columns with true and false values. I want to make a new column whose value is 1 if two columns are true and 0 otherwise.
col1 col2 col3
true false true
true true true
false true false
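I can select the rows where col1 and col2 are both true with df.filter((df.col(col1)==true) & (df.col(col2)==true)), but that drops the other rows. I want to keep every row and add R instead. Expected output: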
col1 col2 col3 R
true false true 0
true true true 1
false true false 0
Solution 1:[1]
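You didn't say whether the true/false values in your df are strings or booleans. Either way, Spark's higher-order functions (transform and reduce) should make this easier.
If the true/false values in your df are strings, start by casting every column to boolean:
from functools import reduce
df = reduce(lambda acc, c: acc.withColumn(c, acc[c].cast('boolean')), df.columns, df)  # one cast per column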
Solution
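from pyspark.sql.functions import array, expr
# Put all the columns into an array column R, then in one SQL expression:
# 1. transform the booleans into integers (true -> 1, false -> 0)
# 2. add up the integers in the array (if your Spark version has no SQL reduce(), aggregate() takes the same arguments)
# 3. check whether the sum is more than two, which gives a boolean
# 4. cast that boolean to integer
df = (df.withColumn('R', array(*df.columns))
      .withColumn('R', expr("cast(reduce(transform(R, x -> cast(x as integer)), 0, (c, i) -> c + i) > 2 as integer)")))
df.show()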
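The SQL expression is evaluated once per row. As a plain-Python sketch of the same logic (not Spark code; the row_flag helper is hypothetical), transform maps each boolean to 0 or 1, reduce folds those integers into a sum starting at 0, and the comparison with 2 is cast back to an integer:

```python
from functools import reduce


def row_flag(values, threshold=2):
    """Hypothetical helper mirroring the per-row Spark SQL expression."""
    as_ints = [int(v) for v in values]              # transform: boolean -> 0/1
    total = reduce(lambda c, i: c + i, as_ints, 0)  # reduce: running sum starting at 0
    return int(total > threshold)                   # compare, then cast boolean -> integer


rows = [(True, False, True), (True, True, True), (False, True, False)]
flags = [row_flag(r) for r in rows]
print(flags)  # [0, 1, 0]
```

With three columns, "more than two" means all three are true, which is why only the second row gets 1. The threshold has to change if the number of columns changes.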
+-----+-----+-----+---+
| col1| col2| col3| R|
+-----+-----+-----+---+
| true|false| true| 0|
| true| true| true| 1|
|false| true|false| 0|
+-----+-----+-----+---+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
