How to check if a value is in a group of columns in PySpark
I have a big DataFrame with multiple columns; some of those columns are named col1:col35.
I'd like to know whether a specific value is contained in ANY of those columns:
If 'RP' is in (col1:col35) then val = 1; else val = 0;
This is the code I was using in pandas, but I'd like to migrate it to PySpark:
```python
import functools
import numpy as np

df['exclude'] = functools.reduce(np.logical_or, [df['col{}_cd'.format(i)].str.contains('RP', na=False) for i in range(1, 36)])
```
I've tried the same code in PySpark, but I'm getting the following error:
```python
df1['exclude'] = reduce(np.logical_or, [df1['col{}_cd'.format(i)].str.contains('RP', na=False) for i in range(1, 36)])
```

```
TypeError: _() got an unexpected keyword argument 'na'
```
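The pandas-style `.str` accessor and the `na=` keyword are not part of the PySpark `Column` API, which is why the call above fails. One way to express the same condition natively in PySpark is to build one boolean `Column` per field with `contains`, OR them together with `functools.reduce`, and map the result to 1/0 with `when`/`otherwise`. The following is a minimal sketch, assuming the DataFrame is named `df1` and the columns are `col1_cd` through `col35_cd` as in the snippet above; `coalesce(..., lit(False))` is used to stand in for pandas' `na=False`.

```python
from functools import reduce
import operator

from pyspark.sql import functions as F

# One boolean condition per column; coalesce turns NULLs into False,
# mirroring na=False in the pandas version.
conditions = [
    F.coalesce(F.col("col{}_cd".format(i)).contains("RP"), F.lit(False))
    for i in range(1, 36)
]

# OR all conditions together and convert the boolean result to 1/0.
df1 = df1.withColumn(
    "exclude",
    F.when(reduce(operator.or_, conditions), 1).otherwise(0),
)
```

With this approach, a NULL in any of the columns simply contributes False to the OR, so the resulting `exclude` column is 1 only when 'RP' actually appears in at least one of the 35 columns.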
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
