Spark: filter within map
I am trying to filter inside a map function. In classic MapReduce I would do this by having the mapper write nothing to the context when the filter criterion is met. How can I achieve something similar with Spark? I can't return null from the map function, as that fails in the shuffle step. I could use the filter function, but that seems like an unnecessary extra pass over the data set when I could perform the same task during the map. I could also output the record under a dummy key and drop it later, but that's an ugly workaround.
Solution 1:[1]
Maybe try `map_filter(col, function)`. Note that it operates on a MapType column in the DataFrame API, filtering the entries of the map rather than the rows of the DataFrame.
Here are the docs: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.map_filter.html#pyspark.sql.functions.map_filter
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gabriel |
