Spark - filter within map

I am trying to filter inside a map function. In classic MapReduce I would do this by having the mapper write nothing to the context when the filter criteria are met. How can I achieve something similar with Spark? I can't return null from the map function, as that fails in the shuffle step. I could use the filter function, but that seems like an unnecessary iteration over the data set when I could perform the same task during the map. I could also output null with a dummy key, but that's a bad workaround.
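For illustration, here is a minimal PySpark sketch of the filter-then-map alternative mentioned above; the task (keep even numbers, then square them) is hypothetical:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize(range(10))

# filter() drops records much like a MapReduce mapper that writes nothing
# to the context; map() then transforms the surviving records.
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Note that Spark pipelines narrow transformations such as filter and map into a single stage, so chaining them as above does not actually force a second iteration over the data set.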



Solution 1:[1]

Maybe try map_filter(col, lambda_function). Here are the docs: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.map_filter.html#pyspark.sql.functions.map_filter
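A minimal sketch of what map_filter does, assuming a hypothetical DataFrame with a MapType column named data (the example mirrors the one in the linked docs):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import map_filter

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame whose "data" column is a map of string -> double.
df = spark.createDataFrame(
    [(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})],
    ("id", "data"),
)

# Keep only the map entries whose value is greater than 30.
df.select(
    map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")
).show(truncate=False)
```

One caveat: map_filter requires Spark 3.0+ and filters entries inside a MapType column, rather than filtering records during an RDD map, so it only fits the question when the values to be filtered live inside a map-type column.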

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Gabriel