Is it possible to use a `dask` array as input for `pyspark`?

I have a dask array that I'd like to feed to pyspark.mllib.clustering.StreamingKMeans.



Solution 1:[1]

There was once a proof-of-concept for using Dask as a preprocessing layer that handed work off to Spark, with the dask and spark workers co-located. I don't believe the effort was ever pushed far or used in any kind of production, so the short answer is "no": there is no way to pass a dask array directly to spark. As things stand, you would need to compute the whole thing on the client, or write it to a storage system that both frameworks can see.
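
As a minimal sketch of those two workarounds (the toy random array, the local Spark session, the `/tmp/shared_array.parquet` path, and the column names are all assumptions for illustration; in practice the path would be HDFS, S3, or another store both clusters can reach):

```python
import dask.array as da
import dask.dataframe as dd
from pyspark.sql import SparkSession

# Toy dask array standing in for the real data (an assumption).
darr = da.random.random((10_000, 3), chunks=(1_000, 3))

spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext

# Workaround 1: compute the whole array on the client, then parallelize it.
# Only viable if the data fits in driver memory.
local = darr.compute()  # a plain numpy array on the client
rdd = sc.parallelize(local.tolist())

# Workaround 2: write to storage both frameworks can see, then read it back.
# Writing parquet from dask needs pyarrow or fastparquet installed.
ddf = dd.from_dask_array(darr, columns=["x", "y", "z"])
ddf.to_parquet("/tmp/shared_array.parquet")
sdf = spark.read.parquet("/tmp/shared_array.parquet")
```

From there, feeding StreamingKMeans is a plain Spark problem: wrap the materialized batches in a DStream and train on it. A sketch, assuming one RDD per micro-batch:

```python
from pyspark.streaming import StreamingContext
from pyspark.mllib.clustering import StreamingKMeans

ssc = StreamingContext(sc, batchDuration=1)
stream = ssc.queueStream([rdd])  # each queued RDD becomes one micro-batch
model = StreamingKMeans(k=2, decayFactor=1.0).setRandomCenters(3, 1.0, 0)
model.trainOn(stream)

ssc.start()
ssc.awaitTerminationOrTimeout(5)
ssc.stop(stopSparkContext=False)
```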

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: mdurant