Is it possible to use a `dask` array as input for `pyspark`?
I have a dask array that I would like to feed to `pyspark.mllib.clustering.StreamingKMeans`.
Solution 1:[1]
There was once a proof-of-concept for using Dask as a preprocessing layer that handed work off to Spark, with the dask and spark workers co-located. I don't believe the effort was ever pushed far or used in any kind of production, so the short answer is "no": there is no way to pass a dask array directly to spark. As things stand, you would need to compute the whole thing on the client, or write it to a storage system that both frameworks can read.
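For the first workaround, a minimal sketch of what materializing on the client could look like is below. It assumes a local `SparkContext` and an array small enough to fit in client memory; the array `darr`, the cluster parameters, and the `shared/array.zarr` path are all illustrative, not taken from the question.

```python
import dask.array as da
from pyspark import SparkContext
from pyspark.mllib.clustering import StreamingKMeans
from pyspark.mllib.linalg import Vectors

sc = SparkContext("local[*]", "dask-to-spark")

# Illustrative dask array; only viable to compute() if it fits in client memory.
darr = da.random.random((10_000, 3), chunks=(1_000, 3))
features = darr.compute()  # pulls the whole array onto the client as numpy

# Hand the materialized data to Spark as an RDD of feature vectors.
rdd = sc.parallelize([Vectors.dense(row) for row in features])

# StreamingKMeans normally trains on a DStream via trainOn(); for a one-off
# batch you can update the underlying model directly.
model = StreamingKMeans(k=4, decayFactor=1.0).setRandomCenters(dim=3, weight=1.0, seed=0)
model.latestModel().update(rdd, decayFactor=1.0, timeUnit="batches")
print(model.latestModel().centers)

# Second workaround: write to storage both frameworks can read, then load it
# on the Spark side, e.g.:
# da.to_zarr(darr, "shared/array.zarr")
```

Note that `compute()` defeats the point of dask for out-of-core data, so for anything larger than client memory the shared-storage route is the more realistic of the two.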
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mdurant |
