How to use sklearn pipeline fit with Ray

I would like to use an sklearn pipeline with a Ray cluster to run the computation in parallel. I found this example: https://docs.ray.io/en/master/ray-more-libs/joblib.html

I tried the code below, but it doesn't run in parallel:

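import joblib
import pandas as pd
from ray.util.joblib import register_ray
register_ray()  # make the "ray" backend name available to joblib
# pipe, filepath, sep and encoding are defined elsewhere (not shown)
with joblib.parallel_backend('ray'):
    df = pd.read_csv(filepath, sep=sep, encoding=encoding, on_bad_lines='skip', low_memory=False)
    y = df.pop('target')
    X = df.copy()
    out = pipe.fit_transform(X, y)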
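A likely reason nothing runs in parallel: joblib.parallel_backend('ray') only redirects work that a library dispatches through joblib.Parallel. pd.read_csv does not use joblib, and sklearn's Pipeline fits its steps one after another, so the Ray backend is only used by steps that accept an n_jobs argument (for example ColumnTransformer, FeatureUnion, RandomForestClassifier or GridSearchCV). The sketch below assumes a preprocessing pipeline built around a ColumnTransformer with n_jobs=-1; the column names and the helpers pick_backend, build_pipeline and fit_transform_with_backend are hypothetical, not part of the original code.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def pick_backend():
    """Return "ray" if Ray's joblib backend can be registered, else "loky"."""
    try:
        from ray.util.joblib import register_ray
    except ImportError:
        return "loky"  # joblib's default process-based backend
    register_ray()  # registers the "ray" backend name with joblib
    return "ray"


def build_pipeline(numeric_cols, categorical_cols):
    # Pipeline itself runs its steps sequentially. ColumnTransformer is the
    # joblib-aware step: with n_jobs=-1 it fits each sub-transformer as a
    # separate joblib task, which the active backend can then distribute.
    pre = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        n_jobs=-1,
        sparse_threshold=0.0,  # always return a dense NumPy array
    )
    return Pipeline([("pre", pre)])


def fit_transform_with_backend(X, y, numeric_cols, categorical_cols, backend="loky"):
    pipe = build_pipeline(numeric_cols, categorical_cols)
    # Only joblib.Parallel calls made inside this block go to `backend`.
    # Reading the CSV inside it would not help: pandas does not use joblib.
    with joblib.parallel_backend(backend):
        return pipe.fit_transform(X, y)
```

Usage, for example: backend = pick_backend(), then out = fit_transform_with_backend(X, y, numeric_cols, categorical_cols, backend). With only a few columns groups the per-task overhead of Ray can outweigh the gain; wrapping the pipeline in GridSearchCV(..., n_jobs=-1) is the case where the backend usually has more independent work to spread. To attach to an already running cluster instead of starting a local Ray instance, calling ray.init(address="auto") before entering the backend context is one option (an assumption about the asker's setup).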

If I use import modin.pandas as pd instead, the fit method complains that X and y are not pandas DataFrame types.
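On the Modin error: a Modin DataFrame is not a pandas.DataFrame subclass, so any pipeline step that checks for pandas types (as the error suggests) will reject it. One workaround is to let Modin do the CSV read and convert back to pandas right before fit. This sketch assumes Modin objects expose the private _to_pandas() method (modin.utils.to_pandas is a public alternative); ensure_pandas is a hypothetical helper name. The conversion materializes the whole frame in the driver's memory, so it only helps when the data fits there.

```python
import pandas as pd


def ensure_pandas(obj):
    """Return a plain pandas DataFrame/Series, converting Modin objects if needed.

    Assumes Modin objects expose the private ``_to_pandas()`` method.
    """
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        return obj  # already pandas: nothing to do
    to_pandas = getattr(obj, "_to_pandas", None)
    if callable(to_pandas):
        # Collects all partitions into one in-memory pandas object.
        return to_pandas()
    raise TypeError(f"expected a pandas or Modin object, got {type(obj).__name__}")
```

Usage, assuming df was read with modin.pandas: y = ensure_pandas(df.pop('target')), X = ensure_pandas(df), then out = pipe.fit_transform(X, y).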



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
