Get instance variable of custom transformer in sklearn pipeline
I am working on a supervised learning problem and want to build a full Pipeline from beginning to end, starting with the train-test split. I wrote a custom class to bring sklearn's train_test_split into the sklearn pipeline. Its fit_transform returns the training set. Later I still want to access the test set, so I made it an instance variable in the custom transformer class like this:
self.test_set = test_set
from sklearn.model_selection import train_test_split

class train_test_splitter([...]):
    [...
    ...]
    def transform(self, X):
        train_set, test_set = train_test_split(X, test_size=0.2)
        self.test_set = test_set
        return train_set
from sklearn.pipeline import Pipeline

split_pipeline = Pipeline([
    ('splitter', train_test_splitter()),
])
df_train = split_pipeline.fit_transform(df)
Now I want to get the test set like this:
df_test = splitter.test_set
It's not working. How do I get the variables of the instance "splitter"? Where is it stored?
Solution 1:[1]
You can access the steps of a pipeline in a number of ways. For example,
split_pipeline['splitter'].test_set
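As a minimal sketch of the other lookups that reach the same step object, the snippet below rebuilds the question's setup under some assumptions: the class name TrainTestSplitter, its no-op fit, the fixed random_state, and the small NumPy array standing in for df are all hypothetical, since the original class body is elided.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TrainTestSplitter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's train_test_splitter."""

    def fit(self, X, y=None):
        # Nothing to learn; assumed no-op so fit_transform works.
        return self

    def transform(self, X):
        # random_state is an assumption, added only for reproducibility.
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set
        return train_set


# Toy data standing in for the question's df (10 rows, 2 columns).
data = np.arange(20).reshape(10, 2)

split_pipeline = Pipeline([("splitter", TrainTestSplitter())])
train = split_pipeline.fit_transform(data)

# Four lookups that all return the same fitted step instance.
by_name = split_pipeline["splitter"]
by_named_steps = split_pipeline.named_steps["splitter"]
by_position = split_pipeline[0]
by_steps_list = split_pipeline.steps[0][1]

test = by_name.test_set
```

The bare name splitter in the question fails because it is only the step's label inside the pipeline, not a Python variable; the instance itself lives in the pipeline's steps list.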
That said, I don't think this is a good approach. When you fill out the pipeline with more steps, at fit time everything will work how you want, but when predicting/transforming on other data you will still be calling your transform method, which will generate a new train-test split, forgetting the old one, and sending the new train set down the pipe for the remaining steps.
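To illustrate that concern, here is a small sketch under stated assumptions: TrainTestSplitter is again a hypothetical stand-in for the elided class, and StandardScaler is added only as an example of a later step. It suggests that calling transform on new data re-splits that data and replaces the stored test set.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TrainTestSplitter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's train_test_splitter."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Runs on every call, including calls made at predict/transform time.
        train_set, test_set = train_test_split(X, test_size=0.2, random_state=0)
        self.test_set = test_set
        return train_set


pipe = Pipeline([
    ("splitter", TrainTestSplitter()),
    ("scaler", StandardScaler()),  # example downstream step (assumption)
])

X = np.arange(20, dtype=float).reshape(10, 2)
X_train_scaled = pipe.fit_transform(X)        # 8 of the 10 rows reach the scaler
held_out = pipe["splitter"].test_set.copy()   # the 2 rows held out at fit time

# New data at transform time: the splitter splits again.
X_new = np.arange(100, 110, dtype=float).reshape(5, 2)
X_new_scaled = pipe.transform(X_new)          # only 4 of the 5 new rows come out
overwritten = pipe["splitter"].test_set       # now holds 1 row taken from X_new
```

After the second call the original held-out rows are gone from the transformer (only the explicit copy in held_out survives), and one row of the new data was silently dropped before scaling.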
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben Reiniger |
