Reduce feature matrix inside scikit-learn pipeline

I am trying to solve an EEG classification problem (patient vs control) using leave-subject-out cross-validation. As part of this, I implemented a pipeline that looks like this:

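from mne.decoding import CSP
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
seed = 42  # any fixed random seed for reproducibility
pipe = Pipeline([
    ('csp', CSP(transform_into="average_power")),   # Feature extraction
    ('sca', StandardScaler()),                      # Scaler
    ('clf', SVC(kernel='rbf', random_state=seed))   # Classifier
])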
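For reference, leave-subject-out cross-validation maps onto scikit-learn's LeaveOneGroupOut when the subject ID of every epoch is passed as `groups`. The sketch below is a minimal, MNE-free illustration under stated assumptions: the 'csp' step is omitted, random numbers stand in for CSP features, and the subject count, epoch count and seed are made up.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

seed = 0
rng = np.random.default_rng(seed)

# Synthetic stand-in for CSP output (assumption): 6 subjects x 20 epochs x 4 features
n_subjects, n_epochs, n_features = 6, 20, 4
X = rng.normal(size=(n_subjects * n_epochs, n_features))
# One label per subject: first three are patients (1), last three controls (0)
y = np.repeat([1, 1, 1, 0, 0, 0], n_epochs)
# Subject ID of every epoch, used to keep each subject entirely in one fold
groups = np.repeat(np.arange(n_subjects), n_epochs)

pipe = Pipeline([
    ("sca", StandardScaler()),
    ("clf", SVC(kernel="rbf", random_state=seed)),
])

# Each fold holds out all epochs of exactly one subject
scores = cross_val_score(pipe, X, y, groups=groups, cv=LeaveOneGroupOut())
print(scores.shape)  # → (6,)
```

There is one accuracy score per held-out subject; with random features the values themselves carry no meaning.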

The thing is, I want to include an intermediate step between 'csp' and 'sca' that averages N consecutive rows in X to increase the SNR of the features. Something like this:
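Averaging N consecutive rows changes the number of samples, so X and y have to shrink together. A minimal NumPy sketch of that reduction follows; `average_consecutive_rows` is a hypothetical helper, and it assumes the rows are ordered so that each block of N rows belongs to a single subject and class (it raises an error otherwise and drops leftover rows that do not fill a block).

```python
import numpy as np


def average_consecutive_rows(X, y, n):
    """Average every n consecutive rows of X and keep one label per block.

    Hypothetical helper. Assumes each block of n rows shares one label;
    leftover rows that do not fill a complete block are dropped.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_blocks = X.shape[0] // n
    # Reshape to (n_blocks, n, n_features) and average over the n rows of each block
    X_avg = X[: n_blocks * n].reshape(n_blocks, n, X.shape[1]).mean(axis=1)
    y_blocks = y[: n_blocks * n].reshape(n_blocks, n)
    if not (y_blocks == y_blocks[:, :1]).all():
        raise ValueError("a block of rows mixes different labels")
    return X_avg, y_blocks[:, 0]


X = np.arange(14, dtype=float).reshape(7, 2)  # 7 rows, 2 features
y = np.array([0, 0, 0, 1, 1, 1, 1])
X_avg, y_avg = average_consecutive_rows(X, y, n=3)
print(X_avg.tolist())  # → [[2.0, 3.0], [8.0, 9.0]]
print(y_avg.tolist())  # → [0, 1]
```

The output has 2 rows instead of 7, which is why the label vector has to be reduced in the same call.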

pipe = Pipeline([
    ('csp', CSP(transform_into="average_power")),   # Feature extraction
    ('avg', FunctionTransformer(my_function)),      # Average consecutive rows in X (reduce y accordingly)
    ('sca', StandardScaler()),                      # Scaler
    ('clf', SVC(kernel='rbf', random_state=seed))   # Classifier
])

I have tinkered with scikit-learn's FunctionTransformer; however, it only lets me return a processed version of X, whereas I also need to modify y to match the reduced number of samples in X.
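One possible workaround, sketched here under assumptions rather than as an established scikit-learn feature, is to move the averaging out of the transformer chain and into a small wrapper around the final estimator. The wrapper averages blocks of `n_avg` rows in both `fit` and `predict`, then repeats each block's prediction for every row in that block, so the number of predictions still equals `len(y)` as cross-validation scorers expect. `BlockAveragingClassifier` and `n_avg` are hypothetical names, labels are assumed constant within each block, and synthetic data replaces CSP features.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class BlockAveragingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical meta-estimator: averages every n_avg consecutive rows
    before fitting or predicting with the wrapped estimator."""

    def __init__(self, estimator, n_avg=5):
        self.estimator = estimator
        self.n_avg = n_avg

    def _average(self, X):
        n_rows = X.shape[0]
        starts = np.arange(0, n_rows, self.n_avg)     # first row of each block
        sizes = np.diff(np.append(starts, n_rows))    # last block may be shorter
        # Sum each block with reduceat, then divide by its size to get the mean
        X_avg = np.add.reduceat(X, starts, axis=0) / sizes[:, None]
        return X_avg, starts, sizes

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        X_avg, starts, _ = self._average(X)
        # One label per block (first row's label): assumes labels are constant per block
        self.estimator_ = clone(self.estimator).fit(X_avg, y[starts])
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        X_avg, _, sizes = self._average(X)
        # Map each block prediction back onto the rows of that block
        return np.repeat(self.estimator_.predict(X_avg), sizes)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))      # synthetic stand-in for CSP features
y = np.repeat([0, 1], 30)          # rows sorted by class, 30 each
X[y == 1] += 1.0                   # shift class 1 so there is something to learn

clf = BlockAveragingClassifier(
    Pipeline([("sca", StandardScaler()), ("clf", SVC(kernel="rbf", random_state=0))]),
    n_avg=5,
)
clf.fit(X, y)
pred = clf.predict(X)
print(pred.shape)  # → (60,)
```

In the question's setting the wrapper would take the place of the 'sca' and 'clf' steps, right after 'csp' (untested with MNE). With leave-subject-out CV the training fold concatenates several subjects, so a block can straddle a subject boundary unless each subject's epoch count is a multiple of `n_avg`. An alternative sometimes used is imbalanced-learn's FunctionSampler inside `imblearn.pipeline.Pipeline`, which can return a modified (X, y); note that samplers are only applied during fit, so test rows would not be averaged.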

I would appreciate it if someone could shed some light on this.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
