Reduce feature matrix inside scikit-learn pipeline
I am trying to solve an EEG classification problem (patient vs control) using leave-subject-out cross-validation. As part of this, I implemented a pipeline that looks like this:
pipe = Pipeline([
    ('csp', CSP(transform_into="average_power")),  # Feature extraction
    ('sca', StandardScaler()),                     # Scaler
    ('clf', SVC(kernel='rbf', random_state=seed))  # Classifier
])
The thing is, I want to include an intermediate step between 'csp' and 'sca' that averages N consecutive rows in X to increase the SNR of the features. Something like this:
pipe = Pipeline([
    ('csp', CSP(transform_into="average_power")),  # Feature extraction
    ('avg', FunctionTransformer(my_function)),     # Average consecutive rows in X (reduce y accordingly)
    ('sca', StandardScaler()),                     # Scaler
    ('clf', SVC(kernel='rbf', random_state=seed))  # Classifier
])
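To make the intended 'avg' step concrete, here is a minimal NumPy sketch of the averaging itself. The name `average_consecutive_rows` and the parameter `n` are illustrative only, and the sketch assumes the rows are ordered so that every block of `n` rows shares one label; leftover rows that do not fill a block are dropped.

```python
import numpy as np


def average_consecutive_rows(X, y, n):
    """Average every n consecutive rows of X and keep one label per block.

    Assumption (not stated in the question): rows are ordered so that each
    block of n rows carries a single label. Trailing rows that do not fill
    a complete block are discarded.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_blocks = X.shape[0] // n

    # Trim to a whole number of blocks, then view y as (n_blocks, n).
    X_trim = X[: n_blocks * n]
    y_blocks = y[: n_blocks * n].reshape(n_blocks, n)
    if not (y_blocks == y_blocks[:, :1]).all():
        raise ValueError("each block of n rows must share a single label")

    # Reshape to (n_blocks, n, n_features) and average over the block axis.
    X_avg = X_trim.reshape(n_blocks, n, X.shape[1]).mean(axis=1)
    return X_avg, y_blocks[:, 0]
```

Since this function returns both the reduced X and the reduced y, it does not fit the single-argument, single-return signature that `FunctionTransformer` expects.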
I have tinkered with scikit-learn's FunctionTransformer, but it only lets me return a processed version of X. I also need to modify y to match the reduced number of samples in X.
I would appreciate it if someone could shed some light on this.
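One possible workaround, sketched here under stated assumptions rather than offered as an established recipe, is to move the averaging into a wrapper around the final estimator, because the last step of a `Pipeline` receives both X and y in `fit`. The class name `BlockAveragingClassifier` and its parameter `n_avg` are hypothetical. The sketch assumes that blocks are formed purely by row position, that each block is label-pure during training, and that repeating each block's prediction for all of its rows is acceptable for scoring.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class BlockAveragingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: average n_avg consecutive rows, then delegate."""

    def __init__(self, estimator, n_avg=2):
        self.estimator = estimator
        self.n_avg = n_avg

    def _average(self, X):
        X = np.asarray(X, dtype=float)
        # Rows 0..n_avg-1 form block 0, the next n_avg rows block 1, and so on.
        # A shorter final block is kept and averaged over the rows it has.
        ids = np.arange(X.shape[0]) // self.n_avg
        sums = np.zeros((ids[-1] + 1, X.shape[1]))
        np.add.at(sums, ids, X)
        return sums / np.bincount(ids)[:, None], ids

    def fit(self, X, y):
        X_avg, _ = self._average(X)
        # Assumption: every block is label-pure, so its first label stands for it.
        y_avg = np.asarray(y)[:: self.n_avg]
        self.estimator_ = clone(self.estimator).fit(X_avg, y_avg)
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        X_avg, ids = self._average(X)
        # Repeat each block's prediction so there is one output per input row.
        return self.estimator_.predict(X_avg)[ids]


# Scaler and SVC are nested inside the wrapper so they see the averaged rows.
avg_clf = BlockAveragingClassifier(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")), n_avg=3
)
```

In the pipeline above, `avg_clf` would replace the 'sca' and 'clf' steps and follow 'csp' directly. Because blocks depend only on row order, the rows reaching this step would need to be arranged so that no block straddles two classes or two subjects.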
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
