Sklearn scaler attribute changes when returned: X has 6 features, but MinMaxScaler is expecting 1 features as input
I recently encountered a puzzling problem with Sklearn, and even if it's not that hard to bypass it, I'd like to understand what's going on.
So the problem is that I get the error `ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input.` when I try to transform any dataset with a scaler object returned by a function in which I previously fitted it (via `fit_transform`) on a dataset with 6 features.
If I instead use the scaler inside that function on any dataset, it either runs fine if the given dataset has 6 features, or raises `ValueError: X has Y features, but MinMaxScaler is expecting 6 features as input.`, with Y being the number of features in the input.
So it seems that as soon as I return the object, the `n_features_in_` attribute is set to 1. The other attributes don't change, and if I manually set `n_features_in_` back to 6, everything seems to work fine.
So the question is: what is going on?
Edit: I did a little more testing, and the other attributes do change too. The `data_max_` attribute, which is normally an array of length 6, becomes an array of length 1 containing the max value of the first feature of the train set.
Here is a simplified code snippet to help understand the code structure:

```python
def construct(dataset, scaler_type):
    scaler = scaler_type
    scaled_data = scaler.fit_transform(dataset)
    print(scaler.n_features_in_)  # prints 6
    scaler.transform([[1, 2, 3, 4, 5, 6]])  # works fine
    return scaled_data, scaler

scaled_data, scaler = construct(dataset, MinMaxScaler())
print(scaler.n_features_in_)  # prints 1
scaler.transform([[1, 2, 3, 4, 5, 6]])
# ValueError: X has 6 features, but MinMaxScaler is expecting 1 features as input.
```
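For reference, the error message itself comes from scikit-learn's input validation: `transform` compares the number of columns in its input against the `n_features_in_` recorded by the most recent `fit`. A minimal standalone reproduction (the data here is made up for the example):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(np.arange(4.0).reshape(4, 1))  # fitted on a single feature
print(scaler.n_features_in_)  # 1

try:
    scaler.transform([[1, 2, 3, 4, 5, 6]])
except ValueError as e:
    print(e)  # X has 6 features, but MinMaxScaler is expecting 1 features as input.
```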
Solution 1:[1]
I solved the problem, and hopefully the solution will be useful to others (even though I doubt many will encounter the same issue).
Without superhuman intuition, the simplified code snippet I provided wouldn't have been enough to find where the problem came from (I should have at least run it myself...). That is because, in the actual code, I declare two separate scalers: one for the x train data, and one for the y train data.
```python
def construct(dataset, scaler_type):
    xscaler, yscaler = scaler_type, scaler_type
    # do things...
    x_set = xscaler.fit_transform(dataset)
    # taking the first feature as y data (kept 2D for the scaler)
    y_set = yscaler.fit_transform(dataset[:, [0]])
    scaled_data = (x_set, y_set)
    return scaled_data, xscaler, yscaler
```
The issue here is that when I called the function, I passed the arguments like this:

```python
scaled_data, xscaler, yscaler = construct(dataset, MinMaxScaler())
```
Notice how I instantiate the `MinMaxScaler` class in the function call. This causes the `xscaler` and `yscaler` variables to refer to the same object inside the function, so the fitted attributes from `xscaler.fit_transform()` are overwritten when I call `yscaler.fit_transform()`.
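The aliasing is easy to demonstrate in isolation. A minimal sketch (not the original code; the toy data is made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

shared = MinMaxScaler()
xscaler, yscaler = shared, shared  # two names, one object
print(xscaler is yscaler)  # True

X = np.arange(24.0).reshape(4, 6)
xscaler.fit_transform(X)
print(xscaler.n_features_in_)  # 6

yscaler.fit_transform(X[:, [0]])  # refits the SAME object on one feature
print(xscaler.n_features_in_)  # 1 -- xscaler's fitted state is gone
```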
Passing the `MinMaxScaler` class itself, and instantiating it twice inside the function, solves the issue.
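A sketch of that fix, with the class called twice inside the function so each scaler is an independent object (the toy dataset and variable names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def construct(dataset, scaler_cls):
    # one independent scaler instance per target
    xscaler, yscaler = scaler_cls(), scaler_cls()
    x_set = xscaler.fit_transform(dataset)
    y_set = yscaler.fit_transform(dataset[:, [0]])  # first feature as y, kept 2D
    return (x_set, y_set), xscaler, yscaler

dataset = np.arange(24.0).reshape(4, 6)
(x_set, y_set), xscaler, yscaler = construct(dataset, MinMaxScaler)
print(xscaler.n_features_in_, yscaler.n_features_in_)  # 6 1
xscaler.transform([[1, 2, 3, 4, 5, 6]])  # no longer raises
```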
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Heraghon |
