Scaling or Normalizing Data gives worse results (Already checked Implementation)

I am trying to optimize my model with Optuna and was looking for the reason why its loss always stays around 0.5. I realized that my normalization makes the results worse than training without it. I checked my implementation in a separate script to be sure I implemented it correctly. I then also tried standardization, but that gave worse results as well.

I am training an LSTM on time series data. I am trying to build a classifier for time series from 3 different classes. Each class has sequences of 190 timestamps.

1. Training without normalization or standardization

2. With MinMaxScaler

3. With normalization

Implementation of normalize

I checked my implementation with this simple script.

from sklearn.preprocessing import MinMaxScaler, normalize
import pandas as pd

x = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]
x = pd.DataFrame(x)

print(x)
>>>
   0  1  2
0  1 -1  2
1  2  0  0
2  0  1 -1

x_1 = normalize(x, axis=0, norm='max')

print(x_1)

>>>
[[ 0.5 -1.   1. ]
 [ 1.   0.   0. ]
 [ 0.   1.  -0.5]]
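For reference, the arithmetic behind this output can be reproduced by hand. The following is only a minimal sketch in plain Python, not scikit-learn's implementation: the helper name max_abs_normalize_columns is made up for illustration, and it assumes that norm='max' with axis=0 divides each column by its largest absolute value, which is consistent with the output shown above.

```python
# Hand-rolled column-wise max-abs normalization (illustrative only, no scikit-learn).
x = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]


def max_abs_normalize_columns(rows):
    """Divide every column by its largest absolute value."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for all-zero columns to avoid dividing by zero.
    scales = [max(abs(v) for v in col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


x_1 = max_abs_normalize_columns(x)
for row in x_1:
    print(row)
```

Each column is scaled independently of the others, so the values only say how large an entry is relative to the other entries in the same column of the array that was passed in.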

Implementation of Scaler

from sklearn.preprocessing import MinMaxScaler, normalize
import pandas as pd

x = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]
x = pd.DataFrame(x)

print(x)
>>>
   0  1  2
0  1 -1  2
1  2  0  0
2  0  1 -1

scaler = MinMaxScaler(feature_range=(-1,1))
x_1 = scaler.fit_transform(x)

print(x_1)
>>>
[[ 0.         -1.          1.        ]
 [ 1.          0.         -0.33333333]
 [-1.          1.         -1.        ]]
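The same check can be done by hand for the min-max case. This is a sketch under the assumption that the scaler maps each column linearly so that its minimum lands on -1 and its maximum on 1; the helper name min_max_scale_columns is hypothetical and the code is plain Python, not scikit-learn's implementation.

```python
# Hand-rolled column-wise min-max scaling to [low, high] (illustrative only).
x = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]


def min_max_scale_columns(rows, low=-1.0, high=1.0):
    """Linearly map each column so its min becomes `low` and its max `high`."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # Treat constant columns as having range 1.0 to avoid dividing by zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(v - mn) / rg * (high - low) + low for v, mn, rg in zip(row, mins, ranges)]
        for row in rows
    ]


x_1 = min_max_scale_columns(x)
for row in x_1:
    print(row)
```

For the third column, for example, the value 0 becomes (0 - (-1)) / 3 * 2 - 1, which is about -0.333 and agrees with the printed result.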

Real Implementation in my script

I have my data stored in one big DataFrame called mdf_import. Each row is a timestamp, and at the end there is a column with an index indicating which sequence the timestamp belongs to and a column with a label. Here I separate the sequences based on their index and store each one in a tuple with its label.

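for Index, group in mdf_import.groupby("Index"):
    # Select the feature columns listed in labellist[1] for this sequence
    sequence_features = group[labellist[1]]
    print(sequence_features)

    # Alternative: min-max scaling, fitted on this sequence only
    #scaler = MinMaxScaler(feature_range=(-1,1))
    #sequence_features = scaler.fit_transform(sequence_features)

    # Max-abs normalization per column, computed on this sequence only
    sequence_features = normalize(sequence_features, axis=0, norm='max')

    # Look up the encoded label of this sequence in labellist[0]
    label = labellist[0][labellist[0].Index == Index].iloc[0].enc_label
    sequences.append((sequence_features, label))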
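One property of the loop above is that the scaling statistics are computed separately for every sequence, because normalize (or fit_transform) is called inside the groupby loop. Whether this is what hurts the results here is not established, but the effect itself can be sketched with two made-up single-feature sequences that differ only in amplitude. The data and the helper max_abs_scale are hypothetical and written in plain Python for illustration.

```python
# Two hypothetical single-feature sequences that differ only in amplitude.
seq_a = [1.0, 2.0, 4.0]
seq_b = [10.0, 20.0, 40.0]


def max_abs_scale(values, scale=None):
    """Divide by `scale`; default to the sequence's own max absolute value."""
    if scale is None:
        scale = max(abs(v) for v in values) or 1.0
    return [v / scale for v in values]


# Per-sequence scaling, as in the loop above: the two sequences become identical.
per_seq_a = max_abs_scale(seq_a)
per_seq_b = max_abs_scale(seq_b)

# One shared scale computed over all sequences keeps the amplitude difference.
shared = max(abs(v) for v in seq_a + seq_b)
shared_a = max_abs_scale(seq_a, shared)
shared_b = max_abs_scale(seq_b, shared)

print(per_seq_a == per_seq_b)
print(shared_a == shared_b)
```

If the classes differ mainly in the absolute level of their signals, per-sequence scaling would remove that information, while a single scale shared by all sequences would keep it. In a real pipeline the shared scale would normally be computed on the training data only.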

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
