How to re-use sklearn.preprocessing's StandardScaler on a .tflite model in an Android application?

I've built a Neural Network model that I saved as a .tflite model for further integration in my Android application. I've successfully integrated it, but I just realized that I'm missing the input data scaling part, which was done in Python with the help of sklearn's StandardScaler.

# Standardize features by removing the mean and scaling to unit variance.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Is there a way of saving the StandardScaler's parameters, so it can be used to process the input data in my Android application?

        float[] input = new float[]{0.1f, 0.2f, 3f, 4.4f, 6.1f, 1.3f};
        float[][] output = new float[1][4];
        
        // Need to standardize the input here
        // before feeding it to the model
 
        // Run decoding signature.
        try (Interpreter interpreter = new Interpreter(loadModelFile())) {
            Map<String, Object> inputs = new HashMap<>();
            inputs.put("dense_6_input", input);

            Map<String, Object> outputs = new HashMap<>();
            outputs.put("dense_8", output);

            interpreter.runSignature(inputs, outputs, "serving_default");
        } catch (IOException e) {
            e.printStackTrace();
        }

I also saw that a sklearn.pipeline.Pipeline could be used to save the scaler together with the model, but in that case I can't find an example or documentation of how to load and use it in Java.

Solution:

If you want to use your model in a different environment but have to standardize the input data, here is my solution:

As the documentation says, sklearn's StandardScaler uses Z-score normalization z = (x - u) / s, where u is the mean of the training set and s is its standard deviation. This means that we have to know the mean and the standard deviation of the training data to be able to standardize the input in a different environment.
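As a quick sanity check, the formula can be applied by hand with NumPy and compared against sklearn's output. This is a minimal sketch with made-up toy data, not the data from the question:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data: 4 samples, 2 features (illustrative values only)
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Manual z-score using the fitted parameters:
# z = (x - u) / s, with u = scaler.mean_ and s = scaler.scale_
manual = (X_train - scaler.mean_) / scaler.scale_

print(np.allclose(X_scaled, manual))  # True
```

If the two arrays match, any environment that knows `mean_` and `scale_` can reproduce the scaler's transform exactly.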

To get the mean and the std:

# Standardize the data, so we won't have unexpected network behavior
scaler = StandardScaler(with_mean=True, with_std=True)
X_train_scaled = scaler.fit_transform(X_train)

print('Scaler mean attribute:')
print(scaler.mean_)

print('Scaler std attribute:')
print(scaler.scale_)

# Output
# The mean and std for my features
Scaler mean attribute:
array([-0.75648769,  4.12972816,  0.7942958 ,  0.25645808, -0.10128877,
       -0.9810976 ])
Scaler std attribute:
array([ 3.32718737,  3.14302739, 11.21207204,  0.18413352,  0.18150619,
        0.09896419])
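Rather than hard-coding the printed values, one option is to export them to a small JSON file at training time and bundle it as an asset in the Android app. This is a sketch under my own assumptions: the file name `scaler_params.json` and the `mean`/`std` keys are arbitrary choices, and the values are the ones printed above:

```python
import json

# These are the fitted values printed above; with a live scaler you
# would use scaler.mean_.tolist() and scaler.scale_.tolist() instead.
mean = [-0.75648769, 4.12972816, 0.7942958, 0.25645808, -0.10128877, -0.9810976]
std = [3.32718737, 3.14302739, 11.21207204, 0.18413352, 0.18150619, 0.09896419]

# Write the parameters to a JSON file the app can read at runtime
with open("scaler_params.json", "w") as f:
    json.dump({"mean": mean, "std": std}, f)
```

On the Android side the file can be parsed with any JSON library (e.g. `org.json.JSONObject`) to populate the `FEATURES_MEAN` and `FEATURES_STD` arrays instead of hard-coding them.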

Now you're free to use it anywhere as in the following example:

        private static final double[] FEATURES_MEAN = {-0.75648769,  4.12972816,  0.7942958 ,  0.25645808, -0.10128877, -0.9810976};
        private static final double[] FEATURES_STD  = {3.32718737,  3.14302739, 11.21207204,  0.18413352,  0.18150619, 0.09896419};

        //test
        double[] input_test = new double[]{-3.862595,   1.480916,   3.381679,   0.349121,   -0.159424,  -0.910645};

        // Standardize features by performing Z-Score Normalization
        // New_Feature = (x - FEATURES_MEAN) / FEATURES_STD
        float[] standardized_input = new float[input_test.length];
        for(int i = 0; i < input_test.length; i++)
        {
            standardized_input[i] = (float) ((input_test[i] - FEATURES_MEAN[i]) / FEATURES_STD[i]);
        }



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
