TFLite inference on AArch64 is very slow

I converted a frozen .pb model to TFLite using this script:

# float32 (no quantization): convert a frozen graph to a TFLite flatbuffer
import tensorflow as tf

def tflite_convert_float32(input_array, output_array, pb_path, tflite_path):
    converter = tf.lite.TFLiteConverter.from_frozen_graph(
            pb_path,
            input_arrays=input_array,
            output_arrays=output_array,
            )
    tfmodel = converter.convert()
    with open(tflite_path, "wb") as f:
        f.write(tfmodel)
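
It is called roughly like this (the tensor names and paths below are placeholders, not the real names from my graph):

```python
# Placeholder names/paths; substitute the actual input/output tensor names.
tflite_convert_float32(
    input_array=["input"],
    output_array=["output"],
    pb_path="model.pb",
    tflite_path="model.tflite",
)
```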

Then I ran an inference script with the converted TFLite model, but it gives worse performance than the original TensorFlow .pb inference on the same platform.
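
For reference, the TFLite side is timed roughly like the sketch below (this assumes the standard `tf.lite.Interpreter` Python API and a dummy input; the real script feeds actual frames):

```python
# Rough sketch of the TFLite timing loop (assumed setup, not the exact script).
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy frame matching the model's input shape and dtype.
frame = np.random.random_sample(input_details[0]["shape"]).astype(
        input_details[0]["dtype"])

n_frames = 100
start = time.perf_counter()
for _ in range(n_frames):
    interpreter.set_tensor(input_details[0]["index"], frame)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_details[0]["index"])
elapsed = time.perf_counter() - start
print("average fps:", n_frames / elapsed)
```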

The .pb inference averaged over 100 frames: 0.15 fps

The TFLite inference averaged over 100 frames: 0.13 fps

I have heard that TFLite gives very good performance on AArch64, so why is it slower than the .pb inference here? Do I need to add something more during the .pb-to-TFLite conversion? Am I missing something?
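
The kind of extra step I have seen suggested (but have not verified on this board) is post-training quantization at conversion time and using more CPU threads at inference time; a minimal sketch, assuming a TF version where `converter.optimizations` and the `num_threads` argument exist, would be:

```python
# Unverified tweaks (assumptions): dynamic-range quantization + multi-threading.
import tensorflow as tf

# Same TF 1.x frozen-graph converter as above; paths/names are placeholders.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
        "model.pb", input_arrays=["input"], output_arrays=["output"])
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())

# num_threads is only available in newer TF releases; older ones lack it.
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite", num_threads=4)
interpreter.allocate_tensors()
```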

lscpu output:

```
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
BogoMIPS:            16.00
L1d cache:           unknown size
L1i cache:           unknown size
L2 cache:            unknown size
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
```

