Why does the output of an IIR filter differ when the same code is run on a PC and on an embedded target?

I have implemented an IIR filter in the cascaded biquad form described in the ARM (CMSIS-DSP) documentation, both in the embedded code and in C code running on a PC. Both versions process the data in the same way, but the output of the PC C code differs from the output of the embedded code. I also filtered the same data in MATLAB, and its output matches the C code output. So the MATLAB and C outputs agree, but the embedded output does not. The embedded code runs on an ARM Cortex-M33 device with the FPU extension. Is the difference due to the different processors on the PC and the embedded device?

This is the C code used to filter the data. The inputs are included from separate files that contain the raw (unprocessed) data collected from the embedded device.

#include <stdio.h>
#include "arm_math.h"
#include "coeffs.c"   /* coeff_0_3, coeff_3_7, coeff_7_13, coeff_13_25 */
#include "datax.c"
#include "datay.c"
#include "dataz.c"

#define NUM_STAGES_0_3      6
#define NUM_STAGES_3_7      6
#define NUM_STAGES_7_13     8
#define NUM_STAGES_13_25    10
#define NUM_SAMPLES         1000000
#define BLOCK_SIZE          50
#define FREQ1               1
#define FREQ2               5
#define FREQ3               10
#define FREQ4               15

/* float32_t input[NUM_SAMPLES];  input[] comes from the included data files instead */

/* One output channel per filter bank: 0-3, 3-7, 7-13 and 13-25. */
float32_t output[4][NUM_SAMPLES];

/* A DF1 biquad needs 4 state words per stage: x[n-1], x[n-2], y[n-1], y[n-2]. */
float32_t buf_0_3   [4*NUM_STAGES_0_3];
float32_t buf_3_7   [4*NUM_STAGES_3_7];
float32_t buf_7_13  [4*NUM_STAGES_7_13];
float32_t buf_13_25 [4*NUM_STAGES_13_25];

int main(void)
{
    arm_biquad_casd_df1_inst_f32 S[4];
    uint32_t blockSize = BLOCK_SIZE;
    uint32_t numBlocks = NUM_SAMPLES / BLOCK_SIZE;

    /* Each instance keeps its own state, so consecutive blocks are filtered continuously. */
    arm_biquad_cascade_df1_init_f32(&S[0], NUM_STAGES_0_3, coeff_0_3, buf_0_3);
    arm_biquad_cascade_df1_init_f32(&S[1], NUM_STAGES_3_7, coeff_3_7, buf_3_7);
    arm_biquad_cascade_df1_init_f32(&S[2], NUM_STAGES_7_13, coeff_7_13, buf_7_13);
    arm_biquad_cascade_df1_init_f32(&S[3], NUM_STAGES_13_25, coeff_13_25, buf_13_25);

    /* Feed the same input block through all four filter banks. */
    for (uint32_t i = 0; i < numBlocks; i++)
    {
        arm_biquad_cascade_df1_f32(&S[0], input + (i * blockSize), &output[0][i * blockSize], blockSize);
        arm_biquad_cascade_df1_f32(&S[1], input + (i * blockSize), &output[1][i * blockSize], blockSize);
        arm_biquad_cascade_df1_f32(&S[2], input + (i * blockSize), &output[2][i * blockSize], blockSize);
        arm_biquad_cascade_df1_f32(&S[3], input + (i * blockSize), &output[3][i * blockSize], blockSize);
    }
    return 0;
}


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
