'calculate the crosscorrelation of two vectors more efficiently

I recently translated some python code into c#. The c# function is working fine and results in the same output as the python one. However, it takes about 15 times longer. This is mostly due to one of my functions which calculates the cross-correlation of two vectors and takes much longer than numpy's correlate(a,b,"full")

I wrote (assume a and b are of the same length):

double[] CrossCorr(double[] a, double[] b)
    {
        int l = a.Length;
        int jmin, jmax, index;
        index = 0;
        int lmax = 2*l - 1;
        double[] z = new double[lmax];
        for (int i = 0; i < lmax; i++)
        {
            if (i >= l)
            {
                jmin = i - l + 1;
                jmax = l - 1;
            }
            else
            {
                jmax = i;
                jmin = 0;
            }
            for (int j = jmin; j <= jmax; j++)
            {
                index = l - i + j - 1;
                z[i] += (a[j] * b[index]);
            }
        }
        return z;
    }

Another attempt was to use the known equation:

corr(a, b) = ifft(fft(a_and_zeros) * conj(fft(b_and_zeros)))

which resulted in this function (using Math.Net):

double[] Corrcorfour(double[] a, double[] b)
    {
        //Fourier transformation
        Complex[] distrib_Pr_Com = new Complex[a.Count()];
        Complex[] distrib_Pi_noise_Com = new Complex[b.Count()];
        for (int k = 0; k < x_count; k++)
        {
            distrib_Pr_Com[k] = (Complex)a[k];
            distrib_Pi_noise_Com[k] = (Complex)b[k];
        }
        MathNet.Numerics.IntegralTransforms.Fourier.Forward(distrib_Pi_noise_Com);
        MathNet.Numerics.IntegralTransforms.Fourier.Forward(distrib_Pr_Com);

        //complex conj
        for (int k = 0; k < distrib_Pi_noise_Com.Count(); k++)
        {
            distrib_Pi_noise_Com[k] = Complex.Conjugate(distrib_Pi_noise_Com[k]);
        }


        //multiply results
        Complex[] test = new Complex[distrib_Pr_Com.Count()];
        for (int k = 0; k < distrib_Pr_Com.Count(); k++)
        {
            test[k] = Complex.Multiply(distrib_Pr_Com[k], distrib_Pi_noise_Com[k]);
        }
        //transformierenback
        MathNet.Numerics.IntegralTransforms.Fourier.Inverse(test);

        //transform to double
        double[] finish = new double[test.Count()];
        for (int k = 0; k < test.Count(); k++)
        {
            finish[k] = test[k].Real;

        }

        return finish;
    }

However, this function somehow throws out an array of different lengths and with different results (so it's wrong). It is also only 2 times quicker than the first function, which would still not be enough.

I tried to look up some other people functions but they didn't seem to do it much differently from either of the two. is there a shortcut I'm not seeing, or an error?

I found a few similar questions but I didn't find a satisfying answer in any of them.



Solution 1:[1]

One option to speed this up significantly would just be to run it in a parallel loop since every output index is not affected by the others. I believe that numpy does use parallelism as well. I'm sure there's some other optimizations per thread, but this should be a good start.

public static double[] CrossCorrParallel(double[] a, double[] b)
{
    int l = a.Length;
    int lmax = 2 * l - 1;
    double[] z = new double[lmax];

    Parallel.For(0, lmax, (i) =>
    {
        int jmin, jmax, index;
        if (i >= l)
        {
            jmin = i - l + 1;
            jmax = l - 1;
        }
        else
        {
            jmax = i;
            jmin = 0;
        }
        for (int j = jmin; j <= jmax; j++)
        {
            index = l - i + j - 1;
            z[i] += (a[j] * b[index]);
        }
    });

    return z;
}

My benchmarks are here, note that I am using a 10850k with 10 cores (20 threads):

|       Method |      Mean |    Error |   StdDev |
|------------- |----------:|---------:|---------:|
|     TestOrig | 189.32 ms | 0.150 ms | 0.133 ms |
| TestParallel |  11.49 ms | 0.221 ms | 0.206 ms |

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Uxonith