Rust: Efficient product of a matrix with its transpose

I need to compute M x M.transpose() for a matrix with thousands of columns but very few rows (around 300).

The most natural way is:

let mtm = &M * M.transpose();

... but this seems inefficient because of the cost of computing M.transpose().

Is there a more efficient way? Should I write my own code? Or is there perhaps no way to do better?

I didn't find a BLAS function that does this in one shot. Did I miss something?

(Note: if I wanted to compute M.transpose() x M instead, the gemm_tr function would work with appropriate parameters, such as gemm_tr(1,&M,&M,0).)
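
To make the "write my own code" option concrete, here is a minimal sketch using only the standard library. It assumes the matrix is stored row-major in a flat slice, which is an assumption for illustration and may not match the storage layout of the matrix library actually in use. The function name gram_rows is hypothetical. The idea is that entry (i, j) of M x M.transpose() is the dot product of rows i and j of M, so no transposed copy is needed, and since the result is symmetric only the upper triangle has to be computed.

```rust
/// Computes G = M * M^T for a `rows x cols` matrix stored row-major in `m`.
/// Hypothetical helper for illustration; not an API of any existing crate.
/// Returns the `rows x rows` result, also row-major.
fn gram_rows(m: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(m.len(), rows * cols, "slice length must equal rows * cols");
    let mut g = vec![0.0; rows * rows];
    for i in 0..rows {
        // Row i of M is a contiguous slice in row-major storage.
        let ri = &m[i * cols..(i + 1) * cols];
        // Start at j = i: the result is symmetric, so the lower
        // triangle is filled by mirroring instead of recomputing.
        for j in i..rows {
            let rj = &m[j * cols..(j + 1) * cols];
            // G[i][j] is the dot product of rows i and j; no transpose is built.
            let dot: f64 = ri.iter().zip(rj).map(|(a, b)| a * b).sum();
            g[i * rows + j] = dot;
            g[j * rows + i] = dot;
        }
    }
    g
}
```

For example, calling gram_rows(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3) on the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) gives [14.0, 32.0, 32.0, 77.0]. This is not cache-blocked or vectorized by hand, so a tuned BLAS routine would likely be faster. For what it is worth, standard BLAS does define a symmetric rank-k update (syrk) that computes this kind of product, though whether the library in use exposes it is not something I can confirm here.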



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
