Armadillo vs. BLAS: multiplying a matrix by its transpose. Is BLAS slower?
Good day.
Does anyone know another trick or solution for multiplying a matrix by its (conjugate) transpose? My current code takes too much time over 1000 iterations. I tried calling OpenBLAS directly, but it seems to be a bit slower than Armadillo.
Did I do something wrong?
#include <chrono>
#include <complex>
#include <iostream>
#include <armadillo>

// Minimal stopwatch: starts timing on construction.
class watch : std::chrono::steady_clock {
    time_point start_ = now();
public:
    auto elapsed_sec() const { return std::chrono::duration<double>(now() - start_).count(); }
};

// Computes output = input * input^H with a direct BLAS gemm call.
template <typename T>
void matrix_multiplication(arma::Mat<T> const& input, arma::Mat<T>& output)
{
    const char N = 'N'; // first operand used as-is
    const char C = 'C'; // second operand conjugate-transposed
    const T alpha = T(1);
    const T beta = T(0);
    arma::blas_int m_ = input.n_rows, n_ = input.n_rows, k_ = input.n_cols;
    // Both operands are `input`, so both leading dimensions are its row count.
    arma::blas::gemm(&N, &C, &m_, &n_, &k_, &alpha, input.memptr(), &m_, input.memptr(), &m_, &beta, output.memptr(), &m_);
}

int main()
{
    arma::cx_mat mat1; // size (300, 20'000)
    mat1.load("rec.txt"); // arma::fill::randu can be used instead
    arma::cx_mat resu(mat1.n_rows, mat1.n_rows, arma::fill::none);
    int N = 10;
    // Time N direct gemm calls and print the average per call.
    [&, timer = watch{}]() {
        for (int i = 0; i < N; ++i)
        {
            matrix_multiplication(mat1, resu);
        }
        std::cout << timer.elapsed_sec() / N << std::endl;
    }();
    resu.submat(arma::span(0, 1), arma::span(0, 5)).print("resu");
    // Time N Armadillo products and print the average per call.
    [&, timer = watch{}]() {
        for (int i = 0; i < N; ++i)
        {
            resu = mat1 * mat1.t();
        }
        std::cout << timer.elapsed_sec() / N << std::endl;
    }();
    resu.submat(arma::span(0, 1), arma::span(0, 5)).print("resu");
    return 0;
}
I am using:
gcc 11.2
armadillo 10.7.3
openblas
My results (average seconds per iteration):
0.0394106 << blas
resu
0.0253328 << armadillo
resu
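One possible explanation for the gap, offered as an assumption rather than something measured here: the product of a matrix with its own conjugate transpose is Hermitian, so a routine that knows this (for example the BLAS rank-k update `zherk`) only has to compute one triangle, while a general `gemm` call computes every entry. Armadillo may be dispatching to such a routine for `mat1 * mat1.t()`. The sketch below illustrates the idea with plain loops and no BLAS or Armadillo; `gram_hermitian` is a hypothetical helper written for this illustration, and it is not meant to be fast.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical helper (not part of Armadillo or BLAS): returns C = A * A^H
// for a column-major m x k matrix A. Only entries with i <= j are computed
// from the data; the strictly lower triangle is filled in by conjugation.
std::vector<cd> gram_hermitian(const std::vector<cd>& a, std::size_t m, std::size_t k)
{
    std::vector<cd> c(m * m, cd{0.0, 0.0});
    for (std::size_t j = 0; j < m; ++j) {
        for (std::size_t i = 0; i <= j; ++i) {
            cd sum{0.0, 0.0};
            for (std::size_t p = 0; p < k; ++p) {
                // A(i,p) * conj(A(j,p)) with column-major indexing
                sum += a[i + p * m] * std::conj(a[j + p * m]);
            }
            c[i + j * m] = sum; // C(i,j), upper triangle including diagonal
            if (i != j) {
                c[j + i * m] = std::conj(sum); // C(j,i) = conj(C(i,j))
            }
        }
    }
    return c;
}
```

The inner dot product runs for roughly half of the (i, j) pairs, which is the saving a Hermitian-aware routine can exploit. Whether this accounts for the timings above would need to be checked, for instance by timing a `zherk` call against the `gemm` call on the same data.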
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
