Tensor class speed comparison with C-array
I have written a small tensor class, template<typename T, size_t N> class Tensor. For four-dimensional tensors I provide the following specialization:
    template<typename T>
    class Tensor<T,4> {
    public:
        std::array<int,4> size_;
        std::vector<T> data;

        Tensor(int i, int j, int k, int l) : size_{i,j,k,l}, data(i*j*k*l, T{}) {}
        T& operator()(int i, int j, int k, int l);
    };
    template <typename T>
    T& Tensor<T,4>::operator()(int i, int j, int k, int l) {
        return data[i*std::get<1>(size_)*std::get<2>(size_)*std::get<3>(size_)
                  + j*std::get<2>(size_)*std::get<3>(size_)
                  + k*std::get<3>(size_)
                  + l];
    }
I now wanted to compare the speed for accessing the underlying data with a raw pointer access:
    #include <array>
    #include <chrono>
    #include <iostream>
    #include <vector>
    using namespace std;

    int main() {
        chrono::time_point<chrono::high_resolution_clock> start, end;
        constexpr int N=100;

        Tensor<double,4> Y(N,N,N,N);

        start = chrono::high_resolution_clock::now();
        for(int i=0;i<N;++i) {
            for(int j=0;j<N;++j) {
                for(int k=0;k<N;++k) {
                    for(int l=0;l<N;++l) {
                        Y(i,j,k,l) = 2.0*i+j+k+l;
                    }
                }
            }
        }
        end = chrono::high_resolution_clock::now();
        cout << "Tensor<4>: " << get_time(start, end) << endl;

        double* ptr = new double[N*N*N*N];
        constexpr int K = N*N*N*N;

        start = chrono::high_resolution_clock::now();
        for(int i=0;i<K;++i)
            ptr[i] = 2.0*N+N+N+N;
        end = chrono::high_resolution_clock::now();
        cout << "double* " << get_time(start, end) << endl;
    }
I expected the iteration over the dynamic C-array to be much faster, since it avoids all the index arithmetic needed to access the underlying data.
Execution speed in seconds (g++ with -O3):
Tensor<4>: 0.0574176
double* 0.225252
I still do not believe these numbers; does anyone have a suggestion?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow