Tensor class speed comparison with C-array

I have written a small tensor class, template<typename T, size_t N> class Tensor. For four-dimensional tensors I provide the following specialization:

#include <array>
#include <cstddef>
#include <vector>

template<typename T, size_t N> class Tensor; // primary template

template<typename T>
class Tensor<T,4> {
public:
    std::array<int,4> size_;
    std::vector<T> data;
    Tensor(int i, int j, int k, int l) : size_{i,j,k,l}, data(i*j*k*l, T{}) {}
    T& operator()(int i, int j, int k, int l);
};

template <typename T>
T& Tensor<T,4>::operator()(int i, int j, int k, int l) {
    // Row-major layout over the flat storage.
    return data[ i*size_[1]*size_[2]*size_[3]
               + j*size_[2]*size_[3]
               + k*size_[3]
               + l ];
}

I now wanted to compare the speed of accessing the underlying data through the class with raw pointer access:

#include <chrono>
#include <iostream>
using namespace std;

// get_time(start, end) is a small helper of mine that returns the
// elapsed time between the two time_points in seconds as a double.

int main() {

    chrono::time_point<chrono::high_resolution_clock> start, end;
    constexpr int N=100;
    Tensor<double,4> Y(N,N,N,N);

    start=chrono::high_resolution_clock::now();
    for(int i=0;i<N;++i) {
        for(int j=0;j<N;++j) {
            for(int k=0;k<N;++k) {
                for(int l=0;l<N;++l) {
                    Y(i,j,k,l) = 2.0*i+j+k+l;
                }
            }
        }
    }
    end=chrono::high_resolution_clock::now();
    cout << "Tensor<4>: " << get_time(start, end) << endl;

    constexpr int K=N*N*N*N;
    double* ptr=new double[K];
    start=chrono::high_resolution_clock::now();
    for(int i=0;i<K;++i)
        ptr[i]=2.0*N+N+N+N;
    end=chrono::high_resolution_clock::now();
    cout << "double* " << get_time(start, end) << endl;

    delete[] ptr;
}

I expected the iteration over the dynamic C-array to be much faster, since it avoids all the index arithmetic needed to reach the Tensor's underlying data. Execution times in seconds (g++ with -O3):

Tensor<4>: 0.0574176
double* 0.225252

I still cannot believe these numbers. Does anyone have a suggestion as to what is going on?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
