C/C++ performance of static arrays vs dynamic arrays

When performance is essential to an application, should you give any thought to whether an array is declared on the stack or on the heap? Allow me to outline why this question has come to mind.

Since arrays in C/C++ are not objects in the OO sense and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs between a statically declared array and a dynamically declared array once you go past the first dimension.

If I were to declare an array on the stack as follows:

  int array[2][3] = { 0, 1, 2, 3, 4, 5 };
  //In memory         { row 1 }{ row 2 }

This array is stored in row-major order: both rows sit in one contiguous block of memory, one after the other. This means that when I try to access an element, the compiler must perform some addition and multiplication to compute the correct location.
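To make "contiguous" concrete, here is a quick check (my own sketch, not something required by the code above): the first element of the second row sits immediately after the last element of the first row.

  assert(&array[1][0] == &array[0][0] + 3); // needs <assert.h>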

So if I were to do the following

  int x = array[1][2]; // x = 5

The compiler would then use this formula, where:

i = row index
j = column index
n = number of elements in a single row (here n = 3)
array = pointer to the first element

  *(array + (i*n) + j)
  *(array + (1*3) + 2)

This means that if I were to loop over this array and access each of its elements by index, an additional multiplication step is performed for every access.
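As a sanity check (again my own sketch, reusing the array declared above), the formula can be compared directly against the built-in indexing:

  int *base = &array[0][0]; // pointer to the first element
  for (int i = 0; i < 2; i++) {
      for (int j = 0; j < 3; j++) {
          // the address arithmetic the compiler performs for array[i][j]
          assert(*(base + i*3 + j) == array[i][j]);
      }
  }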

Now, for an array declared on the heap, the paradigm is different and requires a multi-stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

  #include <stdlib.h>

  int **array;
  int rows = 2, cols = 3;

  // Create a 2 by 3 2D array on the heap:
  // one array of row pointers, then one allocation per row
  array = malloc(rows * sizeof(int *));
  for (int i = 0; i < rows; i++) {
      array[i] = malloc(cols * sizeof(int));
  }

  // Populating the array
  int number = 0;
  for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
          array[i][j] = number++;
      }
  }
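And since I mentioned new: the equivalent with the C++ operator, plus the cleanup that both versions eventually need, would look something like this (a sketch; array2 is just an illustrative name):

  int **array2 = new int*[2];     // array of row pointers
  for (int i = 0; i < 2; i++) {
      array2[i] = new int[3];     // one separate allocation per row
  }

  // ... use array2[i][j] as above ...

  // Cleanup mirrors allocation: delete[] each row, then the pointer array.
  // The malloc version would need free() calls in the same pattern.
  for (int i = 0; i < 2; i++) {
      delete[] array2[i];
  }
  delete[] array2;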

Since the array is now dynamic, its representation is a one-dimensional array of pointers, each pointing to a one-dimensional array of ints. I will try to draw an ASCII picture...

                int *          int int int
int ** array -> [0] ------->    0   1   2
                [1] ------->    3   4   5

This would imply that multiplication is no longer involved, right? If I were to do the following:

  int x = array[1][1]; // x = 4

This would perform pointer arithmetic and indirection on array to fetch the pointer to the second row (array[1]), and then do the same again on that pointer to access the second element. Am I correct in saying this?
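Spelled out explicitly, the two steps would be something like this (my own sketch; the variable names are just for illustration):

  int *row = *(array + 1);  // first load: fetch the pointer to row 1
  int  x2  = *(row + 1);    // second load: fetch element 1 of that row, i.e. 4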

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, like a game which has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs the heap? I realize there is a one-time cost for using malloc or the new operator, but at a certain point (just as in Big O analysis), when the data set becomes large, would one be better off iterating through a dynamic array to avoid the multiplication of row-major indexing?



Solution 1:[1]

The usual way of implementing a 2-dimensional array in C++ is to wrap it in a class, using a std::vector<int> for the storage, with accessors that calculate the index. However:

Any questions concerning optimization can only be answered by measuring, and even then, they are only valid for the compiler you are using, on the machine on which you do the measurements.

If you write:

int array[2][3] = { ... };

and then something like:

for ( int i = 0; i != 2; ++ i ) {
    for ( int j = 0; j != 3; ++ j ) {
        //  do something with array[i][j]...
    }
}

It's hard to imagine a compiler which doesn't actually generate something along the lines of:

for ( int* p = &array[0][0]; p != &array[0][0] + 2 * 3; ++ p ) {
    //  do something with *p
}

This is one of the most fundamental optimizations around, and has been for at least 30 years.

If you allocate dynamically as you propose, the compiler will not be able to apply this optimization. And even for a single access: the matrix has poorer locality, and requires an extra memory access to fetch the row pointer, so it would likely be less performant.

If you're in C++, you would normally write a Matrix class, using std::vector<int> for the memory, and calculating the indexes explicitly using multiplication. (The improved locality will probably result in better performance, despite the multiplication.) This could make it more difficult for the compiler to do the above optimization, but if this turns out to be an issue, you can always provide specialized iterators for handling this one particular case. You end up with more readable and more flexible code (e.g. the dimensions don't have to be constant), at little or no loss of performance.
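A minimal sketch of such a Matrix class (my own illustration of the design described above; the names are arbitrary):

#include <vector>

class Matrix {
public:
    Matrix( int rows, int cols )
        : rows_( rows ), cols_( cols ), data_( rows * cols ) {}

    //  Accessors compute the flat index: one multiply, one add.
    int& operator()( int i, int j )       { return data_[i * cols_ + j]; }
    int  operator()( int i, int j ) const { return data_[i * cols_ + j]; }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

private:
    int rows_, cols_;
    std::vector<int> data_;   //  one contiguous block, row-major
};

Iterating directly over the underlying vector (or via the specialized iterators mentioned above) recovers the single flat loop.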

Solution 2:[2]

As to which choice provides better performance, the answer will largely depend on your specific circumstances. The only way to know whether one way is better, or whether they are roughly equivalent, is to measure the performance of your application.

Some things that would be a factor are: how often you do it, the actual size of the arrays/data, how much memory your system has, and how well your system manages memory.
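As a starting point for such a measurement, here is a bare-bones sketch (mine, not part of the original answer) that times one pass over a contiguous 2D array. A serious benchmark would also need warm-up runs, repetitions, and care that the compiler does not optimize the work away:

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int rows = 1000, cols = 1000;      // illustrative sizes
    std::vector<int> flat(rows * cols, 1);   // one contiguous block, row-major

    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            sum += flat[i * cols + j];       // multiply-and-add indexing
        }
    }
    auto stop = std::chrono::steady_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("sum=%lld time=%lld ns\n", sum, (long long)ns);
}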

If you have the luxury of being able to choose between the two, it must mean the sizes are already nailed down. In that case, you do not need the multiple-allocation scheme that you illustrated; you can perform a single dynamic allocation of your 2D array. In C:

int (*array)[COLUMNS];                  /* pointer to a row of COLUMNS ints */
array = malloc(ROWS * sizeof(*array)); /* one block of ROWS * COLUMNS ints */

In C++:

std::vector<std::array<int, COLUMNS>> array(ROWS);

As long as COLUMNS is nailed down, you can perform a single allocation to obtain your 2D array. If neither dimension is nailed down, then you don't really have the choice of using a static array anyway.
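Either form indexes just like the stack version. A quick usage sketch of the C++ variant (the ROWS and COLUMNS values are mine, purely for illustration):

#include <array>
#include <vector>

int main() {
    constexpr int ROWS = 2, COLUMNS = 3;   // illustrative sizes
    std::vector<std::array<int, COLUMNS>> array(ROWS);

    // Indexes exactly like the stack version: array[i][j],
    // and the storage is one contiguous, row-major block.
    int number = 0;
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLUMNS; j++) {
            array[i][j] = number++;
        }
    }
}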

Solution 3:[3]

Often there is a trade-off between memory consumption and speed. Empirically, I have observed that creating an array on the stack is faster than allocating one on the heap. As the array size increases this becomes more apparent.

You can always decrease the memory consumption. For example, you can use short or char instead of int, etc.

As the array size increases, especially with the use of realloc, there may be a lot more page replacement (in and out) to maintain a contiguous block of items.

You should also consider that there is a much lower limit on the size of what you can store on the stack; for the heap this limit is far higher, but, as noted above, at a cost in performance.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: James Kanze
Solution 2:
Solution 3: