'Array vs Slice: accessing speed

This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.

I would expect arrays to be faster than slices in most cases because a slice is a data structure describing a contiguous section of an array and so there may be an extra step involved when accessing elements of a slice (indirectly the elements of its underlying array).

So I wrote a little test to benchmark a batch of simple operations. There are 4 benchmark functions, the first 2 test a global slice and a global array, the other 2 test a local slice and a local array:

var gs = make([]byte, 1000) // Global slice
var ga [1000]byte           // Global array

func BenchmarkSliceGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range gs {
            gs[j]++; gs[j] = gs[j] + v + 10; gs[j] += v
        }
    }
}

func BenchmarkArrayGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range ga {
            ga[j]++; ga[j] = ga[j] + v + 10; ga[j] += v
        }
    }
}

func BenchmarkSliceLocal(b *testing.B) {
    var s = make([]byte, 1000)
    for i := 0; i < b.N; i++ {
        for j, v := range s {
            s[j]++; s[j] = s[j] + v + 10; s[j] += v
        }
    }
}

func BenchmarkArrayLocal(b *testing.B) {
    var a [1000]byte
    for i := 0; i < b.N; i++ {
        for j, v := range a {
            a[j]++; a[j] = a[j] + v + 10; a[j] += v
        }
    }
}

I ran the test multiple times, here is the typical output (go test -bench .*):

BenchmarkSliceGlobal      300000              4210 ns/op
BenchmarkArrayGlobal      300000              4123 ns/op
BenchmarkSliceLocal       500000              3090 ns/op
BenchmarkArrayLocal       500000              3768 ns/op

Analyzing the results:

Accessing the global slice is slightly slower than accessing the global array which is as I expected:
4210 vs 4123 ns/op

But accessing the local slice is significantly faster than accessing the local array:
3090 vs 3768 ns/op

My question is: What is the reason for this?

Notes

I tried varying the following things but none changed the outcome:

  • the size of the array/slice (tried 100, 1000, 10000)
  • the order of the benchmark functions
  • the element type of the array/slice (tried byte and int)


Solution 1:[1]

Go version 1.8 can eliminate some range checks so the difference got bigger.

BenchmarkSliceGlobal-4 500000 3220 ns/op BenchmarkArrayGlobal-4 1000000 1287 ns/op BenchmarkSliceLocal-4 1000000 1267 ns/op BenchmarkArrayLocal-4 1000000 1301 ns/op

For arrays I'd recommend to use sizes from powers of two and include a logical and operation. In that way you're sure the compiler eliminates the check. Thus var ga [1024]byte with ga[j & 1023].

Solution 2:[2]

on go1.18 and M1 the difference much bigger i was sure that array is faster then slice but now i have proof it's not always the case

goos: darwin
goarch: arm64
BenchmarkSliceGlobal-8        926196          1257.0 ns/op         0 B/op          0 allocs/op
BenchmarkArrayGlobal-8       2110324           567.0 ns/op         0 B/op          0 allocs/op
BenchmarkSliceLocal-8        2275382           535.0 ns/op         0 B/op          0 allocs/op
BenchmarkArrayLocal-8        1802491           647.4 ns/op         0 B/op          0 allocs/op```

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pascal de Kloe
Solution 2 Prounckk