'Nested dictionary (with defaults) surpisingly slow updating/inserting a large number of values

I have some swift that looks like this:

import Foundation

// simulated data...
var sequence = [Int]()
for _ in 0..<10_000 {
    let random = Int.random(in: 1...5)
    sequence.append(random)
}

// code to optimize...
var dict: [Int: [Int: Int]] = [:]
for (a, b) in zip(sequence, sequence[1...]) {
    dict[a, default: [:]][b, default: 0] += 1
}

Basically, it keeps track of all the possible options from a and the counts for each a -> b pair occurence.

In a Playground it takes about ~10 seconds. And in a proper test suite with self.measure { ... } it's not that much faster.

Re-written in Python:

from collections import defaultdict
import random

sequence = []
for _ in range(10_000):
    rand = random.choice([1, 2, 3, 4, 5])
    sequence.append(rand)

my_dict = defaultdict(lambda: defaultdict(lambda: 0))
for a, b in zip(sequence, sequence[1:]):
    my_dict[a][b] += 1

... it basically runs instantly.

So two questions:

  1. What's happening here?! I thought swift dictionaries were O(1)?
  2. Most importantly, how can I make this nested dictionary faster / more performant?


Solution 1:[1]

A bit faster with Shadowrun's help:

import Foundation

typealias NestedDict = [Int: [Int: Int]]

var sequence = (0..<10_000).map { _ in Int.random(in: 1...5) }

var dict = NestedDict()
dict = zip(sequence, sequence[1...]).reduce(into: NestedDict()) { partialResult, pair in
    partialResult[pair.0, default: [:]][pair.1, default: 0] += 1
}

Solution 2:[2]

The performance is because of all the copy-on-write behaviour your triggering with those mutations. See https://developer.apple.com/documentation/swift/dictionary/3127177-reduce for the reduce into method and the performance discussion there

var sequence = [Int]()
for _ in 0..<10_000 {
    let random = Int.random(in: 1...5)
    sequence.append(random)
}
// 16-18 seconds

(0..<10_000).reduce(into: []) { partialResult, i in
    partialResult.append(Int.random(in: 1...5))
}
// 0.09 seconds

(0..<10_000).map { _ in Int.random(in: 1...5) }
// 0.07 seconds

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 emehex
Solution 2