Comparing very large arrays of objects, looking for a performance boost

I'm using this method to compare two very large arrays of objects to create a result array.

These arrays can have from 5 to 250,000 objects, each object has 8 properties. The properties are all strings (up to 256 chars) and integers.

It takes over 15 minutes to create the result array with 250,000 objects.

Realizing that the computer, memory, phase of the moon, etc, are all factors, I was looking for a faster/better way.

Any advice appreciated :)

const arr1 = [{prop1: "foo", prop2: 100}, {prop1: "bar", prop2: 101}] // 250,000 objects
const arr2 = [{prop1: "foo", prop2: 50}, {prop1: "bar", prop2: 51}] // 250,000 objects
let final = []

for (let left of arr1) {
    for (let right of arr2) {
        if (left.prop1 === right.prop1 && left.prop2 < right.prop2) {
            final.push(left)
            break
        }
    }
}


Solution 1:[1]

An approach that'll consume more memory but take less time to run is to turn one of the arrays into an object or Map indexed by prop1. That way, rather than iterating over every element to see if it matches, you just look up the element for that prop1 directly.

// Index arr2 by prop1: one O(m) pass
const arr2ByProp1 = new Map();
for (const item of arr2) {
    arr2ByProp1.set(item.prop1, item);
}
for (const item1 of arr1) {
    // O(1) Map lookup instead of an O(m) inner loop
    const item2 = arr2ByProp1.get(item1.prop1);
    if (item2 && item1.prop2 < item2.prop2) {
        final.push(item1);
    }
}

You could also tweak the above to use for (let i = 0, length = arr1.length; i < length; i++) { instead (slightly faster, but slightly harder to read and understand at a glance). Other minor tweaks are possible, but none of them will matter much compared to the time-complexity improvement: the nested loop is O(n * m), while the Map version is O(n + m).
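For illustration, here is the lookup loop above rewritten with that indexed for loop, as a self-contained sketch. The data is made up for the example (note the prop2 values are chosen so that "foo" actually matches the left.prop2 < right.prop2 condition):

```javascript
const arr1 = [{prop1: "foo", prop2: 100}, {prop1: "bar", prop2: 101}];
const arr2 = [{prop1: "foo", prop2: 150}, {prop1: "bar", prop2: 51}];

// Build the Map once: one O(m) pass over arr2
const arr2ByProp1 = new Map();
for (const item of arr2) {
    arr2ByProp1.set(item.prop1, item);
}

// Indexed loop, caching the length up front
const final = [];
for (let i = 0, length = arr1.length; i < length; i++) {
    const item1 = arr1[i];
    const item2 = arr2ByProp1.get(item1.prop1); // O(1) lookup
    if (item2 && item1.prop2 < item2.prop2) {
        final.push(item1);
    }
}
// final → [{prop1: "foo", prop2: 100}]
```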

Having 250k items in memory to begin with is suspect, though. If they were in a database instead, it'd probably be a lot faster to query the database (which can index on .prop1 and .prop2) to find these matches than to do the manipulation in JavaScript (which requires first restructuring one of the data structures, which isn't free). A database may also be able to take advantage of parallel execution, unlike this particular situation in JavaScript.

Solution 2:[2]

Using maps is really the way to go. Our design was terrible. Worked fine until we started seeing large numbers of objects.

Just iterating objects with forEach took 10-15 minutes ... 250,000 objects x 250,000 objects x 8 properties = 500,000,000,000 comparisons.

Using a for...of loop with break had a negligible increase in performance.

Refactored everything with maps and it's down to under 5 seconds (not a typo).

Amazing and Thank You (@CertainPerformance)!

const mapA = new Map() // 250,000 objects, keyed by the shared id
const mapB = new Map() // 250,000 objects, keyed the same way
let mapC   = new Map() // results

mapA.forEach((v, k) => {
    if (mapB.has(k)) {
        const o = mapB.get(k) // look up the counterpart in the other map
        if (o.prop === v.prop) {
            mapC.set(k, v)
        }
    }
})
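To connect this back to the question's original arrays, here is a minimal, self-contained sketch of the same pattern. It assumes prop1 is a unique key within each array (Map overwrites duplicate keys) and uses the question's prop2 < comparison; the data values are made up for the example:

```javascript
const arr1 = [{prop1: "foo", prop2: 100}, {prop1: "bar", prop2: 101}];
const arr2 = [{prop1: "foo", prop2: 150}, {prop1: "bar", prop2: 51}];

// One O(n) pass per array to build the lookup maps, keyed by prop1
const mapA = new Map(arr1.map(o => [o.prop1, o]));
const mapB = new Map(arr2.map(o => [o.prop1, o]));

// One more O(n) pass to compare: O(n + m) total instead of O(n * m)
const mapC = new Map();
mapA.forEach((v, k) => {
    const o = mapB.get(k);
    if (o !== undefined && v.prop2 < o.prop2) {
        mapC.set(k, v);
    }
});
// mapC → Map { "foo" → {prop1: "foo", prop2: 100} }
```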

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2