'PHP in_array() horrible performance. Fatest way to search array for value
I have the following simple code to test against collision on a primary key I am creating:
$machine_ids = array();
for($i = 0; $i < 100000; $i++) {
//Generate machine id returns a 15 character alphanumeric string
$mid = Functions::generate_machine_id();
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
}
die("Success!");
Any idea why this is taking many minutes to run? Anyway to speed it up?
Solution 1:[1]
For this, use $mid as keys, and dummy value as value. Specifically, instead of
if(in_array($mid, $machine_ids)) {
die("Collision!");
} else {
$machine_ids[] = $mid;
}
use
if(isset($machine_ids[$mid])) {
die("Collision!");
} else {
$machine_ids[$mid] = 1;
}
At the end you can extract the array you originally wanted with array_keys($machine_ids).
This should be much faster. If it is still slow, then your Functions::generate_machine_id() is slow.
EDITED to add isset as per comments.
Solution 2:[2]
Checking for array membership is a O(n) operation, since you have to compare the value to every element in the array. After you add a whole bunch of stuff to the array, naturally it gets slower.
If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.
Solution 3:[3]
Refactor your code so that it uses a associated array to hold the machine IDs and use isset to check
if( isset($machine_id[$mid]) ) die("Collision");
$machine_ids[$mid] = $mid;
Using isset should be faster
Solution 4:[4]
If you need best performance for your case, you need store your data as array key and
use isset or array_key_exists(since php >= 7.4 array_key_exists is now as fast as isset) instead in_array.
Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind that converting an array of values, ["foo", "bar", "baz"], to a hash map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory cost (as well as potentially constructing the hash map, depending on how and when you do it). As with all things, you'll have to weigh the pros & cons for each case to determine if a hash map or array (list) of values works best for your needs. This isn't specific to PHP but more of a general problem space of computer science.
And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a
<?php declare(strict_types = 1);
function testPerformance($name, Closure $closure, $runs = 1000000)
{
$start = microtime(true);
for (; $runs > 0; $runs--)
{
$closure();
}
$end = microtime(true);
printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}
$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
$items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);
$assocItems = array_combine($items, array_fill(0, count($items), true));
$in_array = function () use ($items) {
in_array(1111111, $items);
};
$isset = function () use ($assocItems) {
isset($items[1111111]);
};
$array_key_exists = function () use ($assocItems) {
array_key_exists(1111111, $assocItems);
};
testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);
Output:
Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | verdesmarald |
| Solution 3 | Jay Sidri |
| Solution 4 | Arsen |
