'Most efficient way to find mode in an array using python? Return type is an array of integers

Here is my solution, which works in O(N) time and O(N) space:

def find_mode(array):
    myDict = {}
    result = []
    for i in range(len(array)):
        if array[i] in myDict:
            myDict[array[i]] += 1
        else:
            myDict[array[i]] = 1

    maximum = max(myDict.values())
    for key, value in myDict.items():
        if value == maximum:
            result.append(key)
    
    return result

I can't think of a more efficient solution than O(N) but if anyone has any improvements to this function please let me know. The return type is an array of integers.



Solution 1:[1]

First, you should note that O(n) worst-case time cannot be improved upon with a deterministic, non-randomized algorithm, since we may need to check all elements.

Second, since you want all modes, not just one, the best space complexity of any possible algorithm is O(|output|), not O(1).

Third, this is as hard as the Element distinctness problem. This implies that any algorithm that is 'expressible' in terms of decision trees only, can at best achieve Omega(n log n) runtime. To beat this, you need to be able to hash elements or use numbers to index the computer's memory or some other non-combinatorial operation. This isn't a rigorous proof that O(|output|) space complexity with O(n) time is impossible, but it means you'll need to specify a model of computation to get a more precise bound on runtime, or specify bounds on the range of integers in your array.

Lastly, and most importantly, you should profile your code if you are worried about performance. If this is truly the bottleneck in your program, then Python may not be the right language to achieve the absolute minimum number of operations needed to solve this problem.

Here's a more Pythonic approach, using the standard library's very useful collections.Counter(). The Counter initialization (in CPython) is usually done through a C function, which will be faster than your for loop. It is still O(n) time and space, though.

def find_mode(array: List[int]) -> List[int]:
    counts = collections.Counter(array)
    maximum = max(counts.values())
    return [key for key, value in counts.items()
            if value == maximum]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kcsquared