'How can I optimize this function to find statistical mode in vector-like container with billions of elements

I have a very slow and hefty function to find statistical mode (including multiple modes) which is working but unusably slow.

The requirements are as follows.

  1. Must take an std::vector-like container by reference
  2. Returns an std::vector of std::pair with the 'first' being the statistical mode and the 'second' being the number of occurrences (the number of occurrences must be identical for every statistical mode found or there is a bug (not currently verifying this)
  3. Must be able to find all statistical modes in a container

4. Must be able to operate on containers billions of elements long and run in a reasonable amount of time

That last one is the one I'm having trouble with. Currently my code takes about 7 seconds to run on std::vectors containing only 1,000,000 elements.

Thus computing the mode(s) for a container of 1 billion elements would take about 2.38 hours. Ideally this code would run in under 1 minute.

How can I optimize this, I know std::unordered set is kind of expensive but I don't know how to implement this function without a set-like container using an array or vector.

 template<typename CONTAINER_T, typename T = CONTAINER_T::value_type>
    [[nodiscard]] inline constexpr std::vector<std::pair<T, size_t>> mode(CONTAINER_T& arr)
    {   
        std::unordered_set<T> seen;

        size_t max_count = 0;
        std::vector<std::pair<T, size_t>> ret;

        for (auto i = arr.begin(); i != arr.end(); ++i)
        {
            if (*std::find(std::execution::par_unseq, seen.cbegin(), seen.cend(), *i) == *seen.cend())
            {
                const size_t count = std::count(std::execution::par_unseq, i, arr.end(), *i);

                if (count > max_count)
                {
                    max_count = count;
                    ret = { {*i, max_count} };
                }//End if
                else if (count == max_count)
                {
                    ret.emplace_back(*i, max_count);
                }//End if
                seen.insert(*i);
            }//End if
        }//End for

        return ret;
    }//End of mode


Solution 1:[1]

Something along these lines perhaps:

template<typename CONTAINER_T, typename T = CONTAINER_T::value_type>
std::vector<std::pair<T, size_t>> mode(CONTAINER_T& arr) {   
  size_t max_count = 0;
  std::unodered_map<T, size_t> counts;
  for (auto& elem : arr) {
    size_t count = ++counts[elem];
    max_count = std::max(max_count, count);
  }
  std::vector<std::pair<T, size_t>> result;
  for (auto& pair : counts) {
    if (pair.second == max_count) {
      result.push_back(pair);
    }
  }
  return result;
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Igor Tandetnik