C++ equivalent of numpy.unique on std::vector with return_index and return_inverse
numpy has an implementation of the unique algorithm that returns:
- the sorted unique elements of a numpy array (i.e. with no duplicates)
In addition, numpy.unique() can also return:
- the indices of the input array that give the unique values
- the indices of the unique array that reconstruct the input array
The C++ standard library also implements a unique algorithm, std::unique, which eliminates consecutive duplicates. Combined with std::sort() and std::vector<>::erase, this can produce the sorted unique elements of the vector (output 1 of numpy.unique()).
My question is: is there any algorithm in the STL or elsewhere that can also return outputs 2 and 3 of numpy.unique()? If not, is there a way to implement it efficiently?
Solution 1:[1]
A std::map combined with a std::vector gives you all the information you want.
#include <cstddef>
#include <map>
#include <vector>

template <typename Container>
auto unique_map( const Container & xs )
    -> std::map <typename Container::value_type, std::vector <std::size_t> >
{
    decltype(unique_map(xs)) S;
    std::size_t n = 0;
    for (const auto & x : xs)
    {
        S[ x ].push_back( n++ );
    }
    return S;
}
Now you can convert any container to a sorted std::map where:
- each key is a unique element from the original container xs
- each value is a std::vector listing indices into xs
If you wish to reconstruct xs from the mapping, you can do so easily enough:
#include <numeric>

template <typename UniqueMap>
auto rebuild_from_unique_map( const UniqueMap & um )
    -> std::vector <typename UniqueMap::key_type> // <std::size_t>
{
    decltype(rebuild_from_unique_map(um)) xs;
    if (um.size())
    {
        xs.resize( std::accumulate( um.begin(), um.end(), (std::size_t)0,
            []( auto n, auto p ) { return n + p.second.size(); } ) );
        for (const auto & m : um)
            for (const auto & n : m.second)
                xs[n] = m.first; // index++
    }
    return xs;
}
If you really, really want that list of indices instead (return_inverse) you can get it by changing the commented lines as indicated and adding a std::size_t index = 0; before the nested loop.
Here’s a toy to play with it. Just concatenate the three code snippets in this post and compile. You’ll need Boost for the irange.
#include <algorithm>
#include <boost/range/irange.hpp>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <string>

int main( int argc, char ** argv )
{
    std::vector <int> xs( argc-1, 0 );
    std::transform( argv+1, argv+argc, xs.begin(), []( auto s ) { return std::stoi( s ); } );
    auto width = std::setw( 3 + (int)std::log10( *std::max_element( xs.begin(), xs.end() ) ) );

    std::cout << xs.size() << " integers";
    std::cout << "\nvalue: "; for (auto x : xs) std::cout << " " << width << x;
    std::cout << "\nindex: ["; for (auto n : boost::irange( xs.size() )) std::cout << " " << width << n;
    std::cout << " ]\n";

    auto ys = unique_map( xs );
    std::cout << "\n" << ys.size() << " unique integers";
    std::cout << "\nsorted:"; for (auto y : ys) std::cout << " " << width << y.first;
    std::cout << "\n\n";

    for (const auto & y : ys)
    {
        std::cout << width << y.first << " appears " << y.second.size() << " times at [";
        for (const auto & n : y.second) std::cout << " " << n;
        std::cout << " ]\n";
    }

    std::cout << "\nrebuilt:"; for (auto x : rebuild_from_unique_map( ys )) std::cout << " " << width << x;
    std::cout << "\n";
}
Compile with something like:
cl /EHsc /W4 /Ox /std:c++17 /I C:\boost\include a.cpp
clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
etc.
The toy is not tolerant of bad input: make sure that you give it at least one argument and that all arguments are integers. You can change it to work over arbitrary std::strings if you wish: just adjust the first three statements in main() to fill xs with strings and compute width correctly.
Here’s me playing with it on my Windows box:
a 2 -1 2 4 -1 2 3 7 -1 2 0 5
12 integers
value: 2 -1 2 4 -1 2 3 7 -1 2 0 5
index: [ 0 1 2 3 4 5 6 7 8 9 10 11 ]
7 unique integers
sorted: -1 0 2 3 4 5 7
-1 appears 3 times at [ 1 4 8 ]
0 appears 1 times at [ 10 ]
2 appears 4 times at [ 0 2 5 9 ]
3 appears 1 times at [ 6 ]
4 appears 1 times at [ 3 ]
5 appears 1 times at [ 11 ]
7 appears 1 times at [ 7 ]
rebuilt: 2 -1 2 4 -1 2 3 7 -1 2 0 5
a 1 1 1
3 integers
value: 1 1 1
index: [ 0 1 2 ]
1 unique integers
sorted: 1
1 appears 3 times at [ 0 1 2 ]
rebuilt: 1 1 1
a 3 1 2
3 integers
value: 3 1 2
index: [ 0 1 2 ]
3 unique integers
sorted: 1 2 3
1 appears 1 times at [ 1 ]
2 appears 1 times at [ 2 ]
3 appears 1 times at [ 0 ]
rebuilt: 3 1 2
Good question!
Solution 2:[2]
In addition to the two current great posted answers, there are some important points to take into account in order to write an efficient implementation.
First of all, using a hash-map can be very efficient if the number of distinct items is small enough for the hash-map to fit in cache. Classical sorting algorithms (e.g. introsort/mergesort/heapsort) tend to be significantly slower because of the log n factor in their complexity. However, when the number of distinct items is large, a sort is generally much faster, since the unpredictable, random-like hash-map access pattern in RAM can be very expensive: each access can be as slow as the RAM latency (typically 40~100 ns).
Additionally, sort implementations can be vectorized using SIMD instructions (e.g. AVX-512 or SVE) and parallelized using multithreading. This is especially relevant here since np.unique is generally applied to numbers. C++17 provides parallel algorithms (PSTL), including parallel sorts. Numpy's sorting algorithms have recently been improved to use AVX-512, resulting in an order-of-magnitude faster execution. Alternatively, an O(n) radix sort can be used to efficiently sort small arrays with short keys (e.g. 4 bytes).
Moreover, not all hash-map implementations are equivalent in terms of performance. The STL's std::unordered_map tends to be pretty slow. This is due to restrictions in the C++ standard (about the invalidation of iterators) that (almost) force the STL not to use an open-addressing strategy. Instead, STL implementations generally use linked lists to resolve hash conflicts, and this causes slow allocations as described by @Homer512. In fact, the fastest existing hash-map implementations mainly use open addressing. For example, hopscotch hashing provides good performance with some interesting guarantees (e.g. good performance even when the load factor is close to 90~100%). Tessil has published a very interesting benchmark of several hash-map implementations and has written very fast ones (generally much faster than those of mainstream STL implementations).
std::map is usually implemented using a red-black tree. It is written in a pretty efficient way in mainstream STL implementations. However, the default allocator is clearly not optimized for this use case. Indeed, the number of nodes to insert is bounded, so one can use a monotonic allocator to speed up the computation. An optimized implementation can even use a few big arrays with no additional allocation.
Finally, note that np.unique provides the index of the first occurrence of each unique value, not all of them. This enables further optimisations. For example, no std::vector is needed in the std::map-based implementation of @Dúthomhas, resulting in a smaller memory footprint and probably higher performance.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dúthomhas |
| Solution 2 | Jérôme Richard |
