'efficient way to count adjacent empty points of a connected group on a grid

A connected group means a set of vertices of equal values on a grid being adjacent horizontally or vertically.

For example, on this grid where . is an empty point, there are 4 connected groups.

X O O .      X       O O
. X O X  ->       X    O  X
X X O X         X X    O  X

You can also see that each group has 1 connected empty point.

I'm trying to count the number of these connected empty points given a grid and the coordinates that point to a vertex of a group.

If the input is,

. . X X X O O . 
X X . X . X O X 
. X X X X X O X 
X X . O O X . X 

(0, 2)

The output should be 7 because the big group including the vertex at (0, 2) (row, column) has 7 connected empty points.

If you are familiar of the game of Go (baduk), connected empty point is in other words liberty. This operation is the core of a routine analyzing a position in Go, and it has to be done fast.

Below is my try. It's terribly inefficient in many ways involving a lot of branches and recursion. I'm tracking the 4 possible directions and incrementing the count whenever there's an empty space, and put a mark on the vertex at which the counting has been done to not count twice.

Could you shed some light on how to efficiently solve this problem?

General algorithmic improvement and x86-AVX2-specific optimizations are both welcome.

typedef __attribute__((vector_size(32))) char vec8_c;

enum {H = 4, W = 8};

static int count_();
static int count__();

static int count(vec8_c x, int i, int j) {
    vec8_c m = {0};
    return count_((char *)&x, (char *)&m, i, j, i * W + j);
}

static int count_(char *x, char *m, int i, int j, int idx) {
    m[idx] = 1;
    int r = 0;
    if (j - 1 >= 0) {
        r += count__(x, m, i, j - 1, idx - 1, idx);
    }
    if (j + 1 < W) {
        r += count__(x, m, i, j + 1, idx + 1, idx);
    }
    if (i - 1 >= 0) {
        r += count__(x, m, i - 1, j, idx - W, idx);
    }
    if (i + 1 < H) {
        r += count__(x, m, i + 1, j, idx + W, idx);
    }
    return r;
}

static int count__(char *x, char *m, int i, int j, int idx, int idx_) {
    if (!m[idx]) {
        if (!x[idx]) {
            m[idx] = 1;
            return 1;
        } else if (x[idx] == x[idx_]) {
            return count_(x, m, i, j, idx);
        }
    }
    return 0;
}

Run online.



Solution 1:[1]

From the description I would convert the problem into intersection/union;

  • Make a mask C from the connected component
  • Make a mask E from the empty pixels
  • Make a larger mask M by concatenating the shifted version of C M = C<<1 | C^^1 | C >> 1 | C^^-1
  • return PopCount(M & E)

This approach should easily vectorize and even autovectorize.

When C is large enough, use SIMD registers to work in blocks of 16x8, where each bit represents a boolean in the mask. One can then shift a whole block up/down by alignr / left/right with _mm_slli_epi16/_mm_srli_epi16 or their equivalents in AVX2/-512, where unfortunately the cross bank shifting is a bit costly.

For the specific inputs:

. . X X X O O . -> C = 0 0 1 1 1 0 0 0  E = 1 1 0 0 0 0 0 1
X X . X . X O X        1 1 0 1 0 1 0 0      0 0 1 0 1 0 0 0
. X X X X X O X        0 1 1 1 1 1 0 0      1 0 0 0 0 0 0 0
X X . O O X . X        1 1 0 0 0 1 0 0      0 0 1 0 0 0 1 0

Then the mask M would be the union of C shifted to left,right,up,down

M =   0  0  0  1  1  1  0  0  0  0
      0 |1  1  1  1  1  1  0  0| 0  <-- you only need the
      1 |1  1  1  1  1  1  1  0| 0      inner / 'valid' area
      0 |1  1  1  1  1  1  1  0| 0
      1 |1  1  1  1  1  1  1  0| 0
      0  1  1  0  0  0  1  0  0  0

M.*E =    1   1   0   0   0   0   0   0
          0   0   1   0   1   0   0   0
          1   0   0   0   0   0   0   0
          0   0   1   0   0   0   1   0,  sum == 7

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1