MPI_Scatterv hangs with a sendcount of zero and a recvcount of one when the sending and receiving sides use different datatypes

I need to split a square matrix of side length N (where N is even) into partitions made of rows of blocks of 2x2 cells (row-block partitioning) and assign each of these partitions to a process, so that each process works on a partition that has N columns and an even number of rows. When executing with a number of processes and an N value such that some process (other than the master) is given an empty partition (for example, when there are more processes than partitions), that process's MPI_Scatterv invocation hangs while the others apparently work fine. This behaviour occurs, for example, when running the code below with N=2 and 3 processes (mpirun -n 3 ./program 2). I suspect that the problem lies in using a custom datatype created by a call to MPI_Type_contiguous with count 0.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

typedef signed char cell_type;
const int BLOCK_DIM = 2;

void init_domain(cell_type *grid, int N)
{
   int i;
   for (i = 0; i < N * N; i++)
   {
      grid[i] = 0;
   }
}

int main(int argc, char *argv[])
{
   int N;
   cell_type *cur = NULL;

   int my_rank, comm_sz;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   if (argc > 2)
   {
      fprintf(stderr, "Usage: %s [N]\n", argv[0]);
      MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
   }

   if (argc > 1)
   {
      N = atoi(argv[1]);
   }
   else
   {
      /* Default domain grid side length. */
      N = 512;
   }

   if (N % 2 != 0)
   {
      fprintf(stderr, "N must be even\n");
      MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
   }

   if (0 == my_rank)
   {
      /* Total size, in bytes, of the global grid. */
      const size_t GRID_SIZE = (size_t)N * N * sizeof(cell_type);
      cur = (cell_type *)malloc(GRID_SIZE);

      init_domain(cur, N);
   }

   /* sendcounts tracks how many row_t are to be sent to each process. */
   int *sendcounts = (int *)malloc(comm_sz * sizeof(*sendcounts));

   /* displs tracks, for each process, how many row_t there are from the
       beginning of the global domain to that process's first row_t. */
   int *displs = (int *)malloc(comm_sz * sizeof(*displs));

   /* num_blocks_needed tracks how many 2x2 (BLOCK_DIM x BLOCK_DIM) blocks
       fit in a single row of the square domain grid. */
   const int num_blocks_needed = N / BLOCK_DIM;

   for (int i = 0; i < comm_sz; i++)
   {
      /* The minimum granularity of assignment is a row of 2x2 blocks. */
      const int starting_row = num_blocks_needed * i / comm_sz * BLOCK_DIM;
      const int ending_row = num_blocks_needed * (i + 1) / comm_sz * BLOCK_DIM;
      const int blklen = ending_row - starting_row;
      sendcounts[i] = blklen;
      displs[i] = starting_row;
   }

   /* local_rows tracks how many rows of cells a process
      has to process. */
   const int local_rows = sendcounts[my_rank];

   MPI_Datatype row_t, part_t;

   /* row_t models a single row of N cells. */
   MPI_Type_contiguous(N, MPI_CHAR, &row_t);
   MPI_Type_commit(&row_t);

   /* part_t models the whole partition of a single process. */
   MPI_Type_contiguous(local_rows, /* count   */
                       row_t,      /* oldtype */
                       &part_t);   /* newtype */
   MPI_Type_commit(&part_t);

   /* allocates local domains */
   cell_type *local_cur = (cell_type *)malloc(
       (local_rows * N) * sizeof(*local_cur));

   printf("proc %d before scatter\n", my_rank);

   MPI_Scatterv(cur,             /* sendbuf      */
                sendcounts,      /* sendcounts   */
                displs,          /* displs       */
                row_t,           /* sendtype     */
                local_cur,       /* recvbuf      */
                1,               /* recvcount    */
                part_t,          /* recvtype     */
                0,               /* root         */
                MPI_COMM_WORLD); /* comm         */

   printf("proc %d after scatter\n", my_rank);

   free(local_cur);
   free(cur);

   MPI_Type_free(&row_t);
   MPI_Type_free(&part_t);

   MPI_Finalize();

   return EXIT_SUCCESS;
}

Using the datatype that models a single row (row_t) on the receiving side as well, instead of the partition type (part_t), with a call like this:

   MPI_Scatterv(cur,             /* sendbuf      */
                sendcounts,      /* sendcounts   */
                displs,          /* displs       */
                row_t,           /* sendtype     */
                local_cur,       /* recvbuf      */
                local_rows,      /* recvcount    */
                row_t,           /* recvtype     */
                0,               /* root         */
                MPI_COMM_WORLD); /* comm         */

the hang does not occur and everything seems to work. I know this is an uncommon scenario, but why does this happen, and does the MPI specification explain it? I ran the code on a machine with an OpenMPI 4.0.3 implementation and compiled it with mpicc (mpicc -std=c99 -Wall -Wpedantic program.c -o program). Thanks in advance.

mpi


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
