OpenMP memory leak when using g++ and Intel's libiomp5
I found the following code:
#include <iostream>
#include <vector>
template<typename T>
void worknested1(std::vector<T> &x){
#if defined(_OPENMP)
#pragma omp parallel for num_threads(1)
#endif
    for(std::size_t j=0;j<x.size();++j){
        x[j]=(T)0;
    }
}
template<typename T>
void worknested0(){
#if defined(_OPENMP)
#pragma omp parallel num_threads(1)
#endif
    {
        std::vector<T> a(100);
#if defined(_OPENMP)
#pragma omp for
#endif
        for(int i=0;i<10000;++i){
            worknested1(a);
        }
    }
}
void work(){
#if defined(_OPENMP)
#pragma omp parallel for num_threads(18)
#endif
    for(int i=0;i<1000000;++i){
        worknested0<double>();
    }
}
int main(){
    work();
    return 0;
}
to produce a nice memory leak when compiled with
g++ -Ofast -fopenmp -c test.cpp
g++ -L /opt/intel/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin -o exe test.o -Wl,--start-group -l iomp5 -l pthread -lm -ldl -Wl,--end-group
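As a sanity check on this link line (a diagnostic sketch; the executable name exe matches the command above), it is worth verifying that exactly one OpenMP runtime ends up in the process, since having both libgomp and libiomp5 loaded at once is a known source of problems:

```shell
# List the OpenMP runtimes the executable actually resolves at load time.
# Exactly one of libiomp5 / libgomp should appear in the output.
ldd ./exe | grep -i -E 'gomp|iomp'
```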
Depending on the number of iterations in work, it will eat all of my 256 GB of RAM.
The g++ version is 11.2.
The problem does not occur with icpc 2021.5.0 and clang++ 13.0.1.
Further, a workaround is to change num_threads(1) in worknested0 to num_threads(2).
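For reference, the workaround can be sketched as follows: it is the original worknested0/worknested1 pair with the single change num_threads(1) -> num_threads(2) on the outer parallel region. The pragmas are guarded, so the snippet also builds without -fopenmp; nothing else is changed.

```cpp
#include <vector>

template<typename T>
void worknested1(std::vector<T> &x){
#if defined(_OPENMP)
#pragma omp parallel for num_threads(1)
#endif
    for(std::size_t j=0;j<x.size();++j){
        x[j]=(T)0;
    }
}

template<typename T>
void worknested0(){
#if defined(_OPENMP)
#pragma omp parallel num_threads(2)   // was num_threads(1); this avoids the leak
#endif
    {
        std::vector<T> a(100);
#if defined(_OPENMP)
#pragma omp for
#endif
        for(int i=0;i<10000;++i){
            worknested1(a);
        }
    }
}
```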
Is there anything wrong in the code or is this a bug/incompatibility between g++ and Intel?
Any suggestions on how to get this working are appreciated (switching to gomp is not an option at the moment, as it appears to kill MKL performance).
OS is Arch Linux, kernel 5.16.
OMP environment is:
OMP_PLACES=cores
OMP_PROC_BIND=true
OMP_DYNAMIC=FALSE
OMP_MAX_ACTIVE_LEVELS=2147483647
OMP_NUM_THREADS=18
OMP_STACKSIZE=2000M
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow