C++ code running in Ubuntu much slower than in Windows 10

I have a dual-boot PC with Windows 10 and Ubuntu 20.04 on the same SSD. I built OpenCV 4.5.5 with the contrib modules on both platforms using the same CMake configuration. My aim is to compare the performance of a C++ program that uses OpenCV ximgproc functions on both platforms. To my surprise, the code runs much faster on Windows 10 than on Ubuntu 20.04 (almost twice as fast). Is there anything I can tune in the Linux build to get similar performance?

On Windows 10, I use Visual Studio 2019 with the latest updates. On Ubuntu, I use the default g++ 9.4 from the repositories. The test code is as follows (all error checking removed for simplicity): …

try {
    cv::Mat h_src = cv::imread(image_path, cv::IMREAD_GRAYSCALE);
    
    cv::Mat img_thres_cmb;
    int blockSize = 95;
    double k = 1.5;  // niBlackThreshold takes k as a double; declared as int, 1.5 was truncated to 1
    int type = cv::THRESH_BINARY_INV;
    int method = cv::ximgproc::BINARIZATION_WOLF; 

    auto t3 = cv::getTickCount();
    cv::ximgproc::niBlackThreshold(h_src, img_thres_cmb, 255, type, blockSize, k, method);
    auto t4 = cv::getTickCount();
    
    auto cpu_sgmnt = (t4 - t3) / cv::getTickFrequency();
    std::cout << "[cpu_sgmnt]: [ " << cpu_sgmnt << " ] secs." << std::endl;
}
catch (const cv::Exception& ex) {
    std::cout << "Error: " << ex.what() << std::endl;
}
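
A single timed call, as in the snippet above, may also pick up one-time costs such as thread-pool start-up or lazy library initialisation, and those can differ between platforms. The following is a minimal sketch of a repeat-and-take-the-median timer, assuming only the C++11 standard library. `median_seconds` is a hypothetical helper introduced here for illustration; it is not part of OpenCV, and the warm-up and run counts are arbitrary defaults.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): calls `fn` `warmup` times without
// timing, then `runs` times with timing, and returns the median wall-clock
// duration of the timed calls in seconds. Returns 0.0 if `runs` is 0.
template <typename Fn>
double median_seconds(Fn&& fn, std::size_t warmup = 2, std::size_t runs = 9) {
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();  // untimed: absorbs first-call overhead
    }
    if (runs == 0) {
        return 0.0;
    }

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }

    // nth_element puts the median at the middle index without a full sort.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

In the test program, the call to `cv::ximgproc::niBlackThreshold(...)` could be wrapped in a lambda and passed to this helper on both platforms, so that the two figures being compared are steady-state medians rather than first-call timings.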

The C++ code is very simple: it just calls the OpenCV function niBlackThreshold, which is compiled by OpenCV's build system. I get exactly the same result on both platforms; the only issue is that the Ubuntu version runs too slowly. I also tested on Ubuntu 16.04, 18.04, and 21.04, and the runtime performance is the same as on 20.04.

Your input and help are greatly appreciated.

Thanks and best regards,

Luke

//////////////////////////////////////////

Additional info about the builds:

  1. Ubuntu 20.04

$ g++ test_ocv_2.cpp `pkg-config --libs --cflags opencv4` -o test_ocv

$ pkg-config --libs --cflags opencv4

-I/usr/local/include/opencv4 -L/usr/local/lib -lopencv_gapi -lopencv_stitching -lopencv_aruco -lopencv_barcode -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_cudabgsegm -lopencv_cudafeatures2d -lopencv_cudaobjdetect -lopencv_cudastereo -lopencv_dnn_objdetect -lopencv_dnn_superres -lopencv_dpm -lopencv_face -lopencv_fuzzy -lopencv_hdf -lopencv_hfs -lopencv_img_hash -lopencv_intensity_transform -lopencv_line_descriptor -lopencv_mcc -lopencv_quality -lopencv_rapid -lopencv_reg -lopencv_rgbd -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_superres -lopencv_cudacodec -lopencv_surface_matching -lopencv_tracking -lopencv_highgui -lopencv_datasets -lopencv_text -lopencv_plot -lopencv_videostab -lopencv_cudaoptflow -lopencv_optflow -lopencv_cudalegacy -lopencv_videoio -lopencv_cudawarping -lopencv_wechat_qrcode -lopencv_xfeatures2d -lopencv_shape -lopencv_ml -lopencv_ximgproc -lopencv_video -lopencv_xobjdetect -lopencv_objdetect -lopencv_calib3d -lopencv_imgcodecs -lopencv_features2d -lopencv_dnn -lopencv_flann -lopencv_xphoto -lopencv_photo -lopencv_cudaimgproc -lopencv_cudafilters -lopencv_imgproc -lopencv_cudaarithm -lopencv_core -lopencv_cudev

$ ./test_ocv 17218619999_r1c7_8k.png out.png

General configuration for OpenCV 4.5.5 =====================================
Version control: unknown

Extra modules:
  Location (extra): /home/hologic/source/OpenCV/opencv_contrib-4.5.5/modules
  Version control (extra): unknown

Platform:
  Timestamp: 2022-05-03T21:57:05Z
  Host: Linux 5.13.0-40-generic x86_64
  CMake: 3.16.3
  CMake generator: Unix Makefiles
  CMake build tool: /usr/bin/make
  Configuration: Release

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (18 files): + SSSE3 SSE4_1
    SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (33 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
    AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
  Built as dynamic libs?: YES
  C++ standard: 11
  C++ Compiler: /usr/bin/c++ (ver 9.4.0)
  C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
  C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
  C Compiler: /usr/bin/cc
  C flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
  C flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
  Linker flags (Release): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
  Linker flags (Debug): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
  ccache: NO
  Precompiled headers: NO
  Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
  3rdparty dependencies:

OpenCV modules:
  To be built: aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
  Disabled: world
  Disabled by dependency: -
  Unavailable: alphamat cvv freetype java julia matlab ovis python2 python3 sfm viz
  Applications: tests perf_tests apps
  Documentation: NO
  Non-free algorithms: YES

GUI: NONE
  GTK+: NO
  OpenGL support: NO
  VTK support: NO

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 2.1.2-62)
  WEBP: build (ver encoder: 0x020f)
  PNG: build (ver 1.6.37)
  TIFF: build (ver 42 - 4.2.0)
  JPEG 2000: build (ver 2.4.0)
  OpenEXR: build (ver 2.3.0)
  HDR: YES
  SUNRASTER: YES
  PXM: YES
  PFM: YES

Video I/O:
  DC1394: NO
  FFMPEG: YES
    avcodec: YES (58.54.100)
    avformat: YES (58.29.100)
    avutil: YES (56.31.100)
    swscale: YES (5.5.100)
    avresample: YES (4.0.0)
  GStreamer: YES (1.16.2)
  v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2020.0.0 Gold [2020.0.0]
    at: /home/hologic/source/OpenCV/opencv_build-4.5.5/3rdparty/ippicv/ippicv_lnx/icv
  Intel IPP IW: sources (2020.0.0)
    at: /home/hologic/source/OpenCV/opencv_build-4.5.5/3rdparty/ippicv/ippicv_lnx/iw
  VA: NO
  Lapack: NO
  Eigen: NO
  Custom HAL: NO
  Protobuf: build (3.19.1)

NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
  NVIDIA GPU arch: 70 75 80 86
  NVIDIA PTX archs:

cuDNN: YES (ver 8.2.4)

OpenCL: YES (no extra features)
  Include path: /home/hologic/source/OpenCV/opencv-4.5.5/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python (for build): /home/hologic/anaconda3/bin/python

Java:
  ant: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: /usr/local

[cpu_sgmnt]: [ 2.20123 ] secs.

//////////////////////////////////////////////

  2. Windows 10

Copy of VS 2019 build property in Linker | Output section: /OUT:"C:\Users\HLX\source\repos\cuda_ocv\x64\Release\cuda_ocv.exe" /MANIFEST /LTCG:incremental /NXCOMPAT /PDB:"C:\Users\HLX\source\repos\cuda_ocv\x64\Release\cuda_ocv.pdb" /DYNAMICBASE "opencv_core455.lib" "opencv_highgui455.lib" "opencv_imgcodecs455.lib" "opencv_imgproc455.lib" "opencv_videoio455.lib" "opencv_cudaarithm455.lib" "opencv_cudaimgproc455.lib" "opencv_cudafilters455.lib" "opencv_ximgproc455.lib" "nppig.lib" "nppif.lib" "cudart_static.lib" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" "cudart.lib" "cudadevrt.lib" /DEBUG /MACHINE:X64 /OPT:REF /PGD:"C:\Users\HLX\source\repos\cuda_ocv\x64\Release\cuda_ocv.pgd" /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"x64\Release\cuda_ocv.exe.intermediate.manifest" /LTCGOUT:"x64\Release\cuda_ocv.iobj" /OPT:ICF /ERRORREPORT:PROMPT /ILK:"x64\Release\cuda_ocv.ilk" /NOLOGO /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\lib\x64" /LIBPATH:"C:\Users\HLX\source\OpenCV\opencv_build-4.5.5\install\x64\vc16\lib" /TLBID:1

(comment: The solution was originally created to support OpenCV GPU with CUDA, but for this test I don't use any of the GPU support in OpenCV; refer to the code in my original post.)

C:\Users\HLX\source\repos\cuda_ocv\x64\Release>cuda_ocv.exe D:\images\17218619999_r1c7_8k.png out.png

General configuration for OpenCV 4.5.5 =====================================
Version control: unknown

Extra modules:
  Location (extra): C:/Users/HLX/source/OpenCV/opencv_contrib-4.5.5/modules
  Version control (extra): unknown

Platform:
  Timestamp: 2022-04-14T15:02:00Z
  Host: Windows 10.0.19044 AMD64
  CMake: 3.23.0
  CMake generator: Visual Studio 16 2019
  CMake build tool: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
  MSVC: 1929
  Configuration: Debug Release

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (18 files): + SSSE3 SSE4_1
    SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (33 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
    AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
  Built as dynamic libs?: YES
  C++ standard: 11
  C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe (ver 19.29.30141.0)
  C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MD /O2 /Ob2 /DNDEBUG
  C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP /MDd /Zi /Ob0 /Od /RTC1
  C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
  C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP /MD /O2 /Ob2 /DNDEBUG
  C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP /MDd /Zi /Ob0 /Od /RTC1
  Linker flags (Release): /machine:x64 /INCREMENTAL:NO
  Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
  ccache: NO
  Precompiled headers: YES
  Extra dependencies: cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4/lib/x64
  3rdparty dependencies:

OpenCV modules:
  To be built: aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
  Disabled: world
  Disabled by dependency: -
  Unavailable: alphamat cvv freetype java julia matlab ovis python2 sfm viz
  Applications: tests perf_tests apps
  Documentation: NO
  Non-free algorithms: YES

Windows RT support: NO

GUI: WIN32UI
  Win32 UI: YES
  OpenGL support: YES (opengl32 glu32)
  VTK support: NO

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 2.1.2-62)
  WEBP: build (ver encoder: 0x020f)
  PNG: build (ver 1.6.37)
  TIFF: build (ver 42 - 4.2.0)
  JPEG 2000: build (ver 2.4.0)
  OpenEXR: build (ver 2.3.0)
  HDR: YES
  SUNRASTER: YES
  PXM: YES
  PFM: YES

Video I/O:
  DC1394: NO
  FFMPEG: YES (prebuilt binaries)
    avcodec: YES (58.134.100)
    avformat: YES (58.76.100)
    avutil: YES (56.70.100)
    swscale: YES (5.9.100)
    avresample: YES (4.0.0)
  GStreamer: NO
  DirectShow: YES
  Media Foundation: YES
  DXVA: YES

Parallel framework: Concurrency

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2020.0.0 Gold [2020.0.0]
    at: C:/Users/HLX/source/OpenCV/opencv_build-4.5.5/3rdparty/ippicv/ippicv_win/icv
  Intel IPP IW: sources (2020.0.0)
    at: C:/Users/HLX/source/OpenCV/opencv_build-4.5.5/3rdparty/ippicv/ippicv_win/iw
  Lapack: NO
  Eigen: NO
  Custom HAL: NO
  Protobuf: build (3.19.1)

NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS NVCUVID FAST_MATH)
  NVIDIA GPU arch: 60 61 70 75 80 86
  NVIDIA PTX archs:

cuDNN: YES (ver 8.2.4)

OpenCL: YES (NVD3D11)
  Include path: C:/Users/HLX/source/OpenCV/opencv-4.5.5/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python 3:
  Interpreter: C:/Anaconda3/python.exe (ver 3.9.7)
  Libraries: C:/Anaconda3/libs/python39.lib (ver 3.9.7)
  numpy: C:/Anaconda3/lib/site-packages/numpy/core/include (ver 1.20.3)
  install path: C:/Anaconda3/Lib/site-packages/cv2/python-3.9

Python (for build): C:/Anaconda3/python.exe

Java:
  ant: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: C:/Users/HLX/source/OpenCV/opencv_build-4.5.5/install

[cpu_sgmnt]: [ 1.42901 ] secs.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow