OpenCV dnn module generates different predictions than the original torch model

I am trying to use a torch model in OpenCV, through the dnn module, to do segmentation and background removal on images.

The model is a pretrained U2Net which, in torch, generates very good results for my task. I exported the model to ONNX, then read it with the dnn.readNetFromONNX function, but the results are very poor.

I have written code that shares pretty much everything between OpenCV and torch, except of course the call to the model that makes the predictions. Instead of using the blobFromImage function to build the OpenCV dnn input, I reused the same preprocessing code I use for torch (a blobFromImage sketch follows for comparison).
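For reference, this is roughly what the blobFromImage route would look like (a sketch, not code from my notebook; bgr_image stands for the test image as loaded by cv.imread). Note that blobFromImage subtracts the mean and then multiplies by a single scalar scalefactor, so the per-channel std division (0.229/0.224/0.225) used in the torch preprocessing cannot be expressed exactly, which may be one reason to share the preprocessing code instead:

blob = cv.dnn.blobFromImage(
    bgr_image,                  # HxWx3 uint8 image in BGR order
    scalefactor=1.0 / 255.0,    # applied after mean subtraction
    size=(320, 320),
    mean=(0.485 * 255, 0.456 * 255, 0.406 * 255),  # RGB order when swapRB=True
    swapRB=True,                # the network expects RGB input
    crop=False,
)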

These are the results on a test image: [screenshot comparing the torch and OpenCV outputs]

The code (to be tested with Google Colab) is this:

### upgrade opencv to the latest version ###
!pip install --upgrade opencv-python

### clone git with U2Net ###
%cd /content
!git clone https://github.com/shreyas-bk/U-2-Net

### download weights for U2Net ###
!gdown --id 1ao1ovG1Qtx4b7EoskHXmi2E9rp5CHLcZ -O /content/U-2-Net/u2net.pth

###
%cd /content/U-2-Net

### imports ###
from google.colab import files
from model import U2NET
import torch
import os
import numpy as np
from torchvision import transforms
import cv2 as cv
from skimage import io, transform
from PIL import Image

### instantiate U2Net ###
model_dir = '/content/U-2-Net/u2net.pth'
net = U2NET(3, 1)
net.load_state_dict(torch.load(model_dir, map_location='cpu'))
net.eval() 

### export torch model to onnx ###
img = torch.randn(1, 3, 320, 320, requires_grad=False)
img = img.to(torch.device('cpu'))
output_dir = '/content/u2net.onnx'
torch.onnx.export(net, img, output_dir, opset_version=11, verbose=True)
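
### optional: validate the exported graph ###
# a sanity check before loading in OpenCV; assumes the onnx package
# is installed (e.g. !pip install onnx)
import onnx
onnx_model = onnx.load('/content/u2net.onnx')
onnx.checker.check_model(onnx_model)  # raises if the graph is structurally invalid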

### load model from OpenCV ###
cv_net = cv.dnn.readNetFromONNX('/content/u2net.onnx')

### your test image here ###
IMG_PATH = '/content/<IMAGE_NAME>.png'

### load image ###
image = Image.open(IMG_PATH)

### preprocessing ###

def preprocess_image(image, output_size=320, for_torch=False):
  '''
  Resize to (output_size, output_size), rescale to [0, 1], normalize with
  the ImageNet mean/std, and move channels first (CHW).
  '''
  # resize image
  img = transform.resize(image, (output_size, output_size), mode="constant")

  # mean subtraction and normalization
  tmp_img = np.zeros((img.shape[0], img.shape[1], 3))
  img = img / np.max(img)

  tmp_img[:, :, 0] = (img[:, :, 0] - 0.485) / 0.229
  tmp_img[:, :, 1] = (img[:, :, 1] - 0.456) / 0.224
  tmp_img[:, :, 2] = (img[:, :, 2] - 0.406) / 0.225
  tmp_img = tmp_img.transpose((2, 0, 1))
  
  if for_torch:
    return torch.from_numpy(tmp_img)

  return tmp_img

### predictions norm ###
def norm_pred(d):
  '''
  normalize predictions
  '''
  ma = d.max()
  mi = d.min()
  dn = (d - mi) / (ma - mi)
  return dn

### the magic ###
def remove_bg(image, processed, for_torch=False):
  pred = None
  if for_torch:
    with torch.no_grad():
      inputs_test = processed.unsqueeze(0).float()
      preds, _, _, _, _, _, _ = net(inputs_test)
  else:
    # dnn expects a float32 blob, not the float64 produced by skimage
    cv_net.setInput(np.expand_dims(np.asarray(processed, dtype=np.float32), axis=0))
    preds = cv_net.forward()  
    
  pred = preds[:, 0, :, :]
  # normalization
  pred_normalized = norm_pred(pred.cpu().detach().numpy() if for_torch else pred)
  # squeeze
  predict = pred_normalized.squeeze()
  # to RGB
  img_out = Image.fromarray(predict * 255).convert("RGB")
  image = image.resize((img_out.size), resample=Image.BILINEAR)
  empty_img = Image.new("RGBA", (image.size), 0)
  img_out = Image.composite(image, empty_img, img_out.convert("L"))

  # draw 
  img_out = img_out.resize((image.size), resample=Image.BILINEAR)
  empty_img = Image.new("RGBA", (image.size), 0)
  img_out = Image.composite(image, empty_img, img_out)

  return img_out

### preprocess image ###
sample = preprocess_image(np.array(image), for_torch=True)

### torch results ###
remove_bg(image, sample, for_torch=True)

### opencv results ###
remove_bg(image, sample, for_torch=False)
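
### optional: compare raw torch and OpenCV outputs numerically ###
# a sketch, not part of my original notebook, to quantify the
# discrepancy beyond visual inspection
inp = np.expand_dims(np.asarray(sample, dtype=np.float32), axis=0)
with torch.no_grad():
  torch_out = net(torch.from_numpy(inp))[0].numpy()
cv_net.setInput(inp)
cv_out = cv_net.forward()
print('max abs diff (torch vs OpenCV):', np.abs(torch_out - cv_out).max())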

[OpenCV => 4.5.5, Platform => Google Colab, Torch => 1.11.0+cu113]


Update

These are the warnings that I obtain from torch.onnx.export:

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will changeto match the args list in nn.MaxPool2d in a future release.
  warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3704: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:1944: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

Update 2

I've also tried loading the model and performing inference with the onnxruntime module, and everything works fine. At this point I think it is a problem with OpenCV. Additional code:

### install onnxruntime (version 1.11.1) ###
!pip install onnxruntime

### loading model with onnxruntime ###
import onnxruntime
ort_session = onnxruntime.InferenceSession("/content/u2net.onnx")

### updating remove_bg ###
def remove_bg(image, processed, backend='torch'):
  '''
  Run the model through the chosen backend and composite the predicted
  mask with the input image.
  '''
  if backend not in ['torch', 'OpenCV', 'onnx']: raise ValueError('Wrong backend.')

  pred = None
  if backend == 'torch':
    with torch.no_grad():
      inputs_test = processed.unsqueeze(0).float()
      preds, _, _, _, _, _, _ = net(inputs_test)
  elif backend == 'OpenCV':
    # dnn expects a float32 blob
    cv_net.setInput(np.expand_dims(np.asarray(processed, dtype=np.float32), axis=0))
    preds = cv_net.forward()
  elif backend == 'onnx':
    ort_inputs = {ort_session.get_inputs()[0].name: np.expand_dims(np.asarray(processed, dtype=np.float32), axis=0)}
    ort_outs = ort_session.run(None, ort_inputs)
    preds = ort_outs[0]
    
  pred = preds[:, 0, :, :]
  # normalization
  pred_normalized = norm_pred(pred.numpy() if backend == 'torch' else pred)
  # squeeze
  predict = pred_normalized.squeeze()
  # to RGB
  img_out = Image.fromarray(predict * 255).convert("RGB")
  image = image.resize((img_out.size), resample=Image.BILINEAR)
  empty_img = Image.new("RGBA", (image.size), 0)
  img_out = Image.composite(image, empty_img, img_out.convert("L"))

  # draw 
  img_out = img_out.resize((image.size), resample=Image.BILINEAR)
  empty_img = Image.new("RGBA", (image.size), 0)
  img_out = Image.composite(image, empty_img, img_out)

  return img_out


### inference with onnxruntime ###
remove_bg(image, sample, backend='onnx')
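
### optional: quantify the OpenCV divergence against onnxruntime ###
# running the same float32 input through both backends makes the
# divergence measurable (again a sketch, not part of the original notebook)
inp = np.expand_dims(np.asarray(sample, dtype=np.float32), axis=0)
ort_out = ort_session.run(None, {ort_session.get_inputs()[0].name: inp})[0]
cv_net.setInput(inp)
cv_out = cv_net.forward()
print('max abs diff (onnxruntime vs OpenCV):', np.abs(ort_out - cv_out).max())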


Solution 1 [1]

I've tried it on my own image and got similar results.

In the source code of the net there is an _upsample_like function that looks like

F.upsample(src,size=tar.shape[2:],mode='bilinear')

But according to the official ONNX repo bilinear interpolation isn't supported, and the logs of torch.onnx.export show that ONNX uses linear interpolation instead:

onnx::Resize[coordinate_transformation_mode="pytorch_half_pixel", cubic_coeff_a=-0.75, mode="linear", nearest_mode="floor"]
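
If you want to confirm which settings actually ended up in the exported graph, the Resize nodes can be inspected with the onnx package (a sketch, not something I ran for this answer):

import onnx
model = onnx.load('/content/u2net.onnx')
for node in model.graph.node:
  if node.op_type == 'Resize':
    # show the string-valued attributes (mode, coordinate_transformation_mode, ...)
    print({a.name: a.s.decode() for a in node.attribute if a.s})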

I guess you should change the source code of your U2Net, namely

F.upsample(src,size=tar.shape[2:],mode='bilinear')

e.g., to

F.upsample(src,size=tar.shape[2:],mode='linear')

or, better given the deprecation warnings, to

F.interpolate(src,size=tar.shape[2:],mode='linear')

You can use any other interpolation supported by ONNX.

And then finetune (or train from scratch) your model with the new interpolation method and try the ONNX -> opencv.dnn export again.
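
For illustration, the edited helper in the model code (presumably model/u2net.py in the U-2-Net repo) could look like this; 'nearest' is used here as one mode that ONNX resamples exactly, but this is a sketch of the idea rather than a tested patch:

import torch.nn.functional as F

def _upsample_like(src, tar):
  # upsample src to the spatial size of tar with an ONNX-friendly mode
  return F.interpolate(src, size=tar.shape[2:], mode='nearest')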

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Maxim Lyuzin