cv::dnn::Net forward function is very slow for YOLO object detection

The call this->net.forward(outs, getOutputsNames(this->net)); is very slow. Is there any way to optimize the forward function of cv::dnn::Net? I am using it for object detection with YOLO v3, and it takes around 2 seconds per image.

void ObjectDetector::runInference(cv::Mat& frame, std::string item, std::string output_path, std::string imagename)
{

    cv::Mat blob;
    cv::dnn::blobFromImage(frame, blob, 1/255.0, cv::Size(416, 416), cv::Scalar(0,0,0), true, false);
    
    //Sets the input to the network
    this->net.setInput(blob);
        
    // Runs the forward pass to get output of the output layers
    std::vector<cv::Mat> outs;
    
    clock_t start, end;
  
    /* Recording the starting clock tick.*/
    start = clock();

    /* below code is very slow */
    this->net.forward(outs, getOutputsNames(this->net));
    /* above code is very slow */

    end = clock();
    double time_taken = double(end - start) / double(CLOCKS_PER_SEC);
    // Set the precision before streaming the value so that it takes effect
    std::cout << "Time taken for forward pass is : " << std::fixed
         << std::setprecision(5) << time_taken;
    std::cout << " sec " << std::endl;
   
    postprocess(frame, outs, item, output_path, imagename);
}
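
A note on the measurement itself: clock() reports processor time used by the process, not elapsed time, and on POSIX systems that time is summed over all threads. OpenCV's CPU backend typically runs the forward pass on several threads, so the figure printed by the code above may overstate the real latency. A minimal sketch of a wall-clock timer follows; wallSeconds is a hypothetical helper name used only for illustration and is not part of OpenCV.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs fn once and returns the elapsed wall-clock
// time in seconds. steady_clock is monotonic, so the result is not
// affected by system clock adjustments.
template <typename Fn>
double wallSeconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In runInference this could wrap the slow call, for example wallSeconds([&] { this->net.forward(outs, getOutputsNames(this->net)); }), and the result can be compared with the clock() figure to see how much of the 2 seconds is elapsed time.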



Solution 1:[1]

It seems you are running your code on the CPU backend. Since you are running a network with very large weight files, it is normal to get a low fps on today's CPUs. There are two ways to increase your speed:

  1. This solution comes with a trade-off: you gain speed but lose accuracy. Decrease the size of the network's input image. Your current values are (416, 416); decreasing these two values will increase your fps, but your accuracy will drop.

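  2. The cleanest solution is to use a GPU. If you have CUDA-capable GPU hardware, you can change the network backend to GPU and get the speed-up with the same accuracy. With OpenCV 4.2 or later built with CUDA support, this means calling net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA) and net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA) before the forward pass.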
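
To get a feel for the trade-off in option 1: YOLOv3 is fully convolutional, so the work in the forward pass grows roughly with the number of input pixels, and because the network downsamples by a factor of 32 the input side should stay a multiple of 32. The sketch below estimates the cost of a smaller input relative to 416. The helper relativeInputCost is hypothetical, and the proportional-to-pixels model is an assumption, not a benchmark.

```cpp
#include <cassert>

// Hypothetical helper: rough forward-pass cost relative to a baseline
// input size, assuming cost is proportional to the number of input pixels.
double relativeInputCost(int side, int baselineSide = 416) {
    // YOLOv3 downsamples by 32, so both sides must be multiples of 32.
    assert(side > 0 && side % 32 == 0);
    assert(baselineSide > 0 && baselineSide % 32 == 0);
    const double pixels = static_cast<double>(side) * side;
    const double baselinePixels =
        static_cast<double>(baselineSide) * baselineSide;
    return pixels / baselinePixels;
}
```

Under this assumption relativeInputCost(320) is about 0.59, so passing cv::Size(320, 320) to blobFromImage should cut the convolution work by roughly 40%, at the price of missing more small objects. Actual timings depend on the machine and should be measured.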

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yunus Temurlenk