GPU memory use with tiny YOLOv4 and Tensorflow

I am creating a Flask API that takes as input an image, its width and height, and a detection threshold. The image is passed through a tiny YOLOv4 model to detect objects, and the API returns the coordinates of the bounding boxes. It runs correctly and returns the right points, but it is very slow because it is not using my GPU (GeForce MX230, 2 GB). When I start the code, the model is loaded and my GPU goes up to 30% usage, which makes sense since the model is now in memory. But when the API receives an image, I start getting CUDA out-of-memory errors like this:

tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available
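For scale, here is a quick back-of-the-envelope calculation. It is only a sketch: it counts the input tensor handed to infer() (one 416x416x3 image as float32) and ignores the model weights and any intermediate buffers. Even so, it shows that the image batch itself is roughly 2 MiB, several hundred times smaller than the 1.08 GiB allocation in the log, so that request is not the input tensor.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes occupied by a dense tensor of the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

MiB = 1024 ** 2
GiB = 1024 ** 3

# The batch of one resized 416x416 image passed to infer(), cast to float32.
input_bytes = tensor_bytes((1, 416, 416, 3))
print(f"input tensor: {input_bytes / MiB:.2f} MiB")          # input tensor: 1.98 MiB
print(f"requested in the log: {1.08 * GiB / MiB:.0f} MiB")  # requested in the log: 1106 MiB
```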

This does not seem to make sense, since loading one (small) image should not demand that much GPU memory, and the model has already been loaded. So my question is: why is this happening, and is there a way to fix it? My code:

```python
import cv2
import tensorflow as tf
import numpy as np
from tensorflow.python.saved_model import tag_constants
from flask import Flask, request
from flask_cors import CORS
import ast

# Let TensorFlow grow GPU memory on demand instead of reserving it all up front.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

iou = 0.4
# The model is loaded once, when the server starts.
saved_model_loaded = tf.saved_model.load('./yolov4-tiny-416', tags=[tag_constants.SERVING])
classe_names=["person","bicycle","car","motorbike","aeroplane","bus","train","truck","boat","traffic light","fire hydrant","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow","elephant","bear","zebra","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite","baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife","spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","sofa","pottedplant","bed","diningtable","toilet","tvmonitor","laptop","mouse","remote","keyboard","cell phone","microwave","oven","toaster","sink","refrigerator","book","clock","vase","scissors","teddy bear","hair drier","toothbrush"]

app = Flask(__name__)
CORS(app)
cors = CORS(app, resources={r"/*": {"origins": "*"}})

@app.route('/', methods=['POST'])
def index():
    # The body is a Python dict literal: image (flat pixel list), width, height, threshold.
    d = ast.literal_eval(request.data.decode('utf-8'))
    h = d['height']
    w = d['width']
    score = d['threshold'] / 1000
    # Rebuild the HxWx3 image, scale it to [0, 1] and resize it to the 416x416 model input.
    image = np.array(d['image']).reshape((h, w, 3)).astype(np.uint8)
    image = image / 255
    image = cv2.resize(image, (416, 416))
    infer = saved_model_loaded.signatures['serving_default']
    pred_bbox = infer(tf.constant(np.asarray([image]).astype(np.float32)))
    # Split the model output into box coordinates and per-class confidences.
    for key, value in pred_bbox.items():
        boxes = value[:, :, 0:4]
        pred_conf = value[:, :, 4:]
    boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
        boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
        scores=tf.reshape(
            pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
        max_output_size_per_class=50,
        max_total_size=50,
        iou_threshold=iou,
        score_threshold=score)
    pred_bbox = [boxes.numpy(), scores.numpy(), classes.numpy(), valid_detections.numpy()]
    # Boxes are [y1, x1, y2, x2]; emit "x,y,width,height,..." in original-image pixels.
    values = ''
    for i in range(len(pred_bbox[0][0])):
        if np.all(pred_bbox[0][0][i] == 0):
            break
        print(classe_names[int(pred_bbox[2][0][i])], pred_bbox[0][0][i], pred_bbox[1][0][i])
        values = values + str(int(pred_bbox[0][0][i][1] * w)) + ','
        values = values + str(int(pred_bbox[0][0][i][0] * h)) + ','
        values = values + str(int((pred_bbox[0][0][i][3] - pred_bbox[0][0][i][1]) * w)) + ','
        values = values + str(int((pred_bbox[0][0][i][2] - pred_bbox[0][0][i][0]) * h)) + ','
    return values[:-1]

if __name__ == '__main__':
    app.run(host='_._._._', port=5000, threaded=True, debug=True)
```
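The loop at the end of index() turns each detection into pixel coordinates. tf.image.combined_non_max_suppression returns boxes as [y1, x1, y2, x2], and the code treats them as normalized to [0, 1] (it multiplies by w and h). The pure-Python sketch below reproduces that conversion outside TensorFlow so it can be checked in isolation. boxes_to_csv is a hypothetical helper, and one deliberate difference is that it uses the valid_detections count to skip padding entries instead of stopping at the first all-zero box.

```python
def boxes_to_csv(boxes, num_valid, width, height):
    """Convert normalized [y1, x1, y2, x2] boxes to 'x,y,w,h,...' in pixels.

    boxes: per-image list of [y1, x1, y2, x2] values in [0, 1], e.g. the
           first entry of the NMS boxes output converted to Python lists.
    num_valid: that image's valid_detections count; later rows are padding.
    """
    values = []
    for y1, x1, y2, x2 in boxes[:num_valid]:
        values.append(int(x1 * width))          # left edge
        values.append(int(y1 * height))         # top edge
        values.append(int((x2 - x1) * width))   # box width
        values.append(int((y2 - y1) * height))  # box height
    return ",".join(str(v) for v in values)

# One real detection followed by one all-zero padding row.
padded = [[0.25, 0.125, 0.75, 0.625], [0.0, 0.0, 0.0, 0.0]]
print(boxes_to_csv(padded, num_valid=1, width=640, height=480))  # 80,120,320,240
```

Like the original, it truncates with int(); round() would be a reasonable alternative if off-by-one pixel edges matter.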

PS: The reshape at the beginning is needed because of the image source (an external program sends the image that way).
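As a sketch of what that reshape implies for the sender: np.array(flat).reshape((h, w, 3)) uses NumPy's default row-major (C) order, so the flat list must hold the pixels row by row with the three channel values of each pixel next to each other. The example below assumes exactly that layout; build_body is a hypothetical client-side helper that produces the dict-literal body index() parses with ast.literal_eval, with the threshold in thousandths.

```python
import ast

def flat_index(row, col, channel, width, channels=3):
    """Position of value (row, col, channel) in a flat, row-major H*W*C list."""
    return (row * width + col) * channels + channel

def build_body(flat_pixels, width, height, threshold_permille):
    """Hypothetical client helper: serialize the body that index() parses.

    index() reads request.data with ast.literal_eval, so the body is the repr()
    of a dict; the threshold is sent in thousandths (index() divides by 1000).
    """
    if len(flat_pixels) != width * height * 3:
        raise ValueError("expected width * height * 3 values")
    body = {"image": list(flat_pixels), "width": width,
            "height": height, "threshold": threshold_permille}
    return repr(body).encode("utf-8")

# A 2x2 image whose values are simply their own flat positions.
flat = list(range(2 * 2 * 3))
raw = build_body(flat, width=2, height=2, threshold_permille=250)
d = ast.literal_eval(raw.decode("utf-8"))          # same parsing as index()
i = flat_index(1, 0, 0, d["width"])                # pixel at row 1, column 0
print(d["image"][i:i + 3], d["threshold"] / 1000)  # [6, 7, 8] 0.25
```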

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
