How can I execute multiple NN trainings?

I have two NVIDIA GPUs in the machine, but I am not using them.

I have three NN trainings running on my machine. When I try to run a fourth one, the script gives me the following error:

my_user@my_machine:~/my_project/training_my_project$ python3 my_project.py
Traceback (most recent call last):
  File "my_project.py", line 211, in <module>
    load_data(
  File "my_project.py", line 132, in load_data
    tx = tf.convert_to_tensor(data_x, dtype=tf.float32)
  File "/home/my_user/.local/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/my_user/.local/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to allocate scratch buffer for device 0
my_user@my_machine:~/my_project/training_my_project$

How can I resolve this issue?

The following is my RAM usage:

my_user@my_machine:~/my_project/training_my_project$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15947        6651        3650          20        5645        8952
Swap:          2047         338        1709
my_user@my_machine:~/my_project/training_my_project$

The following is my CPU usage:

my_user@my_machine:~$ top -i
top - 12:46:12 up 79 days, 21:14,  2 users,  load average: 4,05, 3,82, 3,80
Tasks: 585 total,   2 running, 583 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11,7 us,  1,6 sy,  0,0 ni, 86,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :  15947,7 total,   3638,3 free,   6662,7 used,   5646,7 buff/cache
MiB Swap:   2048,0 total,   1709,4 free,    338,6 used.   8941,6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2081821 my_user  20   0   48,9g   2,5g 471076 S 156,1  15,8   1832:54 python3
2082196 my_user  20   0   48,8g   2,6g 467708 S 148,5  16,8   1798:51 python3
2076942 my_user  20   0   47,8g   1,6g 466916 R 147,5  10,3   2797:51 python3
   1594 gdm       20   0 3989336  65816  31120 S   0,7   0,4  38:03.14 gnome-shell
     93 root      rt   0       0      0      0 S   0,3   0,0   0:38.42 migration/13
   1185 root     -51   0       0      0      0 S   0,3   0,0   3925:59 irq/54-nvidia
2075861 root      20   0       0      0      0 I   0,3   0,0   1:30.17 kworker/22:0-events
2076418 root      20   0       0      0      0 I   0,3   0,0   1:38.65 kworker/1:0-events
2085325 root      20   0       0      0      0 I   0,3   0,0   1:17.15 kworker/3:1-events
2093002 root      20   0       0      0      0 I   0,3   0,0   1:00.05 kworker/23:0-events
2100000 root      20   0       0      0      0 I   0,3   0,0   0:45.78 kworker/2:2-events
2104688 root      20   0       0      0      0 I   0,3   0,0   0:33.08 kworker/9:0-events
2106767 root      20   0       0      0      0 I   0,3   0,0   0:25.16 kworker/20:0-events
2115469 root      20   0       0      0      0 I   0,3   0,0   0:01.98 kworker/11:2-events
2115470 root      20   0       0      0      0 I   0,3   0,0   0:01.96 kworker/12:2-events
2115477 root      20   0       0      0      0 I   0,3   0,0   0:01.95 kworker/30:1-events
2116059 my_user  20   0   23560   4508   3420 R   0,3   0,0   0:00.80 top

The following is my TF configuration:

import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "99" # Use both gpus for training.


import sys, random
import time
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint
import numpy as np
from lxml import etree, objectify


# <editor-fold desc="GPU">
# resolve GPU related issues.
try:
    physical_devices = tf.config.list_physical_devices('GPU')
    for gpu_instance in physical_devices:
        tf.config.experimental.set_memory_growth(gpu_instance, True)
except Exception:
    pass
# END of try
# </editor-fold>

Please note that the commented lines above are intentionally commented out.

Relevant source code:

def load_data(fname: str, class_index: int, feature_start_index: int, **selection):
    with open(fname) as file:
        if "top_n_lines" in selection:
            lines = [next(file) for _ in range(int(selection["top_n_lines"]))]
        elif "random_n_lines" in selection:
            tmp_lines = file.readlines()
            lines = random.sample(tmp_lines, int(selection["random_n_lines"]))
        else:
            lines = file.readlines()

    data_x, data_y = [], []
    for l in lines:
        row = l.strip().split()
        x = [float(ix) for ix in row[feature_start_index:]]
        y = encode(row[class_index])
        data_x.append(x)
        data_y.append(y)  
    # END for l in lines

    num_rows = len(data_x)
    given_fraction = selection.get("validation_part", 1.0)
    if given_fraction > 0.9999:
        valid_x, valid_y = data_x, data_y
    else:
        n = int(num_rows * given_fraction)
        valid_x, valid_y = data_x[:n], data_y[:n]
        data_x, data_y = data_x[n:], data_y[n:]
    # END of if-else block

    tx = tf.convert_to_tensor(data_x, np.float32)
    ty = tf.convert_to_tensor(data_y, np.float32)
    
    vx = tf.convert_to_tensor(valid_x, np.float32)
    vy = tf.convert_to_tensor(valid_y, np.float32)  

    return tx, ty, vx, vy
# END of the function


Solution 1:[1]

Using multiple GPUs

If you are developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This makes it easy to test multi-GPU setups without requiring additional hardware.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

NOTE: Virtual devices cannot be modified after being initialized.

Once there are multiple logical GPUs available to the runtime, you can utilize the multiple GPUs with tf.distribute.Strategy or with manual placement.

tf.distribute.Strategy is the recommended best practice for using multiple GPUs. Here is a simple example:

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

This program will run a copy of your model on each GPU, splitting the input data between them, also known as "data parallelism".

For more information about distribution strategies or manual placement, see the TensorFlow distributed training and GPU guides.
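As a minimal sketch of the manual-placement alternative (my own illustration, assuming a stock TensorFlow 2.x install; with soft placement, the default, it falls back to the CPU when no GPU is visible):

```python
import tensorflow as tf

# Manual placement: pin specific ops to a chosen device instead of
# using a distribution strategy.
gpus = tf.config.list_logical_devices('GPU')
device = gpus[0].name if gpus else '/CPU:0'

with tf.device(device):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.eye(2)              # 2x2 identity matrix
    c = tf.matmul(a, b)        # executed on the chosen device

print(c.numpy())
```

Unlike MirroredStrategy, this gives you full control over which op runs where, at the cost of managing the placement (and any cross-device copies) yourself.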

Solution 2:[2]

The allocation failure isn't about your system RAM (call it CPU RAM). It's about your GPU RAM.

By default, the moment TF initializes, it allocates nearly all of the GPU RAM for itself (a small fraction is left over due to page-size overhead).

Your sample makes TF allocate GPU RAM dynamically, but it can still end up consuming all of it. Use the code below to place a hard cap on GPU RAM per process. You'll likely want to change 1024 to something like 8192.

FYI, use nvidia-smi to monitor your GPU RAM usage.

From the docs:

https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
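Since the asker has two GPUs and four training processes, another option (my own sketch, not part of the original answers) is to combine the per-process memory cap above with pinning each process to one card via CUDA_VISIBLE_DEVICES before TensorFlow is imported, so two trainings share each GPU:

```python
import os
import sys

# Hypothetical launcher convention: pass the GPU index on the command line,
# e.g.  python3 my_project.py 0  for two of the trainings and
#       python3 my_project.py 1  for the other two.
gpu_index = sys.argv[1] if len(sys.argv) > 1 else "0"

# These must be set BEFORE tensorflow is imported, or they have no effect.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_index

# import tensorflow as tf   # import only after the variables are set;
#                           # TF will then see just the selected GPU as GPU:0

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Each process then sees exactly one physical GPU (exposed as GPU:0), so the memory cap and the device pinning together keep four trainings from exhausting a single card.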

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 nferreira78
Solution 2 Yaoshiang