How can I execute multiple NN trainings?
I have two NVIDIA GPUs in the machine, but I am not using them.
I have three NN trainings running on my machine. When I try to run a fourth one, the script gives me the following error:
```
my_user@my_machine:~/my_project/training_my_project$ python3 my_project.py
Traceback (most recent call last):
  File "my_project.py", line 211, in <module>
    load_data(
  File "my_project.py", line 132, in load_data
    tx = tf.convert_to_tensor(data_x, dtype=tf.float32)
  File "/home/my_user/.local/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/my_user/.local/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to allocate scratch buffer for device 0
my_user@my_machine:~/my_project/training_my_project$
```
How can I resolve this issue?
The following is my RAM usage:
```
my_user@my_machine:~/my_project/training_my_project$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15947        6651        3650          20        5645        8952
Swap:          2047         338        1709
my_user@my_machine:~/my_project/training_my_project$
```
The following is my CPU usage:
```
my_user@my_machine:~$ top -i
top - 12:46:12 up 79 days, 21:14,  2 users,  load average: 4,05, 3,82, 3,80
Tasks: 585 total,   2 running, 583 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11,7 us,  1,6 sy,  0,0 ni, 86,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :  15947,7 total,   3638,3 free,   6662,7 used,   5646,7 buff/cache
MiB Swap:   2048,0 total,   1709,4 free,    338,6 used.   8941,6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2081821 my_user   20   0   48,9g   2,5g 471076 S 156,1  15,8   1832:54 python3
2082196 my_user   20   0   48,8g   2,6g 467708 S 148,5  16,8   1798:51 python3
2076942 my_user   20   0   47,8g   1,6g 466916 R 147,5  10,3   2797:51 python3
   1594 gdm       20   0 3989336  65816  31120 S   0,7   0,4  38:03.14 gnome-shell
     93 root      rt   0       0      0      0 S   0,3   0,0   0:38.42 migration/13
   1185 root     -51   0       0      0      0 S   0,3   0,0   3925:59 irq/54-nvidia
2075861 root      20   0       0      0      0 I   0,3   0,0   1:30.17 kworker/22:0-events
2076418 root      20   0       0      0      0 I   0,3   0,0   1:38.65 kworker/1:0-events
2085325 root      20   0       0      0      0 I   0,3   0,0   1:17.15 kworker/3:1-events
2093002 root      20   0       0      0      0 I   0,3   0,0   1:00.05 kworker/23:0-events
2100000 root      20   0       0      0      0 I   0,3   0,0   0:45.78 kworker/2:2-events
2104688 root      20   0       0      0      0 I   0,3   0,0   0:33.08 kworker/9:0-events
2106767 root      20   0       0      0      0 I   0,3   0,0   0:25.16 kworker/20:0-events
2115469 root      20   0       0      0      0 I   0,3   0,0   0:01.98 kworker/11:2-events
2115470 root      20   0       0      0      0 I   0,3   0,0   0:01.96 kworker/12:2-events
2115477 root      20   0       0      0      0 I   0,3   0,0   0:01.95 kworker/30:1-events
2116059 my_user   20   0   23560   4508   3420 R   0,3   0,0   0:00.80 top
```
The following is my TF configuration:
```python
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "99"  # Use both gpus for training.

import sys, random
import time
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint
import numpy as np
from lxml import etree, objectify

# <editor-fold desc="GPU">
# resolve GPU related issues.
try:
    physical_devices = tf.config.list_physical_devices('GPU')
    for gpu_instance in physical_devices:
        tf.config.experimental.set_memory_growth(gpu_instance, True)
except Exception as e:
    pass
# END of try
# </editor-fold>
```
Please note that the commented lines are indeed commented out in my code.
Relevant source code:
```python
def load_data(fname: str, class_index: int, feature_start_index: int, **selection):
    i = 0
    file = open(fname)
    if "top_n_lines" in selection:
        lines = [next(file) for _ in range(int(selection["top_n_lines"]))]
    elif "random_n_lines" in selection:
        tmp_lines = file.readlines()
        lines = random.sample(tmp_lines, int(selection["random_n_lines"]))
    else:
        lines = file.readlines()

    data_x, data_y = [], []
    for l in lines:
        row = l.strip().split()
        x = [float(ix) for ix in row[feature_start_index:]]
        y = encode(row[class_index])
        data_x.append(x)
        data_y.append(y)
    # END for l in lines

    num_rows = len(data_x)
    given_fraction = selection.get("validation_part", 1.0)
    if given_fraction > 0.9999:
        valid_x, valid_y = data_x, data_y
    else:
        n = int(num_rows * given_fraction)
        data_x, data_y = data_x[n:], data_y[n:]
        valid_x, valid_y = data_x[:n], data_y[:n]
    # END of if-else block

    tx = tf.convert_to_tensor(data_x, np.float32)
    ty = tf.convert_to_tensor(data_y, np.float32)
    vx = tf.convert_to_tensor(valid_x, np.float32)
    vy = tf.convert_to_tensor(valid_y, np.float32)
    return tx, ty, vx, vy
# END of the function
```
Solution 1:[1]
Using multiple GPUs
If developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This enables easy testing of multi-GPU setups without requiring additional resources.
```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Create 2 virtual GPUs with 1GB memory each
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
             tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
```
NOTE: Virtual devices cannot be modified after being initialized
Once there are multiple logical GPUs available to the runtime, you can utilize the multiple GPUs with tf.distribute.Strategy or with manual placement.
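Manual placement can be sketched as follows (this is an illustrative example, not code from the original answer; it falls back to the CPU device so it also runs on a machine without a GPU):

```python
import tensorflow as tf

# Manual placement sketch: pick the first logical GPU if one exists,
# otherwise fall back to the CPU so the example still runs anywhere.
devices = tf.config.list_logical_devices('GPU') or tf.config.list_logical_devices('CPU')

with tf.device(devices[0].name):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.eye(2)            # 2x2 identity matrix
    c = tf.matmul(a, b)      # executed on the explicitly chosen device

print(c.device)
```

With two real GPUs you would place ops on `'/GPU:0'` and `'/GPU:1'` directly, at the cost of managing the split of work yourself.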
Using tf.distribute.Strategy is the best practice for working with multiple GPUs. Here is a simple example:
```python
tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
    inputs = tf.keras.layers.Input(shape=(1,))
    predictions = tf.keras.layers.Dense(1)(inputs)
    model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
    model.compile(loss='mse',
                  optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))
```
This program will run a copy of your model on each GPU, splitting the input data between them, also known as "data parallelism".
For more information, see the TensorFlow guides on distribution strategies and manual device placement.
Solution 2:[2]
The RAM complaint isn't about your system RAM (call it CPU RAM); it's about your GPU RAM.
The moment TF loads, it normally allocates all of the GPU RAM for itself (a small fraction is left over due to page-size overhead).
Your sample makes TF allocate GPU RAM dynamically, but it can still end up consuming all of it. Use the code below to put a hard cap on GPU RAM per process; you'll likely want to change 1024 to something like 8192.
Also, use nvidia-smi to monitor your GPU RAM usage.
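For example, a simple way to watch usage continuously (assuming a standard nvidia-smi installation; the 5-second interval is arbitrary):

```shell
# Print per-GPU memory usage every 5 seconds (Ctrl-C to stop).
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 5
```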
From the docs:
https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
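A complementary approach when running several independent trainings on two GPUs is to pin each process to a single card before TensorFlow is imported, via the CUDA_VISIBLE_DEVICES variable already present (commented out) in the question's code. The sketch below is illustrative; the `GPU_ID` environment variable name is made up for this example:

```python
import os

# Hypothetical per-process pinning: must run before TensorFlow is imported,
# so the process only sees (and allocates memory on) the listed physical GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("GPU_ID", "0")

import tensorflow as tf

# On a CUDA machine this now lists at most one physical GPU.
print(tf.config.list_physical_devices('GPU'))
```

Launched as `GPU_ID=0 python3 my_project.py` and `GPU_ID=1 python3 my_project.py`, the trainings can be spread across the two cards instead of all competing for device 0's memory.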
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | nferreira78 |
| Solution 2 | Yaoshiang |
