RayOutOfMemoryError with computer vision data preparation
I'm running the data preparation script of this computer vision project. I set up the environment with a small change from environment.yaml, because my GPU only works with CUDA 11 builds (torch 1.6.0 -> torch 1.8.2+cu111).
I'm using
- i7-12700k
- rtx-3080ti
- ubuntu 20.04
- cudatoolkit==11.3
- torch==1.8.2+cu111 (LTS)
- torchvision==0.9.2+cu111
- ...
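For completeness, the torch/torchvision builds above were installed with something like the command below. I'm reconstructing it from memory; I believe the wheel index URL comes from the PyTorch 1.8 LTS install page, but please double-check it rather than taking it as exact.

pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html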
I ran
python tools/tsdf_fusion/generate_gt.py --data_path PATH_TO_SCANNET --save_name all_tsdf_9 --window_size 9
and the error below occurred. I don't think the torch version change is the reason. How can I solve it? (I also put a rough sketch of the workaround I'm wondering about after the error log.)
(process_with_single_worker pid=3678) scene0006_00: read frame 300/2160
(process_with_single_worker pid=3684) scene0002_01: read frame 200/7253
(process_with_single_worker pid=3689) scene0005_01: read frame 300/1450
(process_with_single_worker pid=4300) scene0011_01: read frame 100/2759
(process_with_single_worker pid=3687) scene0002_00: read frame 300/5193
(process_with_single_worker pid=3728) scene0000_00: read frame 400/5578
(process_with_single_worker pid=4327) scene0012_00: read frame 100/5347
Traceback (most recent call last):
File "tools/tsdf_fusion/generate_gt.py", line 281, in
results = ray.get(ray_worker_ids)
File "/home/ict-526/anaconda3/envs/neucon/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/ict-526/anaconda3/envs/neucon/lib/python3.7/site-packages/ray/worker.py", line 1763, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayOutOfMemoryError): ray::process_with_single_worker() (pid=4597, ip=192.168.1.5)
ray._private.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node ict526-MS-7D32 is used (30.43 / 31.21 GB). The top 10 memory consumers are:
PID MEM COMMAND
3677 0.81GiB ray::process_with_single_worker()
3681 0.8GiB ray::process_with_single_worker()
3728 0.8GiB ray::process_with_single_worker()
3673 0.77GiB ray::process_with_single_worker()
3685 0.76GiB ray::process_with_single_worker()
3680 0.76GiB ray::process_with_single_worker()
3686 0.75GiB ray::process_with_single_worker()
3682 0.74GiB ray::process_with_single_worker()
3674 0.74GiB ray::process_with_single_worker()
3671 0.74GiB ray::process_with_single_worker()
In addition, up to 0.02 GiB of shared memory is currently being used by the Ray object store.
--- Tip: Use the ray memory command to list active objects in the cluster.
--- To disable OOM exceptions, set RAY_DISABLE_MEMORY_MONITOR=1.
---
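For what it's worth, the direction I'm considering is to cap how many worker processes Ray runs in parallel (each worker seems to hold roughly 0.8 GiB, and Ray's default is one CPU slot per logical core, which is 20 on an i7-12700k), so peak RAM stays under the monitor's 95% threshold. A minimal sketch of the idea, with placeholder function and scene names since I haven't checked how generate_gt.py actually initializes Ray:

import ray

ray.init(num_cpus=4)  # cap concurrent workers; the default is one slot per logical core

@ray.remote
def process_with_single_worker(scene):
    # stand-in for the real per-scene TSDF fusion work
    return scene

scenes = [f"scene{i:04d}_00" for i in range(8)]  # hypothetical scene list
print(ray.get([process_with_single_worker.remote(s) for s in scenes]))

The log also mentions RAY_DISABLE_MEMORY_MONITOR=1, but I'd rather reduce memory use than silence the check. Is lowering the worker count the right fix here, or is something else going on?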