VM Shutdown in GCP because of 100% CPU Usage
I am running a CPU-intensive job on a GCP VM (c2-standard-16).
The batch job runs daily on a cron schedule. It runs a face-match algorithm using TensorFlow over many student folders in parallel; I am using multiprocessing to process the folders concurrently. The output is written to a CSV file and then loaded into BigQuery.
The job iterates over n topics, so n topics (say 10) x n student folders (say 200-3000+) need to be processed. The relevant part of the code looks like this:

    import multiprocessing
    import pandas as pd

    # set_start_method can only be called once per process, so it sits outside the loop
    multiprocessing.set_start_method('spawn')

    for topic in topics:
        # folders: the student folders for the current topic
        pool = multiprocessing.Pool(multiprocessing.cpu_count())
        result = pool.map(self.process_student, folders, chunksize=1)
        pool.close()
        pool.join()
        df = pd.DataFrame(result)
        df.to_csv(csv_report_name, index=False)
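After the CSV is written it is loaded into BigQuery. That load step is not shown in the snippet above; a minimal sketch of it, assuming the google-cloud-bigquery client library and a placeholder table name, would be:

    from google.cloud import bigquery

    client = bigquery.Client()
    # Placeholder destination; the real project/dataset/table names are not part of the question
    table_id = "my-project.facematch.results"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the CSV header row
        autodetect=True,      # let BigQuery infer the schema
    )

    with open(csv_report_name, "rb") as f:
        load_job = client.load_table_from_file(f, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to complete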
The script works fine for 200 student folders. With 2000 or more student folders it stops after processing about 400 students (as I see in the log): the script shuts the process down abruptly, the VM becomes non-responsive, and SSH connections are broken.
Running top during the job shows CPU usage at 100% or above; all cores are utilised at >100%.
Tried so far
- Divided the folders into chunks of 200 and introduced a sleep of 10 minutes between chunks to throttle CPU usage; this works for 2-3 chunks and then it stops again (see the sketch after this list).
- Added a delay of a few milliseconds inside the parallel worker method self.process_student.
- Used only half of the available cores for the pool (the cores in use still spike to 100% and above). The CPU usage quota is not editable; the quota is already fully used.
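For reference, this is roughly what the chunking/throttling attempt above looks like. The chunk size, sleep time and worker count are the values mentioned in the list; everything else is a simplified sketch rather than the exact production code:

    import time
    import multiprocessing
    from multiprocessing import Pool

    CHUNK_SIZE = 200
    SLEEP_BETWEEN_CHUNKS = 10 * 60               # 10 minutes between chunks
    WORKERS = multiprocessing.cpu_count() // 2   # half of the available cores

    def process_in_chunks(process_student, folders):
        """Process folders in chunks, sleeping between chunks to throttle CPU usage."""
        all_results = []
        for start in range(0, len(folders), CHUNK_SIZE):
            chunk = folders[start:start + CHUNK_SIZE]
            with Pool(WORKERS) as pool:
                all_results.extend(pool.map(process_student, chunk, chunksize=1))
            time.sleep(SLEEP_BETWEEN_CHUNKS)     # let the VM settle before the next chunk
        return all_results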
I have also tried everything in the guide "Limit total CPU usage in python multiprocessing"; none of it worked. Please help.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
