Can I make Slurm oversubscribe iff the user does not already have a job running?
I have set up a mini cluster computer for teaching a class of students. It has a login node and a compute node with 96 cores. I have set up Slurm just about how I want it: I have a partition on 92 of the cores, where students can submit jobs of up to 4 cores, 24 hours, max 3 jobs per user; I also have a debug partition for up to 10 minute jobs, 4 cores, max 1 job per user. I have fair-share enabled.
The problem is that there are 28 students. At 4 cores per student that is 28 × 4 = 112 cores, so ideally I would need another 20 cores beyond the 92 in the partition. Instead, I have increased the CPUs parameter on the compute node to 192 and made the main partition 188 cores. That works, but most of the time it is slower than it needs to be, when students submit more than 1 job.
What I want is this: if a student already has a running job and all 92 cores are in use, their next job is queued; if a student does not already have a running job and all 92 cores are in use, their job is allowed to oversubscribe. Is this possible? Is it a bad idea?
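To make the desired rule concrete, it can be written as a small decision function. This is only a sketch of the policy in Python, not Slurm configuration and not a Slurm API. The name `decide`, its parameters, and the "enough free cores" check (generalising "92 cores in use" to "not enough free cores for this request") are assumptions made for illustration.

```python
# Hypothetical model of the desired scheduling rule; not Slurm code.
PHYSICAL_CORES = 92  # size of the main partition described above


def decide(cores_in_use: int, user_running_jobs: int, requested_cores: int) -> str:
    """Return 'run', 'oversubscribe', or 'queue' for a newly submitted job."""
    # Assumption: a job that still fits on free cores simply runs.
    if cores_in_use + requested_cores <= PHYSICAL_CORES:
        return "run"
    # Partition is full: only a user with nothing running may share cores.
    if user_running_jobs == 0:
        return "oversubscribe"
    # Partition is full and the user already has a job running: wait.
    return "queue"


if __name__ == "__main__":
    print(decide(cores_in_use=92, user_running_jobs=1, requested_cores=4))  # queue
    print(decide(cores_in_use=92, user_running_jobs=0, requested_cores=4))  # oversubscribe
```

With 28 students and at most 4 cores each, this rule would never place more than 112 cores' worth of first jobs on the 92 physical cores, which is much less oversubscription than doubling the CPUs count to 192.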
At the moment I have just made it so that each user can only have 1 running job in the partition at any time. I am open to suggestions, including being told that I am doing it wrong and should be doing something else. I'm new to setting these things up.
Thanks
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
