KNOWN ISSUE: A Windows CCS or HPC job is queued even though there are enough free cores


SYMPTOM:
A job submitted to a Windows CCS or HPC queue remains queued, even though there appear to be sufficient processors available to run it.

PROBLEM:
The "Submit to Windows CCS or HPC Queue" start method enforces node-locking, i.e. the job will only run if it has exclusive use of a cluster node. As a result, free cores on a node that is already running another job cannot be used, and the job waits until a whole node becomes free. Node-locking is the default because, on most cluster hardware in common usage at the time of Release 12.0, memory bandwidth prevents efficient use of all the cores on a CPU. However, in some cases (particularly on the latest hardware) a user may want to make full use of all the cores and CPUs, in which case node-locking needs to be turned off.

WORKAROUND:
Edit the Perl script `cfxccs.pl` (located in the etc directory under <CFXROOT>), keeping a copy of the original script in case of errors. To switch off the exclusive use of nodes, the line
$job->{IsExclusive}=1;
needs to be changed to
$job->{IsExclusive}=0;
This will affect all jobs submitted using the "Submit to Windows CCS or HPC Queue" start method; there is no control over this setting for individual jobs.
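
The manual edit described above can also be scripted. The following is a minimal sketch in Python, not part of the product: the function names `disable_node_locking` and `patch_script` and the `.orig` backup suffix are assumptions made for illustration, and the path to `cfxccs.pl` must be supplied by the user. It assumes the script contains the assignment exactly as shown above, apart from optional spaces around the equals sign.

```python
import re
import shutil
from pathlib import Path

# Matches the assignment shown in the workaround, allowing optional
# spaces around "=". Only the value 1 is matched, so a script that is
# already patched is left untouched.
EXCLUSIVE_LINE = re.compile(r"(\$job->\{IsExclusive\}\s*=\s*)1(\s*;)")


def disable_node_locking(script_text):
    """Return (patched_text, number_of_assignments_changed)."""
    return EXCLUSIVE_LINE.subn(r"\g<1>0\g<2>", script_text)


def patch_script(script_path):
    """Back up the script beside itself, then set IsExclusive to 0.

    Returns the number of assignments changed (0 means the file was
    not modified and no backup was written).
    """
    path = Path(script_path)
    # newline="" keeps the file's existing line endings unchanged.
    with path.open("r", newline="") as handle:
        original = handle.read()
    patched, count = disable_node_locking(original)
    if count == 0:
        return 0
    # Hypothetical backup name: <script name>.orig in the same directory.
    shutil.copy2(path, path.with_name(path.name + ".orig"))
    with path.open("w", newline="") as handle:
        handle.write(patched)
    return count
```

Calling `patch_script` with the full path to `cfxccs.pl` returns 1 if the line was changed and 0 if nothing matched. To restore node-locking, copy the backup over the edited script.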

FIXED IN:
No fix is currently available.




