How can you submit CFX jobs to a Sun Grid Engine (SGE) queue?



The basic approach is the same for any software run under a batch queuing system. You submit a script to the batch system, and header lines in that script tell the queuing system how the job should be run. When the job starts, the queuing system makes the list of processors allocated to it available through an environment variable. Your script then extracts the processor (host) names from that variable and starts CFX on those processors.

SGE header lines start with the marker #$. See a Sun Grid Engine manual for details of the available header options. The hosts allocated to the job are listed in a file whose path SGE stores in the environment variable $PE_HOSTFILE.
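Each line of the file named by $PE_HOSTFILE normally holds four fields: a host name, the number of slots granted on that host, the queue instance, and a processor range. The minimal sketch below writes a hypothetical two-host file (node01, node02 and the queue names are made up) and reads it the way a job script would. Inside a real job you would read "$PE_HOSTFILE" directly instead of the temporary file.

```shell
#!/bin/sh
# Hypothetical PE_HOSTFILE contents: host, slots, queue instance, processor range.
# Inside a real SGE job, read "$PE_HOSTFILE" instead of creating this file.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
node01 2 all.q@node01 UNDEFINED
node02 2 all.q@node02 UNDEFINED
EOF

total=0
# Keep the first two fields; the remaining columns land in "rest" and are ignored.
while read -r host slots rest; do
    echo "host=$host slots=$slots"
    total=$((total + slots))
done < "$hostfile"
echo "total slots = $total"

rm -f "$hostfile"
```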

You tell the cfx5solve command which processors to run on with either the -par-host-list or the -par-dist option. Both options expect a comma-separated list of machine names. For details of the cfx5solve arguments, see the CFX online manual or run cfx5solve -help at the command prompt.
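The example scripts build this list with an awk one-liner. As a pure-shell alternative, the sketch below adds one entry per allocated slot, keeps the hosts in file order, and never leaves a trailing comma. It assumes the four-column hostfile format described above. The helper name hosts_to_list and the sample hosts are hypothetical; they are not part of SGE or CFX.

```shell
#!/bin/sh
# hosts_to_list (hypothetical helper): print a comma-separated host list with
# one entry per allocated slot, in the order the hosts appear in the hostfile.
hosts_to_list() {
    list=""
    while read -r host slots rest; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            # Add a separator only between entries, never at the end.
            if [ -z "$list" ]; then
                list=$host
            else
                list="$list,$host"
            fi
            i=$((i + 1))
        done
    done < "$1"
    echo "$list"
}

# Example with a hypothetical hostfile; in a job, call: hosts_to_list "$PE_HOSTFILE"
sample=$(mktemp)
printf 'node01 2 all.q@node01 UNDEFINED\nnode02 1 all.q@node02 UNDEFINED\n' > "$sample"
PAR_HOSTS=$(hosts_to_list "$sample")
echo "PAR_HOSTS = $PAR_HOSTS"
rm -f "$sample"
```

You can pass the result straight to the solver, for example cfx5solve -def demo.def -par-dist "$PAR_HOSTS".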

The example script below is deliberately simple (the name of the .def file is hard-coded), but it is sufficient to demonstrate the mechanics of submitting CFX to a Sun Grid Engine batch queue.

Note that nothing stops you from ignoring the names in $PE_HOSTFILE and running on completely different nodes. The batch queuing system does not enforce its allocation; it merely tells you which processors are currently free. You are free to ignore this information, but you then risk running on processors that are already in use.
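To keep a job on the processors SGE assigned, you can add a cheap safeguard before starting the solver. Check that the host list you built (PAR_HOSTS in the example scripts) has exactly one entry per granted slot. SGE sets $NSLOTS to the number of slots granted to a parallel job. The sketch below compares that value with the number of comma-separated entries. The fallback values are hypothetical and only let the snippet run outside a job.

```shell
#!/bin/sh
# Outside an SGE job these variables are unset; the defaults are hypothetical test values.
NSLOTS=${NSLOTS:-3}
PAR_HOSTS=${PAR_HOSTS:-node01,node01,node02}

# Count the entries by putting each comma-separated host on its own line.
count=$(echo "$PAR_HOSTS" | tr ',' '\n' | grep -c .)

if [ "$count" -ne "$NSLOTS" ]; then
    echo "Host list has $count entries but SGE granted $NSLOTS slots" >&2
    exit 1
fi
echo "Host list matches the $NSLOTS slots granted by SGE"
```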

Example Sun Grid Engine script:

# SGE cfx job submission script

#$ -S /bin/sh
#$ -j y
#$ -cwd
# The parallel environment name (sge_pe) is site-specific; 'qconf -spl' lists those defined
#$ -pe sge_pe 4

#
# echo some parameters
#
echo " "
echo "hostname = `hostname`"
echo "PE_HOSTFILE = $PE_HOSTFILE"
echo "JOB_ID = $JOB_ID"
echo "SGE_O_WORKDIR = $SGE_O_WORKDIR"
echo " "
echo "PE_HOSTFILE contents:"
echo "--------------------"
cat $PE_HOSTFILE
echo "--------------------"

echo " "

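# Create hosts list: one entry per allocated slot (field 2 of each $PE_HOSTFILE line),
# separated by commas with no trailing comma
PAR_HOSTS=`awk 'BEGIN {H=""; S=""}{for (i=1; i<=$2; ++i) {H=$1 S H; S=","} }END {print H}' < "$PE_HOSTFILE"`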

echo "par-dist = $PAR_HOSTS"

#
# run the cfx job
#
cfx5solve -def StaticMixerSolve.def -par-dist $PAR_HOSTS

**** Entered By: mpowens @ 02/15/2008 03:19 PM ****

Second example Sun Grid Engine script:
--------------------------------------------------------------------------------
# SGE cfx job submission script

#$ -S /bin/sh
#$ -j y
#$ -cwd
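# 'make' is the parallel environment used here; substitute one defined at your site ('qconf -spl' lists them)
#$ -pe make 4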

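# Append the PATH of the submitting shell (SGE_O_PATH) so that commands such as cfx5solve can be found
PATH=$PATH:$SGE_O_PATH
export PATH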

#
# echo some parameters
#
echo " "
echo "hostname = `hostname`"
echo "JOB_ID = $JOB_ID"
echo "SGE_O_WORKDIR = $SGE_O_WORKDIR"
echo " "
echo "PE_HOSTFILE = $PE_HOSTFILE"
echo "PE_HOSTFILE contents:"
echo "--------------------"
cat $PE_HOSTFILE
echo "--------------------"

echo " "

# Create hosts list: one entry per allocated slot (field 2 of each $PE_HOSTFILE line), comma-separated
PAR_HOSTS=`awk 'BEGIN {H=""; S=""}{for (i=1; i<=$2; i++) {H=$1 S H; S=","} }END {print H}' <$PE_HOSTFILE`
echo "PAR_HOSTS = $PAR_HOSTS"
#
# set up a distributed parallel run
#
parallel="-par -par-dist $PAR_HOSTS"
#
# run the cfx job
#
cfx5solve -def demo.def $parallel
--------------------------------------------------------------------------------
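Both example scripts hard-code the .def file name. The small variation sketched below takes the file name from the first script argument instead. qsub passes any arguments given after the script name on to the script, so the job could be submitted as qsub cfx_sge.sh StaticMixerSolve.def. The script name cfx_sge.sh, the demo.def default, the RUN dry-run switch and the sample host list are all assumptions made for illustration.

```shell
#!/bin/sh
# Take the .def file from the first script argument; demo.def is an assumed default.
DEF_FILE=${1:-demo.def}

# In a real job PAR_HOSTS is built from $PE_HOSTFILE as in the script above;
# this default is a hypothetical value so the snippet runs stand-alone.
PAR_HOSTS=${PAR_HOSTS:-node01,node01,node02}

if [ ! -f "$DEF_FILE" ]; then
    echo "Warning: '$DEF_FILE' not found in $(pwd)" >&2
fi

# Dry run by default: with RUN unset, the command is only printed.
# Set RUN to an empty string (RUN=) to actually launch the solver.
RUN=${RUN-echo}
$RUN cfx5solve -def "$DEF_FILE" -par -par-dist "$PAR_HOSTS"
```

In a production job script you would normally drop the $RUN prefix entirely and keep only the DEF_FILE line and the cfx5solve call.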

A brief explanation of how the script works with Sun Grid Engine is given below:

SGE works by submitting a batch script with the command 'qsub'. The batch script is submitted on the master queue node (more generally, on any host configured as an SGE submit host). In this setup no jobs are run on the master queue node itself, so if you are installing Sun Grid Engine on a high-end cluster you should install it on a separate front-end machine. When you submit a script with qsub, SGE locates a free machine in the cluster and runs the script on that machine. Use the command 'qstat' to check the status of your job. For more information on qsub and qstat, see the SGE user guide or the man pages.

qsub accepts many arguments on the command line, and the same arguments can also be placed within the script. When qsub parses a script, it treats comment lines beginning with #$ as embedded command-line arguments and parses the rest of each such line as if it were an argument to qsub. For more information on this behaviour and on the available qsub arguments, see the user guide or the qsub man page. The above script conta
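To see which options qsub will pick up from a job script, list the lines that start with #$. The sketch below does this with sed on a temporary copy of the header used in the example scripts. The same options could instead be given on the command line, for example qsub -S /bin/sh -j y -cwd -pe make 4 cfx_sge.sh, where cfx_sge.sh is an assumed file name.

```shell
#!/bin/sh
# Write a temporary job script with the same #$ header as the example scripts.
script=$(mktemp)
cat > "$script" <<'EOF'
# SGE cfx job submission script
#$ -S /bin/sh
#$ -j y
#$ -cwd
#$ -pe make 4
echo "job body"
EOF

# Print only the embedded qsub options: lines beginning with "#$ ", marker stripped.
options=$(sed -n 's/^#\$ //p' "$script")
echo "$options"
rm -f "$script"
```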