FLUENT 6 - parallel on Windows (NT/2000) with interpreted UDF


FLUENT 6 (6.0.12) is to be run in parallel on a cluster of Windows NT or Windows 2000 machines.
An interpreted UDF (user-defined function) is to be used, which may cause trouble.

0.: In the following, we will assume that you are at the console of machine "local" and try to run Fluent (including cortex, the host and one node process) on that machine, with an additional compute node process on the machine "remote".

1.: It is NOT necessary to install FLUENT on more than ONE machine. We will assume here that this is the machine "local", but it can in fact be any machine that is accessible in the network. The installation directory (Fluent.Inc) must be "exported" to outsiders (in English Windows terminology: shared on the network). We'll assume that FLUENT is installed in "d:\fluent.inc" on the machine "local", and that this directory is "exported" from this machine to the "outside world" under the share name "Flu-Inc", so that other machines see it as "\\local\Flu-Inc".

2.: Make sure you've installed the RSH daemon on machine "remote". In particular, the "Logon As" option must be set to something other than the default.
To check the RSH daemon, type in a DOS-prompt window on "local" (!):
rsh remote dir c:
rsh remote dir \\local\Flu-Inc
Both of these commands must generate a directory listing -- otherwise, the installation is not correct.
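
The two checks can also be scripted. The following is only a minimal Python sketch, assuming that Python is available on "local" and that an "rsh" client is on the PATH; the function names are made up for illustration. Because some rsh clients do not pass back the exit status of the remote command, the sketch also requires that each command actually prints something.

```python
import subprocess


def rsh_check_commands(remote, share):
    """Return the two test commands from step 2 as argument lists."""
    return [
        ["rsh", remote, "dir", "c:"],
        ["rsh", remote, "dir", share],
    ]


def rsh_checks_pass(remote, share, run=subprocess.run):
    """True only if every test command succeeds and produces a listing.

    The 'run' parameter exists so that the check can be tried out without
    a real RSH daemon; by default the commands are really executed.
    """
    for cmd in rsh_check_commands(remote, share):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0 or not result.stdout.strip():
            return False
    return True
```

A call such as rsh_checks_pass("remote", r"\\local\Flu-Inc") would then stand in for typing both commands by hand.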

3.: To launch the parallel FLUENT session, use the following command:
\\local\Flu-Inc\ntbin\ntx86\fluent Yd -r6.0 -pnet -t2 -cnf=local,remote -path\\local\Flu-Inc
Notes:
+ YES, you MUST type the path before the command "fluent", because *this* path will be passed to the compute node process on "remote" as the location where UDF related files can be found.
+ YES, you must ALSO specify the "-path\\local\Flu-Inc", because *this* path is passed to the machine "remote" when the compute node process is started on that machine.
+ replace the "Y" in "Yd" by "2" or "3". Append "dp" to the "2d" or "3d" in case you want to calculate in double precision.
+ -pnet is mandatory -- it activates inter-process communication via TCP/IP sockets. Without this, shared memory communication (the default) will be activated, and no remote compute node processes on other machines can be run.
+ -t2 specifies the total number of compute node processes (local and remote)
+ -cnf=...,... gives the names of the machines on which these processes are to be started. For more than one process on one (multi-processor) machine, repeat the name. NO blanks in the whole command line option "-cnf=...,...,...,..."!
+ All machines that are named in the "-cnf=...,...,..." option must have the RSH daemon installed, properly configured and running --- except the first machine in the list.

Check that FLUENT runs properly -- while it launches the compute node processes, it gives you information about the list of all running compute nodes.
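
The rules from the notes above can be collected in a small helper that assembles the command line. This is only a sketch in Python under stated assumptions: the function name and its parameters are invented here, and the only options it knows are the ones shown in the example command. It refuses a share path that does not start with two backslashes and a computer name, and it derives "-t" from the length of the machine list so that the two cannot disagree.

```python
import re

# Two backslashes, a computer name, one backslash, a share name.
UNC_SHARE = re.compile(r"^\\\\[^\\/\s]+\\[^\\/\s]+")


def build_fluent_command(share, solver, hosts, release="6.0"):
    """Assemble the parallel launch command as a list of arguments."""
    if not UNC_SHARE.match(share):
        raise ValueError("share must look like \\\\machine\\sharename")
    if solver not in ("2d", "3d", "2ddp", "3ddp"):
        raise ValueError("solver must be 2d, 3d, 2ddp or 3ddp")
    if not hosts:
        raise ValueError("at least one machine name is required")
    for name in hosts:
        # No blanks (and no commas) are allowed inside the -cnf option.
        if not name or "," in name or re.search(r"\s", name):
            raise ValueError("bad machine name: %r" % name)
    share = share.rstrip("\\")
    return [
        share + "\\ntbin\\ntx86\\fluent",  # full network path to fluent
        solver,
        "-r" + release,
        "-pnet",
        "-t%d" % len(hosts),  # one compute node process per listed name
        "-cnf=" + ",".join(hosts),
        "-path" + share,
    ]
```

Joining the result of build_fluent_command(r"\\local\Flu-Inc", "3d", ["local", "remote"]) with single blanks gives the command line of this step with "Y" replaced by "3".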

4.: Before using your UDF in parallel, make absolutely sure it loads properly without any error message in a serial FLUENT 6 session. A UDF source code that generates any error message will BREAK the whole FLUENT parallel session!

Put your UDF source code in ONE directory that is accessible on all machines. This can be a subdirectory (such as "udf") within the Fluent.Inc installation directory on "local". Other solutions may work.

Now load the UDF source code into your FLUENT session as an "interpreted" UDF. In this step it is MANDATORY that you specify the complete absolute path to the file, starting with two backslashes and a computer name, as in "\\local\Flu-Inc\udf\mysource.c".

(FLUENT may suppress the last of a series of message text lines. Therefore, it is recommended that you [at least for a test run] check the option "display assembler listing". You should see the complete assembler listing as many times as you have compute node processes, PLUS one extra time for the host process!)

5.: Every FLUENT process (i.e. the single host process and each compute node process) will write a file "udfconfig-xyz.h", where "xyz" is a part containing "host" or "nodeN", where "N" is a number identifying the compute node ID (number) of the process. The processes running on the machine "local" will put these files in the directory where the UDF source code resides -- make sure you have write permission in that directory!

BUT: All remote compute nodes (running on other machines, as on "remote" in the above example) will write this file into the directory "C:\WINNT\system32" on the machine they are running on! Therefore, there must be write permission for this directory given to the user/account that has been entered during setup of the RSH daemon ("Logon As"...)!!
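
To see at a glance which file should turn up where, the rule can be written down as a short Python sketch. Several things in it are assumptions and not confirmed above: that the files are named exactly "udfconfig-host.h" and "udfconfig-nodeN.h", that node IDs start at 0 and follow the order of the "-cnf" list, and that the first name in that list is the machine "local". The function name is made up.

```python
import ntpath


def expected_udfconfig_files(hosts, udf_dir, remote_dir=r"C:\WINNT\system32"):
    """List (machine, path) pairs for the udfconfig files of one session.

    ASSUMED: exact file names, node IDs counted from 0 in -cnf order,
    and hosts[0] being the machine on which FLUENT was launched.
    """
    local = hosts[0]
    # The single host process runs on the local machine.
    files = [(local, ntpath.join(udf_dir, "udfconfig-host.h"))]
    for node_id, machine in enumerate(hosts):
        # Local processes write next to the UDF source, remote ones do not.
        directory = udf_dir if machine == local else remote_dir
        name = "udfconfig-node%d.h" % node_id
        files.append((machine, ntpath.join(directory, name)))
    return files


for machine, path in expected_udfconfig_files(
        ["local", "remote"], r"\\local\Flu-Inc\udf"):
    print(machine, path)
```

If a file from this list is missing after the UDF has been loaded, the write permission for the corresponding directory is the first thing to check.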

This may be understood as a security issue, and I cannot give any reason for why it shouldn't be one.
Read on, because there is another (better understood) security issue here:
Every user who can send a command to the RSH daemon (by typing "rsh remote <command>" on the machine "local") will make the machine "remote" execute the command with all the user rights that have ever been given to the user/account that has been entered as the "Logon As" account during RSH daemon configuration!
Therefore, it is recommended to create a new user account for this. The minimum set of rights this account must have includes (NOT an exhaustive listing!)
+ reading "exported" directories on other machines (in the same NT domain)
+ running processes on the machine "remote" (where the RSH daemon runs)
+ writing files into the directory "C:\WINNT\system32" on the machine "remote"
+ ...? (add from your own experience here!)
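
Whether an account really has the write permission from this list can be tested by trying to create a file. A minimal Python sketch, assuming Python is installed on the machine in question and that the script is run while logged on as the account to be tested; the function name is invented for illustration.

```python
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created in 'directory'."""
    try:
        # The file is removed again as soon as the 'with' block is left.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers a missing directory as well as a refused permission.
        return False
```

On "remote" one would call can_write(r"C:\WINNT\system32"), and on "local" the same function with the directory that holds the UDF source code.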

That's all. If you find any errors in this, please tell me: mailto:jos@fluent.de




