KNOWN ISSUE: Parallel CFX-Solver runs using Infiniband may crash or hang on some Linux systems


SYMPTOM:
A parallel CFX run using HP MPI Distributed Parallel and Infiniband may crash or hang when run on a Linux system with an older kernel, most often at the point where a results file (including a backup or full transient file) is written.

PROBLEM:
The underlying problem is documented in the HP MPI release notes: "Applications using fork() might crash on configurations with Infiniband using OFED on kernels earlier than v2.6.18." In CFX runs, this problem generally appears at the point where the CFX-Solver writes the monitor data into a backup file or full transient file. It occurs only in Release 12.0 and later: in earlier releases the CFX-Solver process itself never wrote the monitor data directly into the results file (this was done only at the end of the run, after the solution stage was complete). One instance of a hang at the start of the run was also resolved by upgrading the Linux kernel.

WORKAROUND:
Upgrade the Linux kernel to v2.6.18 or later.
If the older Linux kernel is retained, then most problems can be worked around by setting the expert parameter "merge monitor into backup files = f". This stops the CFX-Solver process from writing the monitor data into transient and backup files during the run itself (the data is still written into these files at the end of a successful run, as in Release 11.0).
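
To decide which workaround applies, it can help to check whether a given compute node runs a kernel older than the v2.6.18 threshold quoted from the HP MPI release notes. The following Python script is a minimal sketch of such a check and is not part of CFX or HP MPI. The function names (parse_kernel_release, kernel_is_affected) are hypothetical, and the script assumes that the kernel release string begins with a dotted numeric version, as reported by "uname -r".

```python
import platform
import re

# Threshold taken from the HP MPI release note quoted in the PROBLEM section.
MIN_SAFE_KERNEL = (2, 6, 18)


def parse_kernel_release(release):
    """Return the leading numeric part of a kernel release string as a tuple.

    Example input: "2.6.9-42.ELsmp". A missing third component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_is_affected(release):
    """True if the kernel is earlier than v2.6.18 (hypothetical helper)."""
    return parse_kernel_release(release) < MIN_SAFE_KERNEL


if __name__ == "__main__":
    release = platform.release()  # same string as "uname -r" on Linux
    if kernel_is_affected(release):
        print(f"Kernel {release} is earlier than v2.6.18: "
              "upgrade the kernel or set the expert parameter.")
    else:
        print(f"Kernel {release} is v2.6.18 or later.")
```

The script only inspects the kernel version. It does not detect whether Infiniband or OFED is in use, so a positive result means only that the node falls in the kernel range named in the release note. Run it on every host in the parallel run, since the kernels may differ between nodes.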

FIXED IN:
No fix is currently available.




