If I start two serial solvers on a dual-processor Opteron machine, both solvers seem to use both processors, and I am worried that this decreases performance. Can I set it up so that one serial solver uses one processor and the other uses the second? The machine is an HP XW9300 workstation (AMD Opteron 64-bit processors) running Red Hat Enterprise Linux 4.
For the first processor (CPU0) run:

    /usr/bin/numactl --cpunodebind=0 --localalloc cfx5solve -def etc

For the second processor (CPU1) run:

    /usr/bin/numactl --cpunodebind=1 --localalloc cfx5solve -def etc

Note: you should not use this for parallel runs.

Extra information:

A dual-CPU Opteron looks something like this:

    [Memory bank 0] <------> [CPU0] <------> [CPU1] <------> [Memory bank 1]

CPU0 can access memory bank 0 approximately twice as fast as memory bank 1 (non-uniform memory access, or NUMA). CPU0 together with memory bank 0 is known as a NUMA node:

    [Memory bank 0] <------> [CPU0] <------> [CPU1] <------> [Memory bank 1]
    [---------NUMA Node 0---------]          [---------NUMA Node 1---------]

The most efficient setup is to have the first solver running on CPU0 with all of its memory allocated on memory bank 0, and the second solver running on CPU1 with all of its memory allocated on memory bank 1.

This command will lock the process to node 0 and specify that memory should only be allocated from that node:

    /usr/bin/numactl --cpunodebind=0 --localalloc cfx5solve -def etc

If there is not enough memory available on that node, the solver will fail. If this occurs, either free memory on that node (see below) or remove the --localalloc option.

numactl --hardware will list the available memory on each node:

    available: 2 nodes (0-1)
    node 0 size: 3277 MB
    node 0 free: 1352 MB
    node 1 size: 4040 MB
    node 1 free: 3208 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10

A large amount of memory on node 0 is often in use by the Linux file cache (to improve overall system performance). There is no easy way to clear this, but a trick is to create a very large file on local disk (e.g. 10 GB) and then remove it.

Performance for each serial solver could be improved by up to 10% using this method.
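
Because a solver bound with --localalloc fails when its node runs out of memory, it can help to check the free memory on a node before launching. The following is a minimal sketch, not part of the original answer: the helper names node_free_mb and node_has_room are hypothetical, and the parsing assumes that numactl --hardware prints "node N free: X MB" lines as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of numactl or CFX); they only parse text.

# node_free_mb NODE
# Reads `numactl --hardware` output on stdin and prints the free memory
# (in MB) reported for the given NUMA node. Assumes lines of the form
# "node N free: X MB".
node_free_mb() {
    awk -v n="$1" '$1 == "node" && $2 == n && $3 == "free:" { print $4 }'
}

# node_has_room NODE NEEDED_MB
# Reads the same output on stdin and succeeds (exit status 0) only if the
# node reports at least NEEDED_MB megabytes free.
node_has_room() {
    free=$(node_free_mb "$1")
    [ -n "$free" ] && [ "$free" -ge "$2" ]
}

# Example use (the 2000 MB threshold is an arbitrary illustration):
#   if numactl --hardware | node_has_room 0 2000; then
#       /usr/bin/numactl --cpunodebind=0 --localalloc cfx5solve -def etc
#   else
#       echo "node 0 is short of memory; free some or drop --localalloc" >&2
#   fi
```

The helpers read from standard input rather than calling numactl themselves, so they can be tried against saved output on a machine without NUMA support.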