If I start two serial solvers on a dual-processor Opteron machine, both seem to use both processors, and I am worried that this decreases performance. Can I set things up so that one serial solver uses the first processor and the other uses the second? The workstation is an HP XW9300 (AMD Opteron 64-bit processors) running Red Hat Enterprise Linux 4.


For the first processor (CPU0) run:
/usr/bin/numactl --cpunodebind=0 --localalloc cfx5solve -def etc

For the second processor (CPU1) run:
/usr/bin/numactl --cpunodebind=1 --localalloc cfx5solve -def etc
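The two commands can also be started together from a single shell session. The following is a minimal sketch, assuming numactl and cfx5solve are on the PATH; the helper name run_on_node and the file names first.def and second.def are hypothetical placeholders, not part of CFX or numactl.

```shell
#!/bin/sh
# Sketch: run two serial CFX solver jobs at the same time, each bound to its
# own NUMA node. run_on_node and the .def names below are hypothetical.

# run_on_node NODE DEF_FILE: start one serial solver in the background.
run_on_node() {
    node=$1
    def_file=$2
    # --cpunodebind keeps the job on the CPU of NODE;
    # --localalloc requests its memory from the same node's bank.
    numactl --cpunodebind="$node" --localalloc cfx5solve -def "$def_file" &
}

# Example:
#   run_on_node 0 first.def    # CPU0 + memory bank 0
#   run_on_node 1 second.def   # CPU1 + memory bank 1
#   wait                       # block until both solver runs exit
```

Each job is a serial run pinned to one node, so this pattern applies only to serial runs, not parallel ones.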

Note: you should not use this for parallel runs.

Extra information:

A dual-CPU Opteron looks something like this:

[Memory bank 0] <------> [CPU0] <------> [CPU1] <------> [Memory bank 1]

CPU0 can access memory bank 0 approximately twice as fast as memory bank 1 (non-uniform memory access, or NUMA). CPU0 together with memory bank 0 is known as a NUMA node:

[Memory bank 0] <------> [CPU0] <------> [CPU1] <------> [Memory bank 1]
[---------NUMA Node 0---------------] [---------NUMA Node 1----------]

The most efficient setup would be to have the first solver running on CPU0 with all of its memory allocated on memory bank 0 and the second solver running on CPU1 with all of its memory allocated on memory bank 1.

This command will lock the process to node 0 and specify that memory should only be allocated from that node:

/usr/bin/numactl --cpunodebind=0 --localalloc cfx5solve -def etc

If there is not enough memory available on that node, the solver will fail. If this occurs, either free memory on that node (see below) or remove the --localalloc option.
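One way to make that choice before launching is to compare the node's free memory with the memory the run is expected to need. A minimal sketch, assuming `numactl --hardware` prints lines of the form "node N free: X MB" (as in the example output in this article); the helper names node_free_mb and localalloc_fits, the file name case1.def and the 1200 MB figure are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether --localalloc is safe for a run on a given node.
# Assumes `numactl --hardware` prints lines of the form "node N free: X MB",
# as in the example output in this article.

# node_free_mb NODE: print the free MB of NODE from `numactl --hardware`
# output read on stdin.
node_free_mb() {
    awk -v n="$1" '$1 == "node" && $2 == n && $3 == "free:" { print $4; exit }'
}

# localalloc_fits NODE REQUIRED_MB: succeed if NODE has at least REQUIRED_MB
# free. REQUIRED_MB is your own estimate of the solver's memory use.
localalloc_fits() {
    free_mb=$(numactl --hardware | node_free_mb "$1")
    [ -n "$free_mb" ] && [ "$free_mb" -ge "$2" ]
}

# Example (case1.def and 1200 MB are hypothetical values):
#   if localalloc_fits 0 1200; then
#       numactl --cpunodebind=0 --localalloc cfx5solve -def case1.def
#   else
#       numactl --cpunodebind=0 cfx5solve -def case1.def
#   fi
```

The fallback branch keeps the CPU binding but drops --localalloc, which is the second option described in the paragraph above.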

Running numactl --hardware lists the available memory on each node, for example:

available: 2 nodes (0-1)
node 0 size: 3277 MB
node 0 free: 1352 MB
node 1 size: 4040 MB
node 1 free: 3208 MB

node distances:
node   0   1
  0:  10  20
  1:  20  10

A large amount of memory on node 0 is often in use by the Linux file cache (to improve overall system performance). There is no easy way to clear this, but a trick is to create a very large file on local disk (e.g. 10 GB) and then remove it.
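The file-cache trick can be written as a small helper. This is a minimal sketch, not an official procedure: the helper name evict_file_cache, the scratch path and the size are assumptions, and the path should be on a local (not network) disk with enough free space. The idea is that writing a large file pushes older cached pages out of memory, and deleting the file then releases the pages it occupied itself.

```shell
#!/bin/sh
# Sketch of the "write a big file, then delete it" trick for pushing the
# Linux file cache out of memory. evict_file_cache is a hypothetical helper;
# the path and size are assumptions (use a local disk with enough space).

# evict_file_cache SCRATCH_FILE SIZE_MB
evict_file_cache() {
    scratch=$1      # e.g. /tmp/numa_flush.tmp (hypothetical path)
    size_mb=$2      # e.g. 10240 for roughly 10 GB
    # Write SIZE_MB blocks of 1 MiB of zeros to the scratch file.
    dd if=/dev/zero of="$scratch" bs=1048576 count="$size_mb" 2>/dev/null
    # Always remove the file, even if dd stopped early (e.g. disk full).
    rm -f "$scratch"
}

# Example:
#   evict_file_cache /tmp/numa_flush.tmp 10240
#   numactl --hardware    # re-check "node 0 free:" afterwards
```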

Performance for each serial solver could be improved by up to 10% using this method.