USQCD Machine Performance
| Machine | Processor | Nodes | DWF per node | Clover per node | asqtad per node |
| pion | 3.2 GHz Single CPU Single Core Pentium 940 | 518 | 1729 MFlops | 1120 MFlops | 1594 MFlops |
| kaon | 2.0 GHz Dual CPU Dual Core Opteron | 600 | 4696 MFlops | 3180 MFlops | 3832 MFlops |
| jpsi | 2.1 GHz Dual CPU Quad Core Opteron | 586 | 10061 MFlops | 7423 MFlops | 9563 MFlops |
| 7N | 1.9 GHz Dual CPU Quad Core Opteron | 396 | 8800 MFlops | 5148 MFlops | 6300 MFlops |
| 6N | 3.0 GHz Single CPU Dual Core Pentium | 256 | 2900 MFlops | 1408 MFlops | 1960 MFlops |
| 4G | 2.8 GHz Single CPU Single Core Xeon | 384 | 1582 MFlops | 636 MFlops | 1249 MFlops |
| QCDOC | 400 MHz PPC Core estimated | 12288 | 336 MFlops | | 360 MFlops |
| BlueGene/P | 850 MHz Quad Core PowerPC 450 | 4 cores/node, 1024 nodes/rack | 2560 MFlops | | 2680 MFlops |
| Cray XT4 | 2.6 GHz Dual Core Opteron | ??? | 2660 MFlops | 2340 MFlops | |
The table above shows the measured (or, for the QCDOC, estimated) performance of the DWF,
anisotropic clover, and asqtad inverters on the pion, kaon, jpsi, 7N, 6N, and 4G clusters,
and on the ANL BG/P, the ORNL XT4, and the QCDOC.
For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice
per node, and the DWF numbers were taken on 64-node runs using Ls=16, averaging
the performance of the 32x8x8x8 and 32x8x8x12 local lattice runs.
The DWF, Clover, and asqtad performance figures for kaon, jpsi, 6N, and 7N come from
128-process runs (on 32, 16, 64, and 16 nodes, respectively) with one process per core,
that is, 4, 8, 2, and 8 processes per node. The Clover runs on 7N used
128 processes with a 4^3x8 local volume per process. The DWF and Clover
runs for 4G used single panels (128-node jobs, 1 core/node) with a mesh
layout of 1x4x4x8.
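The node counts quoted for the 128-process runs follow from the cores per node listed in the Processor column. The short Python sketch below reproduces that arithmetic. The names `CORES_PER_NODE`, `nodes_needed`, and the other variables are illustrative and not part of any USQCD tooling, and the cores-per-node values are read off the table as CPUs times cores per CPU.

```python
from math import prod

# Cores per node, read from the Processor column (CPUs x cores per CPU).
CORES_PER_NODE = {"kaon": 2 * 2, "jpsi": 2 * 4, "6N": 1 * 2, "7N": 2 * 4}
PROCESSES = 128  # process count stated in the text


def nodes_needed(processes, cores_per_node):
    """Nodes required when every core runs exactly one process."""
    if processes % cores_per_node:
        raise ValueError("process count does not fill whole nodes")
    return processes // cores_per_node


node_counts = {name: nodes_needed(PROCESSES, c) for name, c in CORES_PER_NODE.items()}

# The 4G mesh layout 1x4x4x8 must account for all 128 single-core nodes.
mesh_4g = (1, 4, 4, 8)
ranks_4g = prod(mesh_4g)

# Total lattice sites in the 7N Clover run: 128 processes, 4^3x8 sites each.
local_7n = (4, 4, 4, 8)
total_sites_7n = PROCESSES * prod(local_7n)

print(node_counts, ranks_4g, total_sites_7n)
```

The check that the process count fills whole nodes reflects the one-process-per-core placement described above. It is a convenience of this sketch, not a documented requirement of the batch system.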
The QCDOC DWF (double precision) and asqtad (single precision) estimates are based on the observed peak performance of the
double-precision conjugate gradient codes on early motherboards, scaled to 400 MHz.
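The text does not say how the early-motherboard results were scaled to 400 MHz. A minimal sketch, assuming the scaling is linear in clock frequency, is shown below. The function name `scale_to_clock` and the input numbers are hypothetical and serve only to illustrate the arithmetic; they are not QCDOC measurements.

```python
def scale_to_clock(measured_mflops, measured_mhz, target_mhz=400.0):
    """Scale a measured rate to a target clock, assuming linear scaling.

    Linear scaling is an assumption of this sketch, not a documented
    property of the QCDOC estimates.
    """
    if measured_mhz <= 0 or target_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    return measured_mflops * target_mhz / measured_mhz


# Hypothetical inputs: 100 MFlops observed on a board clocked at 200 MHz.
example = scale_to_clock(100.0, 200.0)
print(f"{example:.1f} MFlops at 400 MHz")
```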
The BG/P and XT4 DWF performance measurements used local volumes per core of
4^4 (Ls=16) and 6x6x6x4, respectively. The BG/P asqtad result is the single-precision average of
the performance on 6^4 and 8^4 local volumes. The BG/P DWF result is double precision.
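Because the table reports rates per node while the nodes range from 1 to 8 cores, a per-core figure can make the rows easier to compare. The sketch below divides the DWF column by the cores per node taken from the Processor column. The names `DWF_PER_NODE` and `per_core` are illustrative, and machines whose node size is not stated unambiguously in the table are left out.

```python
# (DWF MFlops per node, cores per node), both taken from the table above.
DWF_PER_NODE = {
    "pion": (1729, 1),
    "kaon": (4696, 4),
    "jpsi": (10061, 8),
    "7N": (8800, 8),
    "6N": (2900, 2),
    "4G": (1582, 1),
    "BlueGene/P": (2560, 4),
}


def per_core(table):
    """Convert per-node rates to per-core rates."""
    return {name: mflops / cores for name, (mflops, cores) in table.items()}


dwf_per_core = per_core(DWF_PER_NODE)

# List machines from fastest to slowest core.
for name, rate in sorted(dwf_per_core.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} {rate:8.1f} MFlops/core")
```

These per-core values inherit the caveats of the underlying runs: local volumes differ between machines, and the BG/P DWF figure is double precision, so the comparison is only indicative.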