USQCD Machine Performance

Machine     Processor per node                   Total no. of nodes  Total no. of cores  DWF per node  Clover per node  asqtad per node  Jpsi Equivalence
----------  -----------------------------------  ------------------  ------------------  ------------  ---------------  ---------------  -------------------
jpsi        2.1 GHz Dual CPU Quad Core Opteron   856                 6848                10061 MFlops  7423 MFlops      9563 MFlops      1 Jpsi-core-hour
ds          2.0 GHz Quad CPU Eight Core Opteron  245                 7840                51520 MFlops  42048 MFlops     50547 MFlops     1.33 Jpsi-core-hour
9q          2.4 GHz Dual CPU Quad Core Nehalem   320                 2560                19928 MFlops  15056 MFlops     18128 MFlops     1.96 Jpsi-core-hour
10q         2.53 GHz Dual CPU Quad Core Nehalem  224                 1792                20408 MFlops  15656 MFlops     18046 MFlops     2.00 Jpsi-core-hour
BlueGene/P  850 MHz Quad Core PowerPC 850        1024 per rack       4096 per rack       2560 MFlops   2511 MFlops      2680 MFlops      0.54 Jpsi-core-hour
Cray XT5    2.1 GHz Quad Core Opteron            7832                31328               -             2232 MFlops      2260 MFlops      0.50 Jpsi-core-hour


The table above shows the measured performance of the DWF, anisotropic clover, and asqtad inverters on the jpsi, Ds, 9q, and 10q clusters, the ANL BG/P, and the ORNL XT5. All performance numbers are single precision unless otherwise noted.

The DWF, Clover, and asqtad performance figures for jpsi, Ds, 9q, and 10q come from 128-process runs (on 16, 4, 16, and 16 nodes respectively), with 8 or 32 processes per node and one process per core. DWF and Clover data were taken with Chroma, using its CG_INVERTER. Clover runs used 6^3x64 local (per core) lattices, and DWF runs used 14x7x7x16 local (per core) lattices with Ls=16. The asqtad runs used 14^4 local (per core) lattices.
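The run geometry described above can be cross-checked with a little arithmetic. The following is only an illustrative sketch: the dictionary layout and the function names are assumptions made for this example, and the pairing of 8 processes per node with the 16-node runs and 32 with the 4-node run is inferred from the 128-process total.

```python
from math import prod

# (nodes, processes per node) for the 128-process runs.
# The 8-vs-32 assignment is inferred from 128 / nodes, not stated per machine.
RUNS = {"jpsi": (16, 8), "Ds": (4, 32), "9q": (16, 8), "10q": (16, 8)}

# Local (per core) four-dimensional lattice sizes quoted in the text.
LOCAL_LATTICES = {
    "Clover": (6, 6, 6, 64),
    "DWF": (14, 7, 7, 16),
    "asqtad": (14, 14, 14, 14),
}
LS = 16  # extent of the DWF fifth dimension


def total_processes(nodes, per_node):
    """Number of processes in a run (one process per core)."""
    return nodes * per_node


def local_sites(dims):
    """Number of four-dimensional lattice sites held by one core."""
    return prod(dims)


if __name__ == "__main__":
    for name, (nodes, per_node) in RUNS.items():
        print(f"{name}: {total_processes(nodes, per_node)} processes")
    for action, dims in LOCAL_LATTICES.items():
        print(f"{action}: {local_sites(dims)} sites per core")
    print(f"DWF with Ls: {local_sites(LOCAL_LATTICES['DWF']) * LS} 5D sites per core")
```

Every cluster run comes out at 128 processes, and the per-core workloads differ noticeably between the three actions, which is worth keeping in mind when comparing the columns of the table.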

The BG/P asqtad result is the average of the performance of 6^4 and 8^4 local volumes, and is single precision. The DWF result is double precision, using 4^4 (Ls=16) local volumes. The Clover result used 4096 cores.

The XT5 performance figures are based on anisotropic Clover calculations on a 32^3x256 global volume run on 24 cores (Robert Edwards) and on HISQ runs on 64^3x128 lattices on 2k cores (Steve Gottlieb).

The final column of the table gives the Jpsi-equivalence for each of the USQCD resources, that is, its performance relative to jpsi. All except the Cray XT5 use the ratio of the average performance of asqtad and DWF; the XT5 uses the ratio of the average performance of the asqtad (HISQ) and clover inverters.
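One possible reading of this definition is sketched below. It rests on assumptions that the text does not spell out: the per-node figures are divided by cores per node (taken as total cores divided by total nodes), the ratio is formed against the same inverter average on jpsi, and the function and dictionary names are invented for the example. Under this reading the computed values land near, but not exactly on, the published column (about 1.30 for ds against the tabulated 1.33, for instance), so the published figures presumably rest on inputs or rounding not given here.

```python
# Per-node inverter performance in MFlops, copied from the table.
# "cores" is cores per node, inferred as total cores / total nodes.
MACHINES = {
    "jpsi": {"cores": 8, "DWF": 10061, "Clover": 7423, "asqtad": 9563},
    "ds": {"cores": 32, "DWF": 51520, "Clover": 42048, "asqtad": 50547},
    "9q": {"cores": 8, "DWF": 19928, "Clover": 15056, "asqtad": 18128},
    "10q": {"cores": 8, "DWF": 20408, "Clover": 15656, "asqtad": 18046},
    "BlueGene/P": {"cores": 4, "DWF": 2560, "Clover": 2511, "asqtad": 2680},
    "Cray XT5": {"cores": 4, "DWF": None, "Clover": 2232, "asqtad": 2260},
}


def per_core_average(machine, inverters):
    """Average per-core MFlops over the chosen inverters."""
    entry = MACHINES[machine]
    per_node = sum(entry[name] for name in inverters) / len(inverters)
    return per_node / entry["cores"]


def jpsi_equivalence(machine, inverters=("asqtad", "DWF")):
    """Ratio of a machine's per-core average to the same average on jpsi.

    Assumption: jpsi is averaged over the same inverters as the machine.
    """
    return per_core_average(machine, inverters) / per_core_average("jpsi", inverters)


if __name__ == "__main__":
    for name in MACHINES:
        # The XT5 has no DWF figure, so it uses asqtad (HISQ) and Clover.
        chosen = ("asqtad", "Clover") if name == "Cray XT5" else ("asqtad", "DWF")
        print(f"{name}: {jpsi_equivalence(name, chosen):.2f}")
```

Passing the inverter pair as a parameter keeps the XT5 exception explicit instead of hiding it inside the ratio function.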