USQCD Machine Performance
The table above shows the measured performance of DWF, anisotropic clover, and asqtad inverters on the jpsi, Ds, 9q and 10q clusters, on the ANL BG/P, and the ORNL XT5. All performance numbers are single precision unless otherwise noted.
The DWF, Clover and asqtad performance figures for jpsi, Ds, 9q and 10q come from 128-process runs (on 16, 4, 16 and 16 nodes respectively), with 8 or 32 processes per node and one process per core. DWF and Clover data were taken with Chroma, using its CG_INVERTER. Clover runs used 6^3x64 local (per core) lattices, and DWF runs used 14x7x7x16 local (per core) lattices with Ls=16. The asqtad runs used 14^4 local (per core) lattices.
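The run geometry quoted above can be cross-checked with a little arithmetic. The following Python sketch derives the processes per node from the 128-process total and the node counts, and computes the number of sites in each local lattice. The variable names are illustrative only and do not come from Chroma or any benchmark script.

```python
from math import prod

TOTAL_PROCESSES = 128

# Node counts quoted in the text for the 128-process runs.
nodes = {"jpsi": 16, "Ds": 4, "9q": 16, "10q": 16}

# One process per core, so processes per node = total processes / nodes.
procs_per_node = {name: TOTAL_PROCESSES // n for name, n in nodes.items()}

# Local (per-core) lattice dimensions quoted in the text.
local_dims = {
    "clover": (6, 6, 6, 64),
    "dwf": (14, 7, 7, 16),
    "asqtad": (14, 14, 14, 14),
}
DWF_LS = 16  # extent of the fifth dimension for the DWF runs

# Number of 4D sites held by each core.
local_sites = {name: prod(dims) for name, dims in local_dims.items()}

# DWF also carries the fifth dimension, so the 5D site count is Ls times larger.
dwf_5d_sites = local_sites["dwf"] * DWF_LS

for name, ppn in procs_per_node.items():
    print(f"{name}: {nodes[name]} nodes x {ppn} processes per node")
for name, sites in local_sites.items():
    print(f"{name}: {sites} local 4D sites per core")
print(f"dwf: {dwf_5d_sites} local 5D sites per core")
```

The derived values of 8 processes per node for jpsi, 9q and 10q and 32 for Ds match the "8 or 32 processes per node" stated above.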
The DWF, Clover and asqtad performance figures for 12s are estimates, derived from single-node benchmarks and an assumed scaling factor of 0.9 between single-node (16-rank) and eight-node (128-rank) runs.
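As a minimal sketch of how such an estimate can be formed, the Python below applies the 0.9 factor to a single-node measurement. It assumes the factor is the fraction of single-node performance that each node retains in the eight-node run. That interpretation, the function names, and the input value of 100.0 are all assumptions made for illustration; the input is a placeholder, not a measured 12s number.

```python
def per_node_estimate(single_node_perf, scaling=0.9):
    """Estimated per-node performance in the multi-node run.

    Assumes `scaling` is the fraction of single-node performance
    retained per node when going from 1 node to 8 nodes.
    """
    return single_node_perf * scaling


def aggregate_estimate(single_node_perf, nodes=8, scaling=0.9):
    """Estimated total performance of the multi-node run."""
    return nodes * per_node_estimate(single_node_perf, scaling)


# Placeholder single-node figure in arbitrary units (NOT a measured value).
hypothetical_single_node = 100.0

print(per_node_estimate(hypothetical_single_node))
print(aggregate_estimate(hypothetical_single_node))
```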
The BG/P asqtad result is single precision and is the average of the performance measured with 6^4 and 8^4 local volumes. The DWF result is double precision, using 4^4 (Ls=16) local volumes. The Clover result used 4096 cores.
The XT5 performance figures are based on anisotropic Clover calculations on a 32^3x256 global volume run on 24 cores (Robert Edwards) and on HISQ runs on 64^3x128 lattices on 2k cores (Steve Gottlieb).
The final column of the table gives the Jpsi-equivalence for each of the USQCD resources, that is, its performance relative to jpsi. For all resources except the Cray XT5, this is the ratio based on the average performance of the asqtad and DWF inverters; for the XT5, it is based on the average performance of the asqtad (HISQ) and Clover inverters.
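The Jpsi-equivalence calculation can be sketched as follows. This Python example assumes the equivalence is the ratio of a machine's average inverter performance to the jpsi average, with both expressed in the same units. The function name and all numeric values are hypothetical placeholders, not entries from the table, and the text does not say which inverter pair supplies the jpsi reference in the XT5 case.

```python
def jpsi_equivalence(machine_perf, jpsi_perf, inverters=("asqtad", "dwf")):
    """Ratio of a machine's average inverter performance to that of jpsi.

    Assumption: averages are taken over the same inverters on both
    machines and are expressed in the same units.
    """
    machine_avg = sum(machine_perf[k] for k in inverters) / len(inverters)
    jpsi_avg = sum(jpsi_perf[k] for k in inverters) / len(inverters)
    return machine_avg / jpsi_avg


# Placeholder figures in arbitrary units (NOT values from the table).
jpsi = {"asqtad": 2.0, "dwf": 4.0, "clover": 3.0}
other = {"asqtad": 4.0, "dwf": 8.0, "clover": 6.0}

# Default case: average of asqtad and DWF.
print(jpsi_equivalence(other, jpsi))

# XT5-style case: average of asqtad (HISQ) and Clover instead.
print(jpsi_equivalence(other, jpsi, inverters=("asqtad", "clover")))
```

By construction, jpsi compared with itself gives an equivalence of 1.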