USQCD Machine Performance
The table above shows the measured performance of the DWF, anisotropic clover, and asqtad inverters on the jpsi, Ds, 9q and 10q clusters, the ANL BG/P, and the ORNL XT5. All performance numbers are single precision unless otherwise noted.
The DWF, Clover and asqtad performance figures for jpsi, Ds, 9q and 10q come from 128-process runs (on 16, 4, 16, and 16 nodes respectively), with 8 or 32 processes per node and one process per core. DWF and Clover data were taken with Chroma, using its CG_INVERTER. Clover runs used 6^3x64 local (per-core) lattices, and DWF runs used 14x7x7x16 local (per-core) lattices with Ls=16. The asqtad runs used 14^4 local (per-core) lattices.
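As a quick check of the run geometry described above, the following Python sketch derives the processes per node from the stated process and node counts and counts the sites in each local lattice. The variable and function names are illustrative only and are not part of Chroma or any USQCD software.

```python
from math import prod

TOTAL_PROCESSES = 128  # every cluster run used 128 processes

# Node counts as stated in the text for each cluster.
nodes = {"jpsi": 16, "Ds": 4, "9q": 16, "10q": 16}

# One process per core, so this is also the number of cores used per node.
procs_per_node = {name: TOTAL_PROCESSES // n for name, n in nodes.items()}

# Local (per-core) lattice dimensions quoted in the text.
local_lattices = {
    "clover": (6, 6, 6, 64),
    "dwf": (14, 7, 7, 16),
    "asqtad": (14, 14, 14, 14),
}
DWF_LS = 16  # extent of the DWF fifth dimension


def local_sites(dims):
    """Number of lattice sites in one local (per-core) volume."""
    return prod(dims)


sites = {name: local_sites(d) for name, d in local_lattices.items()}
dwf_5d_sites = sites["dwf"] * DWF_LS  # four-dimensional sites times Ls

print(procs_per_node)
print(sites, dwf_5d_sites)
```

The 4-node Ds run is the one with 32 processes per node; the three 16-node runs have 8 processes per node.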
The BG/P asqtad result is the single-precision performance averaged over 6^4 and 8^4 local volumes. The DWF result is double precision, using 4^4 (Ls=16) local volumes. The Clover result used 4096 cores.
The XT5 performance figures are based on anisotropic Clover calculations on a 32^3x256 global volume run on 24 cores (Robert Edwards) and on HISQ runs on 64^3x128 lattices on 2k cores (Steve Gottlieb).
The final column of the table gives the Jpsi-equivalence of each USQCD resource. For all resources except the Cray XT5, it is the ratio of the average of the asqtad and DWF performance; for the XT5, it is the ratio of the average of the asqtad (HISQ) and clover inverter performance.
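One plausible reading of the Jpsi-equivalence calculation is sketched below in Python. It assumes the rating is the average of two inverter performance figures on a resource divided by the average of the corresponding figures on jpsi; the text does not spell out the exact formula, so this is an assumption, as are the function name and its arguments. The numbers in the usage line are placeholders chosen for easy arithmetic, not measured values.

```python
def jpsi_equivalence(perf_a, perf_b, jpsi_perf_a, jpsi_perf_b):
    """Hypothetical Jpsi-equivalence: ratio of two-inverter averages.

    perf_a, perf_b: performance of the two inverters on the resource
        (asqtad and DWF, or asqtad (HISQ) and clover for the XT5).
    jpsi_perf_a, jpsi_perf_b: the corresponding figures on jpsi.
    All four values must be in the same units.
    """
    resource_avg = (perf_a + perf_b) / 2
    jpsi_avg = (jpsi_perf_a + jpsi_perf_b) / 2
    return resource_avg / jpsi_avg


# Placeholder inputs only: a resource averaging 400 against a jpsi average of 200.
example = jpsi_equivalence(300.0, 500.0, 200.0, 200.0)
print(example)
```

Under this reading, jpsi itself has a rating of exactly 1, and a resource twice as fast on average has a rating of 2.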