FNAL - LQCD Documentation

Fermilab Lattice QCD Computing Hardware

Modern computing hardware used in lattice gauge theory calculations, such as Fermilab's 127-node Myrinet cluster, delivers a price/performance better than 50¢/Megaflop. This compares with approximately $1,000,000/Megaflop on the VAX 11/780s on which the first numerical lattice calculations were done 20 years ago, and $100/Megaflop for Fermilab's ACPMAPS computer in the early 1990s.
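For a rough sense of scale, the sketch below (a Python snippet using only the dollar-per-Megaflop figures quoted above) computes how much each earlier machine cost per Megaflop relative to the cluster:

    # Price/performance figures quoted above, in dollars per Megaflop (MF).
    cost_per_mflop = {
        "VAX 11/780 (first lattice calculations)": 1_000_000.0,
        "ACPMAPS (early 1990s)": 100.0,
        "Myrinet cluster (modern)": 0.50,  # "better than 50 cents/MF"
    }

    cluster_cost = cost_per_mflop["Myrinet cluster (modern)"]
    for machine, cost in cost_per_mflop.items():
        print(f"{machine}: ${cost:,.2f}/MF "
              f"(~{cost / cluster_cost:,.0f}x the cluster's cost per MF)")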




Our past and current production clusters are listed below; a short sketch after the list tallies each machine's CPU core count.

  1. 120-node cluster (qcd) (decommissioned April 2010) with single-socket 2.8 GHz Pentium 4 processors and a Myrinet fabric.
  2. 486-node cluster (pion) (decommissioned April 2010) with single-socket 3.2 GHz Pentium 640 processors and an InfiniBand fabric.
  3. 600-node cluster (kaon) (decommissioned August 2013) with dual-socket dual-core Opteron 270 (2.0 GHz) processors and a double-data-rate InfiniBand fabric.
  4. 856-node cluster (jpsi) (decommissioned May 19, 2014) with dual-socket quad-core Opteron 2352 (2.1 GHz) processors and a double-data-rate InfiniBand fabric.
  5. 420-node cluster (ds) with quad-socket eight-core Opteron 6128 (2.0 GHz) processors and a quad-data-rate InfiniBand fabric.
  6. 76-node cluster (dsg) with dual-socket quad-core Intel Xeon E5630 processors, two NVIDIA Tesla M2050 GPUs per node, and a quad-data-rate InfiniBand fabric.
  7. 224-node cluster (bc) with quad-socket eight-core Opteron 6320 (2.8 GHz) processors and a quad-data-rate InfiniBand fabric.
  8. 314-node cluster (pi0) with dual-socket eight-core Intel Xeon E5-2650v2 "Ivy Bridge" (2.6 GHz) processors and a quad-data-rate InfiniBand fabric.
  9. 32-node cluster (pi0g) with dual-socket eight-core Intel Xeon E5-2650v2 "Ivy Bridge" (2.6 GHz) processors, four NVIDIA Tesla K40m GPUs per node, and a quad-data-rate InfiniBand fabric.
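The sketch below (a Python snippet; CPU cores only, GPUs not counted) multiplies the node, socket, and core figures from the list above:

    # CPU cores per cluster = nodes x sockets per node x cores per socket,
    # taken from the specifications in the list above (GPUs not counted).
    clusters = {
        # cluster: (nodes, sockets per node, cores per socket)
        "qcd":  (120, 1, 1),
        "pion": (486, 1, 1),
        "kaon": (600, 2, 2),
        "jpsi": (856, 2, 4),
        "ds":   (420, 4, 8),
        "dsg":  (76,  2, 4),
        "bc":   (224, 4, 8),
        "pi0":  (314, 2, 8),
        "pi0g": (32,  2, 8),
    }

    for name, (nodes, sockets, cores) in clusters.items():
        total = nodes * sockets * cores
        print(f"{name:5s}: {total:6d} cores ({nodes} nodes x {sockets} x {cores})")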

The Pentium processors on the qcd and pion clusters had an 800 MHz front-side bus; qcd used DDR memory and pion DDR2 memory. The Opteron processors on the kaon cluster used DDR memory. The jpsi cluster used 1066 MHz DDR memory, the ds cluster uses 1333 MHz DDR memory, the bc cluster 1866 MHz DDR memory, and the pi0 cluster 1866 MHz DDR3 memory.

The table below shows the measured performance of DWF and asqtad inverters on all of the Fermilab LQCD clusters; per-cluster aggregates derived from the table are sketched after it. For qcd and pion, the asqtad figures come from 64-node runs with a 14^4 local lattice per node, and the DWF figures from 64-node runs with Ls=16, averaging the performance of 32x8x8x8 and 32x8x8x12 local-lattice runs. The DWF and asqtad figures for kaon come from 128-process (32-node) runs with 4 processes per node, one process per core; for jpsi from 128-process (16-node) runs with 8 processes per node; and for ds and bc from 128-process (4-node) runs with 32 processes per node.
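To make the benchmark sizes concrete, the sketch below (a Python snippet) reconstructs the total number of lattice sites for the qcd/pion runs from the local-lattice figures above; the per-process local volumes for the other clusters are not stated here, so they are left out:

    # Total lattice sites in the 64-node qcd/pion benchmark runs described above.
    nodes = 64

    # asqtad: 14^4 local lattice per node
    asqtad_sites = nodes * 14**4
    print(f"asqtad: 64 x 14^4 = {asqtad_sites:,} sites")

    # DWF: Ls = 16, averaging 32x8x8x8 and 32x8x8x12 local lattices
    Ls = 16
    for dims in [(32, 8, 8, 8), (32, 8, 8, 12)]:
        local = dims[0] * dims[1] * dims[2] * dims[3]
        label = "x".join(map(str, dims))
        print(f"DWF {label}, Ls={Ls}: {nodes * local * Ls:,} five-dimensional sites")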

Cluster  Processor                                      Nodes  DWF Performance      asqtad Performance
qcd      2.8 GHz single-socket single-core Pentium 4E     127   1400 MFlops/node     1017 MFlops/node
pion     3.2 GHz single-socket single-core Pentium 640    486   1729 MFlops/node     1594 MFlops/node
kaon     2.0 GHz dual-socket dual-core Opteron            600   4703 MFlops/node     3832 MFlops/node
jpsi     2.1 GHz dual-socket quad-core Opteron            856  10061 MFlops/node     9563 MFlops/node
ds       2.0 GHz quad-socket eight-core Opteron           420  51520 MFlops/node    50547 MFlops/node
bc       2.8 GHz quad-socket eight-core Opteron           224  57408 MFlops/node    56224 MFlops/node
pi0      2.6 GHz dual-socket eight-core Intel Xeon        314  78310 MFlops/node    61490 MFlops/node
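Multiplying the per-node figures by the node counts gives each cluster's aggregate sustained performance; the sketch below (a Python snippet) does that conversion to TFlops using only the numbers in the table:

    # Aggregate sustained performance = nodes x MFlops/node, converted to TFlops
    # (1 TFlops = 1,000,000 MFlops), using the table above.
    table = {
        # cluster: (nodes, DWF MFlops/node, asqtad MFlops/node)
        "qcd":  (127,  1400,  1017),
        "pion": (486,  1729,  1594),
        "kaon": (600,  4703,  3832),
        "jpsi": (856, 10061,  9563),
        "ds":   (420, 51520, 50547),
        "bc":   (224, 57408, 56224),
        "pi0":  (314, 78310, 61490),
    }

    for name, (nodes, dwf, asqtad) in table.items():
        print(f"{name:5s}: {nodes * dwf / 1e6:5.1f} TFlops DWF, "
              f"{nodes * asqtad / 1e6:5.1f} TFlops asqtad")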

The jpsi cluster used PCI Express InfiniBand network interface cards in each node, 50 24-port leaf InfiniBand switches, and one 288-port spine InfiniBand switch. All nodes connected to the leaf switches, and each leaf connected to the spine with 3:1 oversubscription (6 uplinks per leaf). Each jpsi node achieved a peak unidirectional bandwidth of 1315 MB/sec and a peak bidirectional bandwidth of 2160 MB/sec, measured over MPI. While jpsi used double-data-rate InfiniBand, the ds, bc, and pi0 clusters use quad-data-rate InfiniBand, with a measured maximum unidirectional bandwidth of 2640 MB/sec and a maximum bidirectional bandwidth of 4980 MB/sec.
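The 3:1 figure follows from how each leaf switch's ports were split; the sketch below (a Python snippet, assuming every non-uplink leaf port faced a node) checks the arithmetic against the node and switch counts above:

    # jpsi fabric oversubscription: each 24-port leaf used 6 ports as uplinks
    # to the 288-port spine, leaving 18 ports facing nodes.
    leaf_ports = 24
    uplinks_per_leaf = 6
    node_ports_per_leaf = leaf_ports - uplinks_per_leaf          # 18

    print(f"oversubscription: {node_ports_per_leaf}:{uplinks_per_leaf} "
          f"= {node_ports_per_leaf // uplinks_per_leaf}:1")

    # Sanity check: 50 leaves x 18 node ports = 900 ports for the 856 nodes.
    leaves, nodes = 50, 856
    print(f"node ports available: {leaves * node_ports_per_leaf} "
          f"(enough for {nodes} nodes)")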







