Fermilab Lattice QCD Computing Hardware
Modern computing hardware used in lattice gauge theory calculations
(such as Fermilab's 127-node Myrinet cluster, shown at right)
has a price/performance better than 50¢ per megaflop (MF).
By comparison, a megaflop cost approximately $1,000,000 on the VAX 11/780s
on which the first numerical lattice calculations were done 20 years ago,
and about $100 on Fermilab's ACPMAPS computer in the early 1990s.
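
The size of these price/performance gains can be made concrete with a little arithmetic. The sketch below only uses the three dollars-per-megaflop figures quoted above. Because the modern figure is an upper bound ("better than 50¢/MF"), every improvement factor it produces is a lower bound. The names `COST_PER_MFLOP` and `improvement_factor` are illustrative, not part of any Fermilab tool.

```python
# Price/performance figures quoted in the text, in US dollars per megaflop.
# The modern figure is an upper bound ("better than 50 cents/MF"), so every
# improvement factor computed from it is a lower bound.
COST_PER_MFLOP = {
    "VAX 11/780": 1_000_000.0,
    "ACPMAPS (early 1990s)": 100.0,
    "modern cluster": 0.50,
}


def improvement_factor(old_cost: float, new_cost: float) -> float:
    """How many times cheaper one megaflop is on the newer system."""
    return old_cost / new_cost


if __name__ == "__main__":
    modern = COST_PER_MFLOP["modern cluster"]
    for system, cost in COST_PER_MFLOP.items():
        factor = improvement_factor(cost, modern)
        print(f"{system:22s} ${cost:>14,.2f}/MF  (>= {factor:,.0f}x the modern cost)")
```

With these inputs, a megaflop on the VAX 11/780 cost at least 2,000,000 times as much as on a modern cluster, and on ACPMAPS at least 200 times as much.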
Our current production clusters are: a 120-node cluster (qcd) with single
2.8 GHz Pentium 4 processors and a Myrinet fabric; a 486-node cluster
(pion) with single 3.2 GHz Pentium 640 processors and an Infiniband
fabric; a 600-node cluster (kaon) with dual dual-core Opteron 270 (2.0 GHz)
processors and a double-data-rate Infiniband fabric; and an 856-node cluster (jpsi)
with dual quad-core Opteron 2352 (2.1 GHz) processors and a double-data-rate Infiniband
fabric. The Pentium processors on
the qcd and pion clusters have an 800 MHz front-side
bus; qcd uses DDR memory and pion uses DDR2 memory. The Opteron processors on the
kaon and jpsi clusters also have 800 MHz front-side buses and use DDR memory.
Pictured on the left is one of the jpsi nodes.
The table below shows the measured performance of the DWF and asqtad inverters on
the qcd, pion, kaon and jpsi clusters. For qcd and
pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice
per node, and the DWF numbers were taken on 64-node runs with Ls=16, averaging
the performance of runs with 32x8x8x8 and 32x8x8x12 local lattices. The
DWF and asqtad performance figures for kaon use 128-process (32-node)
runs with 4 processes per node, one process per core. The DWF and asqtad performance figures
for jpsi use 128-process (16-node) runs with 8 processes per node, one process per core.
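
The benchmark geometry described above reduces to simple counting. The sketch below computes the local lattice volume per process and the number of nodes each 128-process run occupies. One assumption is labeled here: it counts each of the Ls fifth-dimension slices of a DWF lattice as separate sites, which is one common way to size DWF work and is used here only for illustration. The helper names `local_sites` and `nodes_for_run` are hypothetical.

```python
from math import prod

LS = 16  # DWF fifth-dimension extent used in the benchmarks


def local_sites(dims: tuple[int, ...], ls: int = 1) -> int:
    """Lattice sites per process. Passing ls > 1 counts each DWF
    fifth-dimension slice as separate sites (an illustrative assumption)."""
    return prod(dims) * ls


def nodes_for_run(processes: int, processes_per_node: int) -> int:
    """Nodes occupied by a run with a fixed number of processes per node."""
    if processes % processes_per_node != 0:
        raise ValueError("process count must fill whole nodes")
    return processes // processes_per_node


if __name__ == "__main__":
    print("asqtad 14^4 local volume:", local_sites((14, 14, 14, 14)))
    print("DWF 32x8x8x8,  Ls=16:", local_sites((32, 8, 8, 8), LS))
    print("DWF 32x8x8x12, Ls=16:", local_sites((32, 8, 8, 12), LS))
    print("kaon nodes (4 processes/node):", nodes_for_run(128, 4))
    print("jpsi nodes (8 processes/node):", nodes_for_run(128, 8))
```

The node counts reproduce the 32-node (kaon) and 16-node (jpsi) figures given in the text.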
| Cluster | Processor | Nodes | DWF Performance (MFlops/node) | asqtad Performance (MFlops/node) |
|---|---|---|---|---|
| qcd | 2.8 GHz Single CPU Single Core P4E | 127 | 1400 | 1017 |
| pion | 3.2 GHz Single CPU Single Core Pentium 640 | 486 | 1729 | 1594 |
| kaon | 2.0 GHz Dual CPU Dual Core Opteron | 600 | 4703 | 3832 |
| jpsi | 2.1 GHz Dual CPU Quad Core Opteron | 856 | 10061 | 9563 |
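
The per-node figures in the table can be turned into per-core and whole-cluster estimates. The sketch below does that arithmetic using only the table's values and the cores per node implied by the processor descriptions (single-core, dual CPU dual core, dual CPU quad core). The whole-cluster number is a naive product that assumes the per-node rate measured on 16- to 64-node runs would hold on every node at once; real full-machine efficiency may differ. `Cluster`, `per_core` and `naive_aggregate_mflops` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    nodes: int
    cores_per_node: int
    dwf_mflops_per_node: int
    asqtad_mflops_per_node: int


# Values from the table; cores per node follow from the processor column.
CLUSTERS = [
    Cluster("qcd", 127, 1, 1400, 1017),
    Cluster("pion", 486, 1, 1729, 1594),
    Cluster("kaon", 600, 4, 4703, 3832),   # dual CPU x dual core
    Cluster("jpsi", 856, 8, 10061, 9563),  # dual CPU x quad core
]


def per_core(c: Cluster, mflops_per_node: int) -> float:
    """Per-node rate divided evenly among the node's cores."""
    return mflops_per_node / c.cores_per_node


def naive_aggregate_mflops(c: Cluster, mflops_per_node: int) -> int:
    """Assumes the benchmarked per-node rate holds on every node at once."""
    return c.nodes * mflops_per_node


if __name__ == "__main__":
    for c in CLUSTERS:
        dwf = c.dwf_mflops_per_node
        print(f"{c.name:5s} DWF {per_core(c, dwf):8.1f} MFlops/core, "
              f"~{naive_aggregate_mflops(c, dwf) / 1e6:.2f} TFlops (naive)")
```

On these numbers a jpsi core delivers about 1258 DWF MFlops, compared with 1400 to 1729 MFlops for the single-core Pentium nodes, while the jpsi node as a whole is roughly six to seven times faster.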
The pion cluster uses PCI Express Infiniband network interface cards in
each node, sixteen 24-port Infiniband leaf switches, and one 144-port Infiniband
spine switch. All nodes connect to the leaves, and each leaf connects to the
spine through 4 uplinks, giving 4:1 oversubscription. Each pion
node achieves a peak unidirectional bandwidth of 710 MB/sec and a bidirectional
bandwidth of 1320 MB/sec. Pictured on the right is the 144-port Infiniband
spine switch.
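
The network figures above can also be checked with simple arithmetic. The sketch below assumes that "4:1 oversubscription" means four node-facing leaf ports per uplink, and that uplinks run at the same rate as node links. Both are illustrative assumptions, not statements from the original description. Under them, a 24-port leaf with 4 uplinks uses 20 ports. A node that is sending off-leaf while every other node on its leaf does the same gets about a quarter of its link rate. The measured bidirectional rate is about 93% of twice the unidirectional rate. The helper names are hypothetical.

```python
UNIDIRECTIONAL_MB_S = 710.0  # measured peak, per pion node
BIDIRECTIONAL_MB_S = 1320.0  # measured peak, per pion node


def bidirectional_efficiency(uni: float, bidi: float) -> float:
    """Fraction of the ideal (2 x unidirectional) rate reached when
    sending in both directions at once."""
    return bidi / (2 * uni)


def leaf_ports_used(uplinks: int, oversubscription: int) -> int:
    """Ports used on a leaf, assuming `oversubscription` node-facing
    ports per uplink (an illustrative reading of '4:1')."""
    return uplinks * oversubscription + uplinks


def worst_case_offleaf_share(link_mb_s: float, oversubscription: int) -> float:
    """Per-node off-leaf bandwidth when all nodes on a leaf send off-leaf
    at once, assuming uplinks run at the same rate as node links."""
    return link_mb_s / oversubscription


if __name__ == "__main__":
    print(f"bidirectional efficiency: "
          f"{bidirectional_efficiency(UNIDIRECTIONAL_MB_S, BIDIRECTIONAL_MB_S):.1%}")
    print("leaf ports used (4 uplinks, 4:1):", leaf_ports_used(4, 4), "of 24")
    print("worst-case off-leaf share (MB/s):",
          worst_case_offleaf_share(UNIDIRECTIONAL_MB_S, 4))
```

This kind of contention estimate is an upper bound on the slowdown. Lattice codes that keep most communication between nodes on the same leaf would see far less effect from the oversubscribed uplinks.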