Over the past twelve years, a wide range of processors and communications
systems has been evaluated, and both switched and mesh communications
systems have been studied. Myrinet and InfiniBand fabrics have been
tested for switched clusters, and Gigabit Ethernet has been used
for the mesh ones. Current clusters are all based on InfiniBand,
with the most recent clusters at JLab and FNAL including GPU
acceleration. As of 2012, the conventional (non-GPU) clusters provide
a total throughput of approximately 40 TFlops
on lattice QCD production code. (This is equivalent to 140 to 180
TFlops on the Linpack benchmark.) The GPU-accelerated clusters
at JLab and FNAL provide a total of 620 NVIDIA Fermi-class
GPUs. The most recent conventional cluster built at JLab is shown
at left. It consists of 224 nodes of Intel Westmere
(quad-core) CPUs connected by a QDR InfiniBand switched network,
with a throughput of 4.3 TFlops on lattice QCD code. Ds, the latest conventional
cluster built at FNAL, consists of 421 nodes, each with four 2.0 GHz
eight-core Opteron CPUs, connected by a QDR InfiniBand fabric, and sustains 21.5
TFlops on lattice QCD code. This cluster is shown at right.