All Hands' Meeting 2010

Date: Thu, 14 Oct 2010 23:06:06 -0400 (EDT)
From: The Scientific Program Committee
To: All members of the USQCD Collaboration
Subject: [sdac] 2010/11 USQCD Call for Proposals


Dear Colleagues,

This message is a Call for Proposals for additional awards of 
time on USQCD computer resources during the current allocation 
period 2010/2011.

Upgrades of the GPU cluster at JLab and the acquisition of new 
hardware at Fermilab make it possible to distribute new resources 
for the second half of our regular allocation period, i.e. for the 
period January 2011 to June 2011.

In this call for proposals we expect to distribute

         1.79 M GPU-hours (498 GPUs) at JLab
         37.5 M Jpsi-core hours on the new Ds cluster at FNAL

Compared to our last call, we will thus distribute an additional 30% of
cluster resources and about 130% additional GPU-hours.

The rules for this Call are those outlined in the last Call for
Proposals, issued in February 2010
(/meetings/allHands2010/call.html).

Important dates for the current Call for Proposals are:

   Deadline for the submission of proposals is   November 13th, 2010.
   Allocations will be announced on              December 15th, 2010.
   They may be used starting                     January 1st,   2011.


To apply for the resources detailed above you may either submit 
new project proposals of type A or type B or request additional 
allocations for your current project. In the latter case there 
is no need for a project description. You should, however, justify
the additional request. As usual you also should include storage 
requirements in your request.

We will NOT hold an All Hands Meeting to discuss new proposals. 
The Scientific Program Committee will discuss the proposals and 
will suggest allocations to the Executive Committee.


Some information on the resources available:

New cluster at FNAL:

245 node cluster ("Ds")
     32 cores per node
     64 GB memory/node
     1 Ds core-hour =  1.33 Jpsi-equivalent core-hours
     1 Ds node-hour = 42.56 Jpsi-equivalent core-hours
     total: 3600*245*42.56 =  37.5 M Jpsi-equivalent core-hours
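
The Ds total can be rechecked from the node count, cores per node, and per-core conversion factor listed above. The following is only a sketch: the variable and function names are illustrative, and the 3600 factor is taken as given from the total quoted in this call.

```python
# Sketch: Jpsi-equivalent core-hours available on the Ds cluster.
# Inputs are the figures quoted in this call; all names are illustrative.
NODES = 245                    # Ds nodes
CORES_PER_NODE = 32
JPSI_PER_DS_CORE_HOUR = 1.33   # 1 Ds core-hour in Jpsi-equivalent core-hours
HOURS_PER_NODE = 3600          # hours factor used in the call's total


def ds_to_jpsi_core_hours(ds_core_hours):
    """Convert Ds core-hours to Jpsi-equivalent core-hours."""
    return ds_core_hours * JPSI_PER_DS_CORE_HOUR


# One node-hour is CORES_PER_NODE Ds core-hours.
jpsi_per_node_hour = ds_to_jpsi_core_hours(CORES_PER_NODE)
total_jpsi_core_hours = HOURS_PER_NODE * NODES * jpsi_per_node_hour

print(f"1 Ds node-hour = {jpsi_per_node_hour:.2f} Jpsi-equivalent core-hours")
print(f"total = {total_jpsi_core_hours / 1e6:.1f} M Jpsi-equivalent core-hours")
```

The same helper can be used to express a request made in Ds core-hours in Jpsi-equivalent units.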

GPU cluster at JLab in short:

32 node cluster equipped with 4 GPUs NVIDIA C2050 (Fermi Tesla)
46 node cluster equipped with 4 GPUs GTX-480 (Fermi gaming card)
32 node cluster equipped with 4 GPUs GTX 285 (last year's model)
 2 node cluster equipped with 4 GPUs GT200b Tesla (last year's model)
50 node cluster equipped with 1 GPU  GTX 285
       total: 3600*498 =  1.79 M GPU-hours
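
The GPU count behind this total follows from the node counts and GPUs per node listed above. A minimal sketch, with illustrative labels and the 3600 factor taken as given from the call:

```python
# Sketch: GPU count and GPU-hours for the JLab GPU clusters.
# (nodes, GPUs per node) as listed in this call; labels are illustrative.
CLUSTERS = {
    "C2050": (32, 4),
    "GTX-480": (46, 4),
    "GTX 285 (quad)": (32, 4),
    "GT200b Tesla": (2, 4),
    "GTX 285 (single)": (50, 1),
}
HOURS_PER_GPU = 3600  # hours factor used in the call's total

total_gpus = sum(nodes * gpus for nodes, gpus in CLUSTERS.values())
total_gpu_hours = HOURS_PER_GPU * total_gpus

print(f"{total_gpus} GPUs, {total_gpu_hours / 1e6:.2f} M GPU-hours")
```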

Further details and comments on the JLab GPU clusters:

1) 32 nodes of quad NVIDIA C2050 (Fermi Tesla) GPUs
    GPU memory (ECC on) 2.6 GB / GPU
    dual Intel Xeon 2.53 GHz quad core (Westmere)
    48 GB memory
    QDR Infiniband

    These are the most flexible nodes, in that the GPUs
    have ECC memory, so are suitable for everything that
    can be run on a GPU, even configuration generation.
    All 32 are on a single switch, so a job can exploit
    128 GPUs.

2) 46 nodes of quad GTX-480 (Fermi gaming card)
    1.5 GB memory / GPU
    20 are 2.53 GHz Xeon, 28 are 2.4 GHz Xeon
    48 GB memory
    SDR Infiniband, up to 20 nodes per switch

    These are the fastest nodes, 35% faster than the C2050
    Tesla. They exhibit a non-zero rate of memory errors, as
    high as one per minute under extreme testing, and so
    are suitable only for inverters; the application must
    test the residual and discard the result if it is too high.

3) 32 nodes of quad GTX 285 (last year's model)
    2 GB memory / GPU
    dual quad core 2.4 GHz Xeon (Nehalem)
    48 GB memory
    SDR Infiniband

    These are also gaming cards, but show only extremely
    rare memory errors (4+ orders of magnitude lower
    than the 480s).  Some people use these for more than
    inverters.

4) 2 nodes of quad GT200b Tesla (last year's model)
    4 GB memory / GPU
    dual quad core 2.4 GHz Xeon (Nehalem)
    48 GB memory
    SDR Infiniband

    While technically "Tesla" professional cards, these
    do not have ECC memory, and are essentially the
    same as the 285s with twice the memory but ~20%
    lower performance.

5) 50 nodes of single GTX285
    2 GB memory / GPU
    dual quad core 2.4 GHz Xeon (Nehalem)
    24 GB memory
    QDR Infiniband

    These nodes are actually a part of the 10q Infiniband
    cluster, and sometimes run non-GPU jobs.

    These nodes are suitable for small jobs, or for
    multi-GPU jobs in which more CPU cores are
    needed since there are 8 Xeon cores per GPU.
    Up to 32 nodes may be in a single job
    (one rack, full bandwidth, non-oversubscribed QDR).
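
For the GTX-480 nodes (item 2 above), the application is expected to test the residual and discard results that fail. One possible shape for such a check is sketched below, assuming the residual is recomputed from the returned solution rather than taken from the solver's own running estimate. The small dense matrix stands in for the real operator, and the function names and the tolerance are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def matvec(A, x):
    """Dense matrix-vector product; a stand-in for the real operator."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def relative_residual(A, x, b):
    """Return |b - A x| / |b| for a candidate solution x."""
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    return norm(r) / norm(b)


def accept_solution(A, x, b, tol=1e-8):
    """Keep x only if its recomputed residual is at most tol.

    The default tol is a hypothetical value, not one given in this call.
    """
    return relative_residual(A, x, b) <= tol


# Toy system A x = b with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_good = [1.0 / 11.0, 7.0 / 11.0]
x_bad = [1.0 / 11.0, 7.0 / 11.0 + 1e-3]  # simulated memory error

for label, x in (("good", x_good), ("bad", x_bad)):
    verdict = "keep" if accept_solution(A, x, b) else "discard"
    print(label, verdict)
```

A rejected solve would typically be rerun; where the check itself runs (on the CPU or on a GPU with ECC) is left open here.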

Measured anisotropic Clover inverter performance, on 
a 24^3 x 128 lattice, multi-GPU running, per GPU:

    MODEL          GFLOPS
    ------------   -----------
    C2050          189
    GTX480         256
    GTX285         154
    old Tesla      120

For further information see also http://lqcd.jlab.org/
Andreas Kronfeld