FNAL - LQCD Documentation

FAQs

  1. What is the best way to contact the LQCD system administrators at Fermilab?
  2. Is there an email list to contact the other users?
  3. How do I access the archives of the mails sent to the email lists?
  4. How do I check my project allocation balance?
  5. Is there a command to check the status (running, free, busy etc.) of cluster worker nodes?
  6. I am unable to delete my job, and I am getting an email message every 10 seconds.
  7. What storage space do I use?
  8. Why should I use fcp instead of rcp?
  9. Is one hour of utilization of a kaon node the same as one hour of utilization of a pion node?
  10. What are the batch queue scheduler terms soft and hard node limits?

  1. What is the best way to contact the LQCD system administrators at Fermilab?

    The best way to contact the LQCD system administrators is by sending an email to lqcd-admin@fnal.gov.

  2. Is there an email list to contact the other users?

    Yes, the email list lqcd-users@fnal.gov has been set up for that purpose.

  3. How do I access the archives of the mails sent to the email lists?

    The archives of the lqcd-admin and lqcd-users email lists are available at:
    http://listserv.fnal.gov/archives/lqcd-admin.html
    http://listserv.fnal.gov/archives/lqcd-users.html

  4. How do I check my project allocation balance?

    Run the following command on lqcd.fnal.gov:

    [@lqcd]$> /usr/local/bin/lquota
    As of Wed Nov 1 10:01, account lqcdproject used 
    1230274 of 4245202 allocated nodehours, which is 29.0 %
    of total allocation
    

    Note: lquota prints the project allocation balance for all Fermilab LQCD clusters.
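
    As a rough illustration, the remaining balance can be computed from the lquota text shown above. The following Python sketch assumes the output always follows that sample format; the function name parse_lquota and the regular expression are illustrative and are not part of lquota itself.

```python
import re

# Sample text copied from the lquota output shown above.
SAMPLE = (
    "As of Wed Nov 1 10:01, account lqcdproject used\n"
    "1230274 of 4245202 allocated nodehours, which is 29.0 %\n"
    "of total allocation\n"
)

# Assumed pattern: "account <name> used <used> of <allocated> allocated".
_PATTERN = re.compile(
    r"account\s+(?P<account>\S+)\s+used\s+(?P<used>\d+)"
    r"\s+of\s+(?P<allocated>\d+)\s+allocated"
)


def parse_lquota(text):
    """Return account, used, allocated, remaining and percent used."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("text does not look like the sample lquota output")
    used = int(match.group("used"))
    allocated = int(match.group("allocated"))
    return {
        "account": match.group("account"),
        "used": used,
        "allocated": allocated,
        "remaining": allocated - used,
        "percent_used": 100.0 * used / allocated if allocated else 0.0,
    }


if __name__ == "__main__":
    info = parse_lquota(SAMPLE)
    print(info["account"], info["remaining"], round(info["percent_used"], 1))
```

    If the real output differs from the sample, the pattern will need adjusting; the function raises ValueError rather than guessing.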

  5. Is there a command to check the status (running, free, busy etc.) of cluster worker nodes?

    On lqcd.fnal.gov, which is the head-node for the QCD cluster:

    [@lqcd]$> /usr/local/bin/lqstat -h
    
    Print the number of free, busy, down or offline
    nodes in the Fermilab LQCD cluster.
    
    Usage: lqstat [print-option] 
    
    PRINT-OPTION free,busy,down,offline,online
    
    ** no option will print the number of free nodes
    of all types. There are two types of nodes
    

    On kaon1.fnal.gov, which is the head-node for the KAON and PION (64-bit) cluster:

    [@kaon2]$ /usr/local/bin/lqstat -h
    
    Print the number of free, busy, down or offline
    nodes in the Fermilab LQCD cluster.
    
    Usage: lqstat [print-option]
    
    PRINT-OPTION free,busy,down,offline,online
    
    ** no option will print the number of free nodes.
    
          kaon  (AMD Opteron dual-processor dual-core, 64-bit)
          pion  (Intel P4 single-processor, 64-bit)
    
          printed in the following order
    
          kaon pion
    

  6. I am unable to delete my job, and I am getting an email message every 10 seconds.

    This often occurs due to one or more cluster worker nodes crashing as a result of hardware failure. When a cluster worker node crashes, the PBS server has no way to contact the PBS client on the failed node. The server attempts to kill or rerun the job every 10 seconds and almost always fails, generating an email message at every attempt. If you encounter this problem please contact the Fermilab LQCD system administrators by sending email to lqcd-admin@fnal.gov.

  7. What storage space do I use?

    The following is a summary of available storage space on the Fermilab LQCD clusters. All attempts have been made to keep this table current.

    Area              Description

    /project/xxxxx    Area typically used for approved projects.
                      Visible on all cluster worker nodes via NFS.
                      Backups nightly.
                      Suitable for output logs, meson correlators and other small data files.
                      NOT suitable for fields, e.g. configs or quark propagators.

    /home/<username>  Home area.
                      Backups nightly.
                      Visible on all cluster worker nodes via NFS.
                      Not suitable for configs or props.
                      Can be used as "run" directory for light production or testing.
                      Quota of about 4 GB per home directory.

    /data/raidX       RAID storage.
                      NO backups.
                      Visible on head-nodes ONLY.
                      No quotas; unmanaged common area available to all users.
                      Individual disks are subject to filling up.
                      Must use rcp or rsync to copy data files from cluster worker nodes to head-nodes.
                      Suitable for configurations and propagators.

    /pnfs/volatile    dCache storage.
                      NO backups.
                      Visible on all cluster worker nodes.
                      Ideal for temporary storage (~month) of very large data files.
                      Must use special copy command: 'dccp'.
                      Files are deleted by a "Least Recently Used" policy when space is tight.

    /lqcdproj         Lustre storage.
                      NO backups.
                      Visible on all cluster worker nodes.
                      Ideal for temporary storage (~month) of very large data files.
                      Disk space usage monitored and disk quotas enforced.

  8. Why should I use fcp instead of rcp?

    If your jobs need to copy data to or from areas in /data/raidX or /project, and you are currently using commands like

        rcp kaon1:/data/raid4/my.file /scratch/my.file
    or
        rcp lqcd:/data/raid4/my.file /scratch/my.file
    

    please instead use

        fcp kaon1:/data/raid4/my.file /scratch/my.file
    or
        fcp lqcd:/data/raid4/my.file /scratch/my.file
    

    The simplest invocation of fcp:

        fcp   src.file   kaon1:dst.file
    

    will use rcp to do the transfer and can only transfer a single file (that is, wildcards will not work). Further, switches cannot be passed to rcp.

    If you want to use wildcards, or add switches to rcp such as -r (recursive copy) or -p (preserve attributes), use instead the form:

        fcp  -c rcp  [switches] src dest:dst
    

    For example:

        fcp  -c rcp -r my_root_dir  kaon1:dest_dir/
    

    will do a recursive copy of my_root_dir to dest_dir/ on kaon1, and

        fcp  -c rcp file_spec_* kaon1:dest_dir/
    

    will transfer all files named file_spec_* to dest_dir on kaon1.

    You can specify any command with "-c", so if your scripts use rsync rather than rcp, try

       fcp  -c rsync  [rsync switches] src dst 
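
    For scripts that launch transfers from Python rather than from a shell, the two forms above can be assembled as argument lists. This is a minimal sketch that uses only the fcp forms described in this answer; the helper names build_fcp_command and expand_wildcard are hypothetical, and the sketch does not run fcp itself.

```python
import glob


def build_fcp_command(sources, dest, copy_cmd=None, switches=()):
    """Build an argv list for fcp.

    Without copy_cmd this is the simple form `fcp src dest`, which accepts
    a single file and no switches. With copy_cmd it is the form
    `fcp -c <cmd> [switches] src... dest`.
    """
    sources = list(sources)
    if copy_cmd is None:
        if len(sources) != 1 or switches:
            raise ValueError("the simple form takes one file and no switches")
        return ["fcp", sources[0], dest]
    return ["fcp", "-c", copy_cmd, *switches, *sources, dest]


def expand_wildcard(pattern):
    """Expand a wildcard locally, as a shell would before calling fcp."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    print(build_fcp_command(["my_root_dir"], "kaon1:dest_dir/", "rcp", ["-r"]))
```

    The resulting list can be passed to subprocess.run(argv, check=True). No shell is involved in that case, so wildcards such as file_spec_* are not expanded automatically, which is why the sketch expands them with glob first.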
    

    "fcp" has the same command syntax as "rcp". Unlike rcp, fcp throttles access to individual file systems so that only a limited number of accesses are attempted at a time (currently set to 2 per filesystem). Your fcp command will block, waiting in line, until the file access is finished.

    Overall throughput from these disk areas is higher when we limit the number of simultaneous I/O transactions. When many transactions occur simultaneously, data throughput is limited by motion of the heads on the disk drives.
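
    The throttling idea can be illustrated with a small Python sketch. It is not how fcp is implemented (fcp coordinates transfers across jobs and nodes, which this single-process example does not attempt); it only shows the "at most 2 accesses per filesystem, everyone else waits in line" behaviour using a semaphore. The class and function names are hypothetical.

```python
import threading
import time

MAX_ACCESSES_PER_FILESYSTEM = 2  # the limit quoted in the text above


class FilesystemThrottle:
    """Allow at most `limit` concurrent transfers per filesystem."""

    def __init__(self, limit=MAX_ACCESSES_PER_FILESYSTEM):
        self._limit = limit
        self._lock = threading.Lock()
        self._slots = {}  # filesystem name -> semaphore

    def _slot_for(self, filesystem):
        with self._lock:
            if filesystem not in self._slots:
                self._slots[filesystem] = threading.BoundedSemaphore(self._limit)
            return self._slots[filesystem]

    def run(self, filesystem, transfer):
        """Block until a slot is free, then run the transfer callable."""
        with self._slot_for(filesystem):
            return transfer()


def demo(workers=6):
    """Start several fake transfers; return (peak concurrency, completed)."""
    throttle = FilesystemThrottle()
    counter_lock = threading.Lock()
    state = {"active": 0, "peak": 0, "done": 0}

    def fake_transfer():
        with counter_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stands in for the actual copy
        with counter_lock:
            state["active"] -= 1
            state["done"] += 1

    threads = [
        threading.Thread(target=throttle.run, args=("/data/raid4", fake_transfer))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"], state["done"]


if __name__ == "__main__":
    print(demo())
```

    All six fake transfers complete, but no more than two are ever active on the same filesystem at once; the others block until a slot is released.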

  9. Is one hour of utilization of a kaon node the same as one hour of utilization of a pion node?

    Projects are billed in equivalent 6n node hours. The conversion factors below are from the 2008 USQCD call for proposals. Charges are based on node hours actually used by a job and not the requested maximum walltime. The surest way to maximize your physics output per charge unit is to benchmark your code on both kaon and pion.

    USQCD has clusters with several kinds of nodes, from single-processor, single-core, to dual-processor, quad-core. The Scientific Program Committee will use the following table to convert:

        1 QCDOC node-hour = 0.122 6n node-hour
        1 pion  node-hour = 0.683 6n node-hour
        1 6n    node-hour = 1     6n node-hour
        1 kaon  node-hour = 1.757 6n node-hour
        1 7n    node-hour = 3.1   6n node-hour
        1 J/psi node-hour = 4.04  6n node-hour
        1 BG/P  node-hour = 1.08  6n node-hour
        1 XT4   node-hour = 2.22  6n node-hour
    

    The above numbers are based on the average of asqtad and DWF fermion inverters. In the case of XT4 we used the average of asqtad and clover inverters. See http://lqcd.fnal.gov/performance.html for details.
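
    The charge for a job follows from the table: number of nodes, times wall-clock hours actually used, times the conversion factor for the node type. A minimal Python sketch, assuming the factors listed above; the function name charge_in_6n_node_hours is illustrative only.

```python
# Conversion factors copied from the table above
# (6n node-hours per node-hour of each node type).
TO_6N = {
    "QCDOC": 0.122,
    "pion": 0.683,
    "6n": 1.0,
    "kaon": 1.757,
    "7n": 3.1,
    "J/psi": 4.04,
    "BG/P": 1.08,
    "XT4": 2.22,
}


def charge_in_6n_node_hours(node_type, nodes, wall_hours_used):
    """Charge for a job: nodes * hours actually used * conversion factor."""
    if node_type not in TO_6N:
        raise KeyError(f"unknown node type: {node_type}")
    return nodes * wall_hours_used * TO_6N[node_type]


if __name__ == "__main__":
    # The same 16-node, 2.5-hour job on kaon and on pion.
    print(charge_in_6n_node_hours("kaon", 16, 2.5))
    print(charge_in_6n_node_hours("pion", 16, 2.5))
```

    A job of the same size and duration is charged about 2.6 times more on kaon than on pion, so kaon is the better choice only if the code runs correspondingly faster there; benchmarking on both, as suggested above, settles the question.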

  10. What are the batch queue scheduler terms soft and hard node limits?

    Please refer to this section of our documentation.

usqcd-webmaster@usqcd.org