FNAL - LQCD Documentation

Quick Links

  1. Compilers and Communication Libraries
  2. pi0 cluster: Available Compilers
  3. pi0 cluster: Launching jobs
  4. pi0g cluster: Compiling code & launching jobs
  5. Bc cluster: Available Compilers
  6. Bc cluster: Launching jobs
  7. Dsg cluster: Compiling code & launching jobs
  8. Ds cluster: Available Compilers
  9. Ds cluster: Launching jobs
  10. Jpsi cluster: Available Compilers (Cluster decommissioned)
  11. Jpsi cluster: Launching jobs (Cluster decommissioned)
  12. kaon cluster: Available Compilers (Cluster decommissioned)
  13. kaon cluster: Launching jobs (Cluster decommissioned)
  14. Compiling an MPI application

1. Compilers and Communication Libraries

The tables below list the compilers installed on the various Fermilab LQCD clusters. Unix man page documentation is available for each compiler. The Portland Group compilers are not binary compatible with GNU, meaning they will not accept object files or libraries created by the GNU compilers. Compiler drivers, such as mpicc, are provided for compiling programs that contain MPI message-passing calls.

2. Compilers Available on the pi0 cluster (cluster head-node: pi0.fnal.gov)

Compiler Name and Version          Binary Location                 Binary Name(s)
Gnu Compilers Version 4.4.7        /usr/bin                        g++, gcc, gfortran
Gnu Compilers Version 4.5.1 (*)    /usr/local/gcc-4.5.1/bin        g++, gcc, gfortran
Gnu Compilers Version 4.7.4 (*)    /usr/local/gcc-4.7.4/bin        g++, gcc, gfortran
Gnu Compilers Version 4.9.1 (*)    /usr/local/gcc-4.9.1/bin        g++, gcc, gfortran
Gnu Compilers Version 5.1.0 (*)    /usr/local/gcc-5.1.0/bin        g++, gcc, gfortran
Intel C/Fortran Compiler (**)      /usr/local/intel/bin            icc, ifort
MVAPICH Version 1.2rc1             /usr/local/mvapich/bin          mpicc, mpif77, mpif90
MVAPICH2 Version 2.0               /usr/local/mvapich2/bin         mpicc, mpif77, mpif90
Open MPI Version 1.8.1             /usr/local/openmpi/bin          mpicc, mpif77, mpif90
Open MPI Version 1.8.2             /usr/local/openmpi-1.8.2/bin    mpicc, mpif77, mpif90

(*) For any of the above compilers in /usr/local, at compile time add the bin directory to your PATH and the lib and lib64 directories to your LD_LIBRARY_PATH, e.g.

$> export PATH=/usr/local/gcc-4.9.1/bin:$PATH
$> export LD_LIBRARY_PATH=/usr/local/gcc-4.9.1/lib:/usr/local/gcc-4.9.1/lib64

At run time add the lib64 directory to your LD_LIBRARY_PATH.

You may also add these lines to your .bashrc file for permanence. For gcc-4.7.4 substitute 4.7.4 for 4.9.1 in the above export commands.

(**) Use the following:

$> source /usr/local/intel/bin/iccvars.sh intel64
or
$> source /usr/local/intel/bin/iccvars.csh intel64

to set up $PATH and $LD_LIBRARY_PATH. Do so at runtime as well to resolve shared libraries.

You may also add these lines to your .bashrc file for permanence.

3. Launching jobs on the pi0 cluster (cluster head-node: pi0.fnal.gov)

The following job launch procedure applies when using MVAPICH. pi0 nodes are eight-core, dual processor, so each node will run up to 16 processes at one process per core.

Use /usr/local/mvapich/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 16 processes on each pi0 node.

The nodes listed in $PBS_NODEFILE will each be used 16 times, and will be assigned 16 consecutive MPI rank numbers. For example:

         qsub -q pi -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 64 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 pi0 nodes, e.g., pi101-pi104, and on each of these nodes 16 instances of "milc.binary" will be run.

"mpirun" is actually mpiexec from OSC (see http://www.osc.edu/~djohnson/mpiexec/index.php). Use "mpirun --help" to see additional information.

The following job launch procedure applies when using MVAPICH2.

Use /usr/local/mvapich2/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 16 processes on each pi0 node.

The nodes listed in $PBS_NODEFILE will each be used 16 times, and will be assigned 16 consecutive MPI rank numbers. For example:

         qsub -q pi -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 64 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 pi0 nodes, e.g., pi101-pi104, and on each of these nodes 16 instances of "milc.binary" will be run.

"mpirun" is a wrapper script for mpirun_rsh. Use "mpirun_rsh --help" to see available switches. "mpirun" will supply the "-rsh" and "-hostfile" arguments for you, so do not include these switches.

The following job launch procedure applies when using OpenMPI.

Use /usr/local/openmpi/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 16 processes on each pi0 node.

The nodes listed in $PBS_NODEFILE will each be used 16 times, and will be assigned 16 consecutive MPI rank numbers. For example:

         qsub -q pi -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 64 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 pi0 nodes, e.g., pi101-pi104, and on each of these nodes 16 instances of "milc.binary" will be run.

"mpirun" is supplied with openmpi (see http://www.open-mpi.org/faq/). Use "mpirun -h" to see a list of switches.

4. Compiling code & launching jobs on the pi0g cluster (cluster head-node: pi0.fnal.gov)

Each pi0g cluster worker node has a pair of eight-core Intel processors (16 cores total) and four Nvidia K40m GPUs. With ECC enabled, each GPU has 11.5 GB of available memory.

As with all of the Fermilab clusters, jobs on the pi0g cluster get exclusive access to the entire pi0g worker node, so each pi0g node hour is billed as four K40m GPU hours.

CUDA 6 is installed at /usr/local/cuda-6.0 and /usr/local/cuda is a symbolic link to this directory. At runtime, you will need to set the environment variable

   LD_LIBRARY_PATH=/usr/local/cuda/lib64

A newer version, CUDA 6.5, is installed at /usr/local/cuda-6.5. To use this version, at runtime you will need to set the environment variable

   LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64

Note that the CUDA kernel driver installed on all pi0g nodes (340.29) is compatible with both cuda-6.0 and cuda-6.5.

Our standard MPI libraries are MVAPICH, MVAPICH2 and Open MPI, available at /usr/local/{mvapich,mvapich2,openmpi} respectively. These are all symbolic links; use ls -l to see the full path, which includes the version number.
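Putting the CUDA and MPI pieces together, a minimal pi0g job script might look like the following sketch. The rank count, ranks-per-node setting, and binary name are assumptions for illustration; adjust them to match how your code maps MPI ranks to GPUs.

   #!/bin/bash
   # minimal sketch of a pi0g job script; binary name and rank layout are placeholders
   cd $PBS_O_WORKDIR
   export LD_LIBRARY_PATH=/usr/local/cuda/lib64            # CUDA 6.0 runtime libraries
   # one MPI rank per K40m GPU: 4 ranks per node on 2 nodes = 8 ranks total
   /usr/local/openmpi/bin/mpirun -np 8 -npernode 4 ./my_gpu_binary input.file > output.file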

5. Compilers Available on the Bc cluster (cluster head-node: bc1.fnal.gov)

Compiler Name and Version          Binary Location                 Binary Name(s)
Gnu Compilers Version 4.1.2        /usr/bin                        g++, gcc, gfortran
Gnu Compilers Version 4.5.1 (*)    /usr/local/gcc-4.5.1/bin        g++, gcc, gfortran
Gnu Compilers Version 4.8.4 (*)    /usr/local/gcc-4.8.4/bin        g++, gcc, gfortran
Gnu Compilers Version 4.9.1 (*)    /usr/local/gcc-4.9.1/bin        g++, gcc, gfortran
Gnu Compilers Version 5.1.0 (*)    /usr/local/gcc-5.1.0/bin        g++, gcc, gfortran
MVAPICH Version 1.2                /usr/local/mvapich-1.2.0/bin    mpicc, mpif77, mpif90
MVAPICH2 Version 1.9               /usr/local/mvapich2-1.9/bin     mpicc, mpif77, mpif90
Open MPI Version 1.4.3             /usr/local/openmpi-1.4.3/bin    mpicc, mpif77, mpif90

(*) If you get an "error while loading shared libraries" message, use the following:

The GCC 4.5.1 libraries conflict with the default GCC 4.1.2, so you must point your environment at the correct shared libraries when invoking the 4.5.1 GCC compilers:

$> export PATH=/usr/local/gcc-4.5.1/bin:$PATH
$> export LD_LIBRARY_PATH=/usr/local/gcc-4.5.1/lib:/usr/local/gcc-4.5.1/lib64:$LD_LIBRARY_PATH

You may also add these lines to your .bashrc file for permanence. For other GCC versions (e.g. gcc-4.8.4 or gcc-4.9.1), substitute the appropriate version number in the above export commands.

6. Launching jobs on the Bc cluster (cluster head-node: bc1.fnal.gov)

The following job launch procedure applies when using MVAPICH. Bc nodes are eight-core, quad processor, so each node will run up to 32 processes at one process per core.

Use /usr/local/mvapich/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 32 processes on each bc node.

The nodes listed in $PBS_NODEFILE will each be used 32 times, and will be assigned 32 consecutive MPI rank numbers. For example:

         qsub -q bc -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 128 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 bc nodes, e.g., bc0101-bc0104, and on each of these nodes 32 instances of "milc.binary" will be run.

"mpirun" is actually mpiexec from OSC (see http://www.osc.edu/~djohnson/mpiexec/index.php). Use "mpirun --help" to see additional information.

The processors on Bc have non-uniform paths to system memory. Your jobs may run with higher performance if you explicitly bind processes to NUMA nodes, and set a local memory allocation policy. To do this, at the beginning of your job script, do

      source /usr/local/mvapich/etc/mvapich.conf.sh
                   or
      source /usr/local/mvapich/etc/mvapich.conf.csh

and on your mpirun commands, invoke your binary via the wrapper numa_32_mv:

      mpirun -np xxx /usr/local/mvapich/bin/numa_32_mv ./your.binary 

(assuming that "your.binary" is in your current working directory, otherwise substitute the full path for your.binary).
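Put together, a Bc MVAPICH job script using this NUMA binding might look like the following sketch; the binary and input file names are placeholders.

      #!/bin/bash
      # minimal sketch of a Bc MVAPICH job script with NUMA binding
      cd $PBS_O_WORKDIR
      source /usr/local/mvapich/etc/mvapich.conf.sh
      export PATH=/usr/local/mvapich/bin:$PATH
      # 4 nodes x 32 ranks per node; numa_32_mv binds each rank and its memory to a NUMA node
      mpirun -np 128 /usr/local/mvapich/bin/numa_32_mv ./your.binary input.file > output.file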

The following job launch procedure applies when using MVAPICH2.

Use /usr/local/mvapich2/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 32 processes on each bc node.

The nodes listed in $PBS_NODEFILE will each be used 32 times, and will be assigned 32 consecutive MPI rank numbers. For example:

         qsub -q bc -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 128 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 bc nodes, e.g., bc0101-bc0104, and on each of these nodes 32 instances of "milc.binary" will be run.

"mpirun" is a wrapper script for mpirun_rsh. Use "mpirun_rsh --help" to see available switches. "mpirun" will supply the "-rsh" and "-hostfile" arguments for you, so do not include these switches.

The processors on Bc have non-uniform paths to system memory. Your jobs may run with higher performance if you explicitly bind processes to NUMA nodes, and set a local memory allocation policy. To do this, at the beginning of your job script, do

      source /usr/local/mvapich2/etc/mvapich2.conf.sh
                   or
      source /usr/local/mvapich2/etc/mvapich2.conf.csh

and on your mpirun commands, invoke your binary via the wrapper numa_32_mv2:

      mpirun -np xxx /usr/local/mvapich2/bin/numa_32_mv2 ./your.binary

(assuming that "your.binary" is in your current working directory, otherwise substitute the full path for your.binary). If you desire to use fewer than 32 MPI ranks per node, add an integer argument immediately after mpirun:

      mpirun nn -np xxx ...

where nn is the number of cores to use per node. If nn is invalid, mpirun will default to using 32 cores per node.
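As a worked example of this syntax, to run 16 ranks per node on four nodes (64 ranks total) you would write something like the following; the binary name is a placeholder:

      mpirun 16 -np 64 ./your.binary input.file > output.file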

The following job launch procedure applies when using OpenMPI.

Use /usr/local/openmpi/bin/mpirun as follows to launch jobs:

mpirun - will automatically launch 32 processes on each bc node.

The nodes listed in $PBS_NODEFILE will each be used 32 times, and will be assigned 32 consecutive MPI rank numbers. For example:

         qsub -q bc -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun -np 128 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 bc nodes, e.g., bc0101-bc0104, and on each of these nodes 32 instances of "milc.binary" will be run.

"mpirun" is supplied with openmpi (see http://www.open-mpi.org/faq/). Use "mpirun -h" to see a list of switches.

The processors on Bc have non-uniform paths to system memory. In general openmpi does a very good job of binding MPI ranks to cores and ensuring that memory allocations are local. However, your jobs may run with higher performance if you explicitly bind processes to NUMA nodes, and set a local memory allocation policy. To do this, on your mpirun commands, invoke your binary via the wrapper numa_32_ompi:

      mpirun -np xxx /usr/local/openmpi/bin/numa_32_ompi ./your.binary

(assuming that "your.binary" is in your current working directory, otherwise substitute the full path for your.binary). You will also need to set the openmpi parameter "mpi_paffinity_alone" to 1; this may be done in a file ~/.openmpi/mca-params.conf with the line

   mpi_paffinity_alone = 1

or you can use the mpirun switch

   --mca mpi_paffinity_alone 1

Note that all of these MPI libraries obtain hostnames from Torque (PBS) directly, so you do not need to worry about the contents of $PBS_NODEFILE. Torque is configured without a ppn setting for our nodes, so ppn switches to qsub will not function. If you want to assign fewer than 32 MPI ranks per node, use the "-npernode" switch on mvapich and openmpi mpirun commands, or supply an integer argument immediately after "mpirun" on mvapich2 mpirun commands.
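As an illustration, to place 16 MPI ranks on each of four nodes (64 ranks total), the commands would look roughly like this; the binary name is a placeholder:

      # MVAPICH or Open MPI:
      mpirun -np 64 -npernode 16 ./your.binary
      # MVAPICH2 (integer argument immediately after mpirun):
      mpirun 16 -np 64 ./your.binary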

7. Compiling code & launching jobs on the Dsg cluster (cluster head-node: ds1.fnal.gov)

Each dsg cluster worker node has a pair of quad-core Intel processors (8 cores total) and a pair of Nvidia M2050 GPUs. Each GPU has 3GB of on-board memory, but since ECC is enabled on all GPUs, only 2.6GB is available.

As with all of the Fermilab clusters, jobs on the Dsg cluster get exclusive access to the entire dsg worker node, so each dsg node hour is billed as two GPU hours.

CUDA 4.0.17 is installed at /usr/local/cuda-4.0.17 and /usr/local/cuda is a symbolic link to this directory. At runtime, you will need to set the environment variable

LD_LIBRARY_PATH=/usr/local/cuda/lib64

Our standard MPI libraries are MVAPICH, MVAPICH2 and Open MPI, available at /usr/local/{mvapich,mvapich2,openmpi} respectively. These are all symbolic links; use ls -l to see the full path, which includes the version number. The Open MPI link currently points to 1.4.2; if you have difficulties on the GPU nodes, please try /usr/local/openmpi-1.4.4 instead.

The MPI libraries listed above were built for the Ds cluster worker nodes (32 cores per node), so they will automatically launch 32 MPI ranks per GPU node when run on the dsg cluster. This is almost certainly not what you want; you can control MPI process launch as explained below.

For MVAPICH and MVAPICH2, to run 2 ranks per GPU node, use

mpiexec -np xx -npernode 2 ... 

For OPENMPI, use

mpiexec -np xx -npernode 2 --byslot 

Note: MVAPICH and MVAPICH2 use the OSU mpiexec. mpirun is a symbolic link to mpiexec, so you can use either mpiexec or mpirun.

For example, if you want to launch two MPI processes per dsg node, with each process using half of the cores (4 of the 8, i.e. one CPU socket) and one GPU, execute mpirun as follows:

   source /usr/local/mvapich/etc/mvapich.conf.sh
   mpirun -np 4 /usr/local/mvapich/bin/numa_2_gpu binary args &
   mpirun -np 4 /usr/local/mvapich/bin/numa_2_gpu binary args &

The numa_2_gpu wrapper script will distribute the MPI ranks evenly across the two sockets such that all MPI processes that communicate with a particular GPU reside on the same CPU socket. The numa_2_gpu wrapper script can be omitted if you wish to control MPI process affinity (process to cpu core bindings) yourself.

Note that all of these MPI versions obtain hostnames from Torque (PBS) directly, so you do not need to worry about the contents of $PBS_NODEFILE. Torque is configured without a ppn setting for our nodes, so ppn switches to qsub will not function.
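Putting these pieces together, a Dsg job script might look like the following sketch; the node count, binary name, and arguments are placeholders.

   #!/bin/bash
   # minimal sketch of a Dsg MVAPICH job script
   cd $PBS_O_WORKDIR
   export LD_LIBRARY_PATH=/usr/local/cuda/lib64             # CUDA 4.0 runtime libraries
   source /usr/local/mvapich/etc/mvapich.conf.sh
   export PATH=/usr/local/mvapich/bin:$PATH
   # 2 nodes x 2 ranks per node, one rank per M2050 GPU, pinned to a socket by numa_2_gpu
   mpirun -np 4 -npernode 2 /usr/local/mvapich/bin/numa_2_gpu ./my_gpu_binary args > output.file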

8. Compilers Available on the Ds cluster (cluster head-node: ds1.fnal.gov)

Compiler Name and Version          Binary Location                    Binary Name(s)
Gnu Compilers Version 3.4.6        /usr/local/gcc-3.4.6/bin           g++, gcc, g77
Gnu Compilers Version 4.1.2        /usr/bin                           g++, gcc, gfortran
Gnu Compilers Version 4.5.1 (*)    /usr/local/gcc-4.5.1/bin           g++, gcc, gfortran
Gnu Compilers Version 4.6.2 (*)    /usr/local/gcc-4.6.2/bin           g++, gcc, gfortran
MVAPICH Version 1.2                /usr/local/mvapich-1.2rc1/bin      mpicc, mpif77, mpif90
MVAPICH2 Version 1.5               /usr/local/mvapich2-1.5.1p1/bin    mpicc, mpif77, mpif90
Open MPI Version 1.4.2             /usr/local/openmpi-1.4.2/bin       mpicc, mpif77, mpif90

(*) If you get an "error while loading shared libraries" message, use the following:

The GCC 4.5.1 libraries conflict with the default GCC 4.1.2, so you must point your environment at the correct shared libraries when invoking the 4.5.1 GCC compilers:

$> export PATH=/usr/local/gcc-4.5.1/bin:$PATH
$> export LD_LIBRARY_PATH=/usr/local/gcc-4.5.1/lib:/usr/local/gcc-4.5.1/lib64:$LD_LIBRARY_PATH

You may also add these lines to your .bashrc file for permanence. For gcc-4.6.2 substitute 4.6.2 for 4.5.1 in the above export commands.

9. Launching jobs on the Ds cluster

The following job launch procedure applies when using MVAPICH.  Ds nodes are eight-core, quad processor, so each node will run up to 32 processes at one process per core.

Use /usr/local/mvapich/bin/mpirun as follows to launch jobs:

  mpirun - will automatically launch 32 processes on each Ds node.

On Ds, the nodes listed in $PBS_NODEFILE will each be used 32 times, and  will be assigned 32 consecutive MPI rank numbers.  For example:

  qsub -q ds -l nodes=4 -A myproject my.script     # request four nodes
  # then, in the run script:
  mpirun -np 128 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 Ds nodes, e.g., ds0101-ds0104, and on each of these nodes 32 instances of "milc.binary" will be run.

"mpirun" is actually mpiexec from OSC (see http://www.osc.edu/~djohnson/mpiexec/index.php). Use "mpirun --help" to see additional information.

The processors on Ds have non-uniform paths to system memory. Your jobs may run with higher performance if you explicitly bind processes to NUMA nodes, and set a local memory allocation policy. To do this, at the beginning of your job script, do

      source /usr/local/mvapich/etc/mvapich.conf.sh
                   or
      source /usr/local/mvapich/etc/mvapich.conf.csh

and on your mpirun commands, invoke your binary via the wrapper numa_32_mv:

      mpirun -np xxx /usr/local/mvapich/bin/numa_32_mv ./your.binary
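A complete Ds submission along these lines might look like the following sketch; the project name and the binary and file names are placeholders.

      $ cat my.script
      #!/bin/bash
      cd $PBS_O_WORKDIR
      source /usr/local/mvapich/etc/mvapich.conf.sh
      export PATH=/usr/local/mvapich/bin:$PATH
      # 4 nodes x 32 ranks per node, bound to NUMA nodes by numa_32_mv
      mpirun -np 128 /usr/local/mvapich/bin/numa_32_mv ./milc.binary input.file > output.file

      $ qsub -q ds -l nodes=4 -A myproject my.script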

10. Compilers Available on the jpsi cluster (cluster head-node: jpsi1.fnal.gov) (Cluster decommissioned)

Compiler Name and Version          Binary Location                            Binary Name(s)
Gnu Compilers Version 3.4.6        /usr/local/gcc-3.4.6/bin                   g++, gcc, g77
Gnu Compilers Version 4.1.2        /usr/bin                                   g++, gcc, g77
Gnu Compilers Version 4.5.1 (*)    /usr/local/gcc-4.5.1/bin                   g++, gcc, g77
Gnu Compilers Version 4.7.4 (*)    /usr/local/gcc-4.7.4/bin                   g++, gcc, g77
Gnu Compilers Version 4.8.4 (*)    /usr/local/gcc-4.8.4/bin                   g++, gcc, g77
Gnu Compilers Version 4.9.1 (*)    /usr/local/gcc-4.9.1/bin                   g++, gcc, g77
Gnu Compilers Version 5.1.0 (*)    /usr/local/gcc-5.1.0/bin                   g++, gcc, g77
MVAPICH Version 1.1.0              /usr/local/mvapich-1.1.0-ofed-1.4.2/bin    mpicc, mpif77, mpif90
MVAPICH2 Version 1.2               /usr/local/mvapich2-1.2-ofed-1.4.2/bin     mpicc, mpif77, mpif90
Open MPI Version 1.3.4             /usr/local/openmpi-1.3.4/bin               mpicc, mpif77, mpif90

(*) If you get an "error while loading shared libraries" message, use the following:

The GCC 4.5.1 libraries conflict with the default GCC 4.1.2, so you must point your environment at the correct shared libraries when invoking the 4.5.1 GCC compilers:

$> export PATH=/usr/local/gcc-4.5.1/bin:$PATH
$> export LD_LIBRARY_PATH=/usr/local/gcc-4.5.1/lib:/usr/local/gcc-4.5.1/lib64:$LD_LIBRARY_PATH

You may also add these lines to your .bashrc file for permanence. For other GCC versions, substitute the appropriate version number in the above export commands.

(**) Use the following:
[user@]$> . /usr/local/etc/setups.sh       or       source /usr/local/etc/setups.csh
[user@]$> setup pgi

to add the Portland Group compiler and libraries to your environment and to access the license manager. If you'd prefer not to use "setup", be sure to set LM_LICENSE_FILE=/usr/local/pgi-904/linux86-64/9.0-4/license.dat

The binaries that are generated using the PGI compilers are not GNU compatible. Applications compiled with the Portland Group compiler must be statically linked with their respective object libraries since the dynamically linkable libraries are not available from cluster worker nodes.

11. Launching jobs on the jpsi cluster (Cluster decommissioned)

The following job launch procedure applies when using MVAPICH. Jpsi nodes are quad-core, dual processor, so each node will run up to eight processes.

Use /usr/local/mvapich/bin/mpirun_rsh* as follows to launch jobs:

  mpirun_rsh - will automatically launch 8 processes on each jpsi node.

On Jpsi, the nodes listed in $PBS_NODEFILE will each be used 8 times, and  will be assigned 8 consecutive MPI rank numbers.  For example:

  qsub -q jpsi -l nodes=4 my.script     # request four nodes
  # then, in the run script:
  mpirun_rsh -np 32 milc.binary input.file > output.file

Here, $PBS_NODEFILE will have 4 jpsi nodes, e.g., jpsi0101-jpsi0104, and on each of these nodes eight instances of "milc.binary" will be run. There are other versions of mpirun_rsh that can launch 1, 2 or 4 processes per node as follows:
  •   mpirun_rsh_4 - This will launch 4 processes on each node,  with two processes assigned to each processor socket.
  •   mpirun_rsh_2 - This will launch 2 processes on each node,  with each process assigned to a different processor socket.
  •   mpirun_rsh_1 - This will launch 1 process on each node.

12. Compilers Available on the kaon cluster (cluster head-node: kaon1.fnal.gov) (Cluster decommissioned)

Compiler Name and Version          Binary Location                                  Binary Name(s)
Gnu Compilers Version 3.3.4        /usr/local/gcc-3.3.4/bin                         gcc
Gnu Compilers Version 3.4.6        /usr/bin                                         g++, gcc, g77
Gnu Compilers Version 4.1.0        /usr/bin                                         g++4, gcc4
Gnu Compilers Version 4.3.2        /usr/local/gcc-4.3.2/bin                         g++, gcc, gfortran
Intel C/Fortran Compiler (**)      /fnal/ups/prd/intel/v10_1_008/bin                icc, ifort
Portland Group (+)                 /fnal/ups/prd/pgi/v7.1-2/linux86-64/7.1-2/bin    pgCC, pgcc, pgf77, pgf90
MVAPICH Version 1.1.0              /usr/local/mvapich-1.1.0-ofed-1.4.2/bin          mpicc, mpif77, mpif90
MVAPICH2 Version 1.2               /usr/local/mvapich2-1.2-ofed-1.4.2/bin           mpicc, mpif77, mpif90
Open MPI Version 1.3.4             /usr/local/openmpi-1.3.4/bin                     mpicc, mpif77, mpif90


(**) Use the following:

[user@]$> . /usr/local/etc/setups.sh       or       source /usr/local/etc/setups.csh
[user@]$> setup intel v10_1_008

to add the Intel compiler and libraries to your environment.

(+) Use the following:

[user@]$> . /usr/local/etc/setups.sh       or       source /usr/local/etc/setups.csh
[user@]$> setup pgi

to add the Portland Group compiler and libraries to your environment and to access the license manager. If you'd prefer not to use "setup", be sure to set LM_LICENSE_FILE=/fnal/ups/prd/pgi/v7.1-2/license.dat.

The binaries that are generated using the PGI compilers are not GNU compatible. Applications compiled with the Portland Group and Intel compilers must be statically linked with their respective object libraries since the dynamically linkable libraries are not available from cluster worker nodes.

13. Launching jobs on the kaon cluster (Cluster decommissioned)

The following job launch procedure applies when using MVAPICH. Kaon nodes are dual-core, dual processor, so each node will run up to four processes.

Use /usr/local/mvapich/bin/mpirun_rsh* as follows to launch jobs:

  mpirun_rsh - will automatically launch 4 processes on each kaon node.

On Kaon, the nodes listed in $PBS_NODEFILE will each be used 4 times, and will be assigned 4 consecutive MPI rank numbers.  For example:

      qsub -q kaon -l nodes=4 my.script     # request four nodes
         # then, in the run script:
         mpirun_rsh -np 16 milc.binary input.file > output.file

   Here, $PBS_NODEFILE will have 4 kaon nodes, e.g., kaon0101-kaon0104, and on each of these nodes four instances of "milc.binary" will be run. There are other versions of mpirun_rsh that can launch 1 or 2 processes per node as follows:

  •   mpirun_rsh_2 - This will launch 2 processes on each node, with each process assigned to a different processor socket.
  •   mpirun_rsh_1 - This will launch 1 process on each node.

The following job launch procedure applies when using Open MPI. Like MVAPICH above, Open MPI's mpirun has been patched to automatically use 4 processes on each kaon worker node. The syntax is:

       mpirun -np xxx binary < input > output

14. Compiling an MPI application

Two MPI implementations are available on the Infiniband clusters: MVAPICH, which uses the Ohio State University MPI, and Open MPI. We recommend using MVAPICH.

To build codes, use

$ /usr/local/mvapich/bin/mpicc     # or mpiCC, mpif77, etc...  (Infiniband)
or
$ /usr/local/openmpi/bin/mpicc # or mpiCC, mpif77, etc... (Infiniband)

On some clusters, MVAPICH builds compiled with the Portland Group or Intel compilers are also available.
To use a compiler, first check that its executable path is in your shell's PATH environment variable. Use the UNIX env command to see how PATH is currently set. If the desired compiler is not in your PATH, edit your .bashrc (for bash or sh users) or .cshrc (for tcsh or csh users) to add the compiler's path. For example, to add mpiCC, mpicc and mpif77 to your path, add the following line to your shell's start-up script:

In .bashrc (bash/sh):    export PATH=$PATH:/usr/local/mvapich/bin
In .cshrc (tcsh/csh):    setenv PATH $PATH:/usr/local/mvapich/bin

Here is how to compile the example program cpi.c, which comes with the mvapich software distribution.

$ cp /usr/local/mvapich/example/cpi.c .      # copy source to your directory
$ mpicc cpi.c -o cpi # compile
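To run the resulting binary through the batch system, a short test submission might look like the following sketch; the queue, project name, and walltime are placeholders, so substitute the values appropriate for the cluster you compiled on.

$ cat > cpi.script <<'EOF'
#!/bin/bash
cd $PBS_O_WORKDIR
/usr/local/mvapich/bin/mpirun -np 32 ./cpi > cpi.out    # one rank per core on a single Bc/Ds node
EOF
$ qsub -q bc -l nodes=1,walltime=00:10:00 -A myproject cpi.script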