
Frequently Asked Questions

  1. What is the best way to contact the LQCD system administrators at Fermilab?
  2. Is there an email list to contact the other users?
  3. How do I access the archives of the mails sent to the email lists?
  4. How do I check my project allocation balance?
  5. Is there a command line utility to check the status (running, free, busy etc.) of cluster worker nodes?
  6. What should I do if I am unable to delete my job using scancel?
  7. What storage space do I use?
  8. What are the batch queue limits?
  9. How do I change my Kerberos password?
  10. Is there a way to copy more than one file into Enstore tape using wild cards?
  11. Is there a quick way to copy a file from Enstore tape to my local disk?
  12. Can I use Globus Online to transfer data between Lustre?
  13. How do I verify that files have been successfully transferred to tape (Enstore)?
  14. Why do I get a "no CUDA-capable device ..." error when running on GPUs?
Question: What is the best way to contact the LQCD system administrators at Fermilab?
Answer: The best way to contact the LQCD system administrators is by sending an email to lqcd-admin@fnal.gov.
 
Question: Is there an email list to contact the other users?
Answer: Yes, the email list lqcd-users@fnal.gov has been set up for that purpose, but public posting to this list is restricted.
 
 
Question: How do I access the archives of the mails sent to the email lists?
Answer: The archives of the lqcd-admin and lqcd-users email lists are available at:
 
 
https://listserv.fnal.gov/scripts/wa.exe?A0=LQCD-ADMIN
https://listserv.fnal.gov/scripts/wa.exe?A0=LQCD-USERS
 
Use the "Search archives" input on the right of each page to find the email thread you are looking for.
 
Question: How do I check my project allocation balance?
Answer: Refer to the section "Allocations usage" on the Allocations web page.
Question: Is there a command line utility to check the status (running, free, busy etc.) of cluster worker nodes?
Answer: Yes. For example on lq.fnal.gov, which is the SLURM submit node for the LQ1 cluster:
[@lq ~]$ /usr/bin/lqstat -h
 
Print the number of free, busy, down or offline
nodes in the Fermilab LQCD LQ1 cluster.
 
Usage: lqstat <PRINT-OPTION>
 
PRINT-OPTION free,busy,down,offline,online
 
** no option will print the number of free nodes
of all types. The type of nodes are
 
LQ1 (Intel 6248 dual-socket 20-core, Omni Path)
 

The same lqstat command provides similar output when executed on the other SLURM submit nodes.
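
As a rough illustration of the usage line above, the following sketch calls lqstat once for each PRINT-OPTION listed in the help text. It assumes that each option is passed as a single word (for example "lqstat free"); the output format of lqstat is not shown in this FAQ, so the function simply prints whatever lqstat returns under a heading. The function name lq_node_summary is hypothetical.

```shell
# Print the lqstat output for every PRINT-OPTION named in the help text.
# Assumption: each option is given as a single argument, e.g. "lqstat free".
lq_node_summary() {
    for state in free busy down offline online; do
        echo "== $state =="
        lqstat "$state"
    done
}

# Usage on a SLURM submit node:
# lq_node_summary
```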

Question: What should I do if I am unable to delete my job using scancel?
Answer: If you encounter this problem, please contact the Fermilab LQCD system administrators by sending an email to lqcd-admin@fnal.gov.
 
Question: What storage space do I use?
Answer: Please refer to the filesystems web page for information on the various storage options.
 
Question: What are the batch queue limits?
Answer: Please refer to the following web page.
 
Question: How do I change my Kerberos password?
Answer: Please refer to the following web page.
Question: Is there a way to copy more than one file into Enstore tape using wild cards?
Answer: No. dccp does not allow wildcards, so a command such as the following will not work:

dccp -c -C 2000 filename.${i}* /pnfs/lqcd/myproject/subdir/

dccp works through a disk cache layer: files are first copied to one of a large number of RAID disk arrays managed by the mass storage department and are then automatically migrated from there to tape. The migrations generally occur quickly, typically within an hour.
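
One possible workaround, sketched here under assumptions, is to let the shell expand the wildcard and call dccp once per file. The function name copy_each_to_pnfs is hypothetical, and the sketch uses only the plain "dccp source destination" form; any options from the command in the question would need to be added by hand.

```shell
# Copy each given file into a /pnfs directory with one dccp call per file.
# The shell expands the wildcard, so dccp only ever sees a single source.
copy_each_to_pnfs() {
    dest=${1%/}          # destination directory, trailing slash removed
    shift
    failed=0
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip a pattern that matched nothing
        if ! dccp "$f" "$dest/$(basename "$f")"; then
            echo "dccp failed for $f" >&2
            failed=1
        fi
    done
    return $failed
}

# Usage (destination path taken from the question above):
# copy_each_to_pnfs /pnfs/lqcd/myproject/subdir/ filename.*
```

The function returns a non-zero status if any single copy fails, so a calling script can decide whether to retry.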

 
There is also a direct tape access command, encp, which has the semantics of "cp" and does allow wildcards. The downside of encp is that your command has to block until a tape drive is allocated, the tape is mounted, and the tape is positioned. dccp writes, on the other hand, commence immediately. On reads, dccp, like encp, will need to read from tape and will block unless the file is already resident on one of the disk cache pools. It is possible to pre-stage files to the disk pools with a dccp command, so if you want to read several hundred files you can issue a pre-stage request an hour or so before you need them and then, after the delay, use dccp commands to read the actual files.
 
 
NOTE: Please use lqcdsrm.fnal.gov or lqio.fnal.gov, as these data mover nodes are configured with 10 GbE and 100 GbE network connections, respectively. These connections allow tape streaming at full rate and are best for encp commands.
 
 
Question: Is there a quick way to copy a file from Enstore tape to my local disk?
Answer: A dccp command that copies a file from Enstore tape to your local disk will often seem stuck, because it has to wait for the tape to be retrieved, mounted and read, which can take anywhere from several minutes to an hour or more. To avoid this, use "dccp -P", which prestages the file; it works for read requests only and returns immediately. Note that "dccp -P" stages the file from Enstore tape to a dCache "read pool", not to your local disk. Once the file is on the dCache "read pool", execute another dccp command (without the -P option this time) to copy the file from the "read pool" to your local disk. Copying from the "read pool" is far quicker than copying the file directly from tape.
 
Execute the "dccp -P" command with the "-t" option as follows:

dccp -P -t xxxx source [destination]

where xxxx is the number of seconds before you would like the file to be available; 3600 is typical. If -t is not used, the default interval is zero. A value of -1, as in "dccp -P -t -1", queries whether your file is on the "read pool" in dCache.
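
The two-step workflow described above (prestage first, copy later) can be sketched for a whole list of files as follows. This is only an illustration: the function names prestage_list and fetch_list and the list-file format (one /pnfs path per line) are assumptions, and only the "dccp -P -t" form shown above and the plain "dccp source destination" form are used.

```shell
# Step 1: ask dCache to stage every file named in a list file
# (one /pnfs path per line). Each "dccp -P" call returns immediately.
prestage_list() {
    list=$1
    delay=${2:-3600}     # seconds until the files are wanted
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        dccp -P -t "$delay" "$path" < /dev/null
    done < "$list"
}

# Step 2: after the delay, copy the staged files to a local directory.
fetch_list() {
    list=$1
    dest=${2%/}
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        dccp "$path" "$dest/$(basename "$path")" < /dev/null
    done < "$list"
}

# Usage (files.txt is a hypothetical list of /pnfs paths):
# prestage_list files.txt 3600
# ... roughly an hour later ...
# fetch_list files.txt /path/to/local/dir
```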

 
 
Question: Can I use Globus Online to transfer data between Lustre?
Answer: Yes, you can use Globus Online to transfer data between Lustre. Please follow the simple steps listed here.
 
Question: How do I verify that files have been successfully transferred to tape (Enstore)?
Answer: The following command checks whether files have been successfully transferred to tape (Enstore). In the example below, the command was run on lqcdsrm.fnal.gov.
 
[@lqcdsrm $>] en_check /pnfs/lqcd/test/testfile.tar ; echo $?
1
 
Results from en_check command (exit statuses):
 
0 file is on tape
1 file is not on tape
 
There are further details for files packaged with Small File Aggregation (SFA), so it is better to use en_check, as it can be extended to do extra checks for SFA files as well.
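
To check many files at once, the exit statuses listed above can be used in a loop. The following is a minimal sketch, assuming en_check is called with a single path as in the example; the function name report_not_on_tape is hypothetical, and any non-zero exit status is treated as "not on tape".

```shell
# Print every given path that en_check does not report as being on tape.
# Returns 0 only if all files are on tape.
report_not_on_tape() {
    missing=0
    for path in "$@"; do
        if ! en_check "$path" > /dev/null 2>&1; then
            echo "$path"
            missing=1
        fi
    done
    return $missing
}

# Usage:
# report_not_on_tape /pnfs/lqcd/test/*.tar
```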
 
Additional Documentation about Enstore commands:
 
USQCD: //fnal/storage.html
 
User's Guide: for transferring files via encp, refer to chapter 5
 
Question: Why do I get a "no CUDA-capable device ..." error when running on GPUs?
Answer: If you get the following error, make sure you are requesting GPU resources using the "--gres=gpu:4" option in your SLURM submit command(s).
 
FATAL[0] arch_cuda/generic.cu:91 CUDA error no CUDA-capable device is detected
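
A batch script can guard against this error before the application starts. The sketch below is illustrative only: it assumes that SLURM exports CUDA_VISIBLE_DEVICES for jobs that were granted GPUs through --gres, which is common but depends on the cluster configuration. The function name gpus_visible, the single-node request and the binary my_gpu_app are hypothetical.

```shell
#!/bin/bash
#SBATCH --gres=gpu:4     # GPU request from the answer above
#SBATCH --nodes=1        # assumption: a single-node job

# Succeed only if the job environment exposes at least one GPU.
# Assumption: SLURM sets CUDA_VISIBLE_DEVICES when GPUs are allocated.
gpus_visible() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "No GPUs allocated: was --gres=gpu:4 requested?" >&2
        return 1
    fi
    echo "GPUs visible to this job: $CUDA_VISIBLE_DEVICES"
}

# In a real job script the application launch would follow, for example:
# gpus_visible && srun ./my_gpu_app     # my_gpu_app is a hypothetical binary
```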
Fermi National Accelerator Laboratory
Managed by Fermi Research Alliance, LLC
for the U.S. Department of Energy Office of Science