run_trace
You probably already know what run_trace is. If not, go
here.
run_trace consists of three phases.
The first phase sets TRACE's level and control for gathering trace data.
The second phase executes the MILC application across several nodes.
The final phase resets TRACE mode and gathers the trace files
from all nodes. Reading the script itself makes this easy to follow.
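The three phases can be pictured with the skeleton below. This is only a sketch and not the actual run_trace script: setup_trace, run_app and collect_trace are hypothetical placeholders that just print a message, and running the final phase even after a failed application run is an assumption about what is sensible, not something taken from the script.

```shell
#!/bin/sh
# Hypothetical placeholders: the real run_trace uses TRACE's own controls
# and a parallel launcher, neither of which is reproduced here.
setup_trace()   { echo "phase 1: set TRACE level and control"; }
run_app()       { echo "phase 2: run the application on the nodes"; }
collect_trace() { echo "phase 3: reset TRACE mode and gather trace files"; }

three_phases() {
    setup_trace || return 1
    run_app
    status=$?
    # Assumption: phase 3 runs even when the application fails, so that
    # the nodes are not left in trace mode.
    collect_trace
    return "$status"
}

three_phases
```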
Please do not submit run_trace to PBS, because
TRACE needs its own kernel and buffer in order to run. If your cluster
provides a suitable kernel and buffer size under PBS, you can prepare a batch
file and submit it to PBS.
If you want to investigate timing with TRACE on several nodes, I recommend running the script manually. There are two ways: with PBS and without PBS.
With PBS
If you need to use nodes on a cluster managed by PBS, you have to take those nodes offline in PBS first. WARNING: Please do not take nodes offline without permission.
% pbsnodes -o <target_hostname_on_PBS>
After checking that your target hosts are offline, run run_trace.
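One possible way to check the offline state is sketched below. It assumes that `pbsnodes -l` prints one "hostname state" pair per line, which may differ between PBS versions, and is_offline is a hypothetical helper name, not part of PBS or run_trace.

```shell
#!/bin/sh
# is_offline HOST: read a "hostname state" listing on stdin and succeed
# only if HOST appears with a state that contains "offline".
is_offline() {
    awk -v host="$1" '$1 == host && $2 ~ /offline/ { found = 1 }
                      END { exit !found }'
}

# Typical use on the cluster (needs pbsnodes on the PATH):
#   pbsnodes -l | is_offline qcd0101 || echo "qcd0101 is not offline"
```

Because the helper only reads standard input, it can be tried on a saved listing before it is used on the real cluster.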
While the nodes are offline, you can also reboot them with a different kernel to increase the
buffer size.
Without PBS + With PBS
Before running the script, you MUST prepare a node file. A node file lists hostnames as follows:
---------- Node File(mynode) -------------
qcd0101
qcd0102
qcd0103
qcd0104
qcd0201
------------------------------------------
Choose a master host from the node file and log in to it. Then execute
run_trace in your directory.
% rlogin <master_host>
% cd <your_directory>
% ./run_trace -n 5 -t Myrinet -s mynode ./bin/su3_rmd_symzk1_asqtad.ch_gm-trace
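In the example, the value given to -n equals the number of hosts in the node file. Assuming that this is what -n means (the text does not say so explicitly), the count can be derived from the file instead of typed by hand. count_nodes is a hypothetical helper name used only for this sketch.

```shell
#!/bin/sh
# count_nodes FILE: print the number of non-blank lines (hostnames) in FILE.
count_nodes() {
    grep -c '[^[:space:]]' "$1"
}

# Write the example node file with one hostname per line.
cat > mynode <<'EOF'
qcd0101
qcd0102
qcd0103
qcd0104
qcd0201
EOF

count_nodes mynode

# Possible use, assuming -n is the number of hosts in the node file:
#   ./run_trace -n "$(count_nodes mynode)" -t Myrinet -s mynode <application>
```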
After the run, you get some files with the suffix "tr" in the directory.
These are the TRACE files.
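A quick way to see which trace files were produced is sketched below. The text does not say whether the suffix includes a dot, so the pattern matches any file name ending in "tr". list_traces is a hypothetical helper name.

```shell
#!/bin/sh
# list_traces DIR: print the regular files in DIR whose names end in "tr",
# one per line. Prints nothing if there are none.
list_traces() {
    for f in "$1"/*tr; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# List the trace files in the current directory.
list_traces .
```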
Oh no. My explanation is not good. Please arrange it yourself... Sorry, and thanks.