Manual execution of run_trace

You should already know what run_trace is. If you do not, go here.

run_trace consists of three phases. The first phase sets TRACE's level and control for gathering trace data. The second phase executes the MILC application across several nodes. The final phase resets the TRACE mode and gathers the trace files from all nodes. The script itself is easy to follow once you know these three phases.
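The three phases can be pictured with the bash sketch below. It is only an illustration, not the real run_trace: trace_setup, trace_run, trace_reset and trace_collect are hypothetical placeholders, not TRACE commands, and they only echo what they would do. The sketch highlights one design point: phase 3 runs even if the application fails, so no node is left in tracing mode.

```shell
#!/bin/bash
# Hypothetical skeleton of the three run_trace phases.
# The four phase functions are PLACEHOLDERS, not real TRACE commands.
# If you replace them with rsh calls, use "rsh -n" so the
# while-read loops below do not lose their stdin.
trace_setup()   { echo "phase 1: set TRACE level and control on $1"; }
trace_run()     { echo "phase 2: run $2 on $1"; }
trace_reset()   { echo "phase 3: reset TRACE mode on $1"; }
trace_collect() { echo "phase 3: gather .tr files from $1"; }

run_trace_sketch() {
    local nodefile=$1 app=$2 host status=0

    # Phase 1: prepare every host listed in the node file.
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] && trace_setup "$host"
    done < "$nodefile"

    # Phase 2: run the application; remember its exit status.
    trace_run "$nodefile" "$app" || status=$?

    # Phase 3: always reset and gather, even if phase 2 failed.
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        trace_reset "$host"
        trace_collect "$host"
    done < "$nodefile"

    return "$status"
}
```

Example (hypothetical): `run_trace_sketch mynode ./bin/su3_rmd_symzk1_asqtad.ch_gm-trace` prints one phase-1 and one pair of phase-3 lines per host, and returns the application's exit status.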

I recommend that you do not submit run_trace as a PBS job, because TRACE needs its own kernel and trace buffer to run. If your cluster's PBS setup supports a suitable kernel and buffer size, you can prepare a batch file and submit it to PBS.

If you want to investigate timing with TRACE on several nodes, I recommend running the script manually. There are two ways to do this: with PBS and without PBS.

With PBS

If the nodes you need are managed by PBS, you first have to take them offline in PBS. WARNING: Please do not take nodes offline without permission.

% pbsnodes -o <target_hostname_on_PBS>
After checking that your target hosts are offline, you can run run_trace. At this point you can also reboot the nodes with a different kernel to increase the buffer size.
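With several target hosts, it is easy to miss one when taking them offline by hand. The bash sketch below loops over a node file (one hostname per line, the same format run_trace uses) and checks the result. It assumes Torque/OpenPBS-style pbsnodes, where -o marks a node offline, -c clears the offline state, and -l lists down or offline nodes with their state. Check your local man page, because output formats differ between PBS versions. Marking a node offline only stops new jobs from starting; jobs already running there continue until they finish.

```shell
#!/bin/bash
# Sketch: take the hosts in a node file offline in PBS and verify it.
# Assumes Torque/OpenPBS pbsnodes: -o (offline), -c (clear), -l (list).
# Get permission before running this!

offline_nodes() {
    local host
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        pbsnodes -o "$host" < /dev/null || return 1
    done < "$1"
}

# Succeeds only if every host in the node file is reported offline.
check_offline() {
    local host listing
    listing=$(pbsnodes -l) || return 1
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        printf '%s\n' "$listing" |
            awk -v h="$host" '$1 == h && $2 ~ /offline/ { found = 1 }
                              END { exit !found }' ||
            { echo "not offline yet: $host" >&2; return 1; }
    done < "$1"
}

# Give the hosts back to PBS after tracing is finished.
online_nodes() {
    local host
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        pbsnodes -c "$host" < /dev/null || return 1
    done < "$1"
}
```

Typical use: `offline_nodes mynode && check_offline mynode`, then run run_trace, and finally `online_nodes mynode`.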

Without PBS and With PBS

Before running the script, you MUST prepare a node file. A node file lists one hostname per line, as follows:

---------- Node File(mynode) -------------
qcd0101
qcd0102
qcd0103
qcd0104
qcd0201
------------------------------------------
Choose one host in the node file as the master host and log in to it. Then execute run_trace in your working directory:

% rlogin <master_host>
% cd <your_directory>
% ./run_trace -n 5 -t Myrinet -s mynode ./bin/su3_rmd_symzk1_asqtad.ch_gm-trace
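In the example, -n 5 matches the five hosts in mynode, and you run the script from a master host that is listed in the node file. The bash sketch below checks both conditions before launching. It assumes one process per host, as in the example. If your nodes run more than one process each, drop the count check. check_nodefile is a hypothetical helper, not part of run_trace.

```shell
#!/bin/bash
# Hypothetical pre-flight check for "./run_trace -n N -s <nodefile> ...".
# Assumes one process per host (as in the mynode example: -n 5, 5 hosts).
# Usage: check_nodefile <nodefile> <N> [master_hostname]
check_nodefile() {
    local nodefile=$1 nprocs=$2 me=${3:-$(hostname)}
    local count
    [ -r "$nodefile" ] || { echo "cannot read $nodefile" >&2; return 1; }

    # Count non-blank lines (one hostname per line).
    count=$(grep -c '[^[:space:]]' "$nodefile" || true)
    if [ "$count" -ne "$nprocs" ]; then
        echo "$nodefile lists $count hosts but -n is $nprocs" >&2
        return 1
    fi

    # The master host (the one you logged into) should be in the list;
    # compare the short name, so qcd0101.example matches qcd0101.
    if ! grep -qxF "${me%%.*}" "$nodefile"; then
        echo "master host ${me%%.*} is not in $nodefile" >&2
        return 1
    fi
}
```

For example: `check_nodefile mynode 5 && ./run_trace -n 5 -t Myrinet -s mynode ./bin/su3_rmd_symzk1_asqtad.ch_gm-trace`.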

After the run, you will find several files with the suffix ".tr" in the directory. These are the TRACE files.
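To confirm that the run produced trace output, you can list the .tr files and fail loudly if there are none. The bash sketch below does this. list_trace_files is a hypothetical helper, and it makes no assumption about how many .tr files one run should produce.

```shell
#!/bin/bash
# Hypothetical helper: list TRACE output files (*.tr) in a directory
# (default: current directory) and return non-zero if there are none.
list_trace_files() {
    local dir=${1:-.} f n=0
    for f in "$dir"/*.tr; do
        [ -e "$f" ] || continue   # glob matched nothing
        echo "$f"
        n=$((n + 1))
    done
    [ "$n" -gt 0 ] || { echo "no .tr files in $dir" >&2; return 1; }
}
```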

Sorry, my explanation is not very good. Please adapt it to your own setup. Thanks.


mats@fnal.gov