JobManagement » History » Version 7
Paweł Widera, 11/28/2008 08:43 PM
Maui grid status commands added
| 1 | 7 | Paweł Widera | |
|---|---|---|---|
| 2 | 7 | Paweł Widera | h1. Job Management |
| 3 | 7 | Paweł Widera | |
| 4 | 1 | Anonymous | The queuing system (resource manager) is the heart of distributed computing on a cluster. It consists of three parts: the server, the scheduler, and the machine-oriented mini-server (MOM) that executes the jobs. |
| 5 | 1 | Anonymous | |
| 6 | 1 | Anonymous | We are assuming the following configuration: |
| 7 | 1 | Anonymous | |
| 8 | 3 | Anonymous | ||PBS TORQUE|| version 2.1.8 ||server, basic scheduler, mom ||[source:Externals/Cluster/torque-2.1.8.tgz download from repository] |
| 9 | 3 | Anonymous | ||MAUI || version 3.2.6p18 ||scheduler ||[source:Externals/Cluster/maui-3.2.6p18.tgz download from repository] |
| 10 | 1 | Anonymous | |
| 11 | 1 | Anonymous | |
| 12 | 1 | Anonymous | |
| 13 | 3 | Anonymous | Please check the distributors' websites for newer versions: |
| 14 | 3 | Anonymous | |
| 15 | 1 | Anonymous | ||PBS TORQUE ||http://www.clusterresources.com/pages/products/torque-resource-manager.php |
| 16 | 1 | Anonymous | ||MAUI ||http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php |
| 17 | 1 | Anonymous | |
| 18 | 1 | Anonymous | |
| 19 | 7 | Paweł Widera | The install directories for _TORQUE_ and _MAUI_ will be: |
| 20 | 1 | Anonymous | |
| 21 | 7 | Paweł Widera | ||PBS TORQUE ||_/var/spool/torque_ |
| 22 | 7 | Paweł Widera | ||MAUI ||_/var/spool/maui_ |
| 23 | 1 | Anonymous | |
| 24 | 1 | Anonymous | |
| 25 | 1 | Anonymous | |
| 26 | 1 | Anonymous | |
| 27 | 7 | Paweł Widera | h2. TORQUE |
| 28 | 7 | Paweł Widera | |
| 29 | 7 | Paweł Widera | |
| 30 | 7 | Paweł Widera | |
| 31 | 7 | Paweł Widera | h3. Register new services |
| 32 | 7 | Paweł Widera | |
| 33 | 7 | Paweł Widera | Edit _/etc/services_ and add at the end: |
| 34 | 7 | Paweł Widera | <pre> |
| 35 | 1 | Anonymous | # PBS/Torque services |
| 36 | 1 | Anonymous | pbs 15001/tcp # pbs_server |
| 37 | 1 | Anonymous | pbs 15001/udp # pbs_server |
| 38 | 1 | Anonymous | pbs_mom 15002/tcp # pbs_mom <-> pbs_server |
| 39 | 1 | Anonymous | pbs_mom 15002/udp # pbs_mom <-> pbs_server |
| 40 | 1 | Anonymous | pbs_resmom 15003/tcp # pbs_mom resource management |
| 41 | 1 | Anonymous | pbs_resmom 15003/udp # pbs_mom resource management |
| 42 | 1 | Anonymous | pbs_sched 15004/tcp # pbs scheduler (pbs_sched) |
| 43 | 1 | Anonymous | pbs_sched 15004/udp # pbs scheduler (pbs_sched) |
| 44 | 7 | Paweł Widera | </pre> |
| 45 | 1 | Anonymous | |
| 46 | 1 | Anonymous | |
| 47 | 7 | Paweł Widera | |
| 48 | 7 | Paweł Widera | h3. Setup and Configuration on the Master Node |
| 49 | 7 | Paweł Widera | |
| 50 | 1 | Anonymous | Extract and build the TORQUE distribution on the master node. Configure the server, monitor, and clients to use secure file transfer (scp). |
| 51 | 7 | Paweł Widera | <pre> |
| 52 | 1 | Anonymous | export TORQUECFG=/var/spool/torque |
| 53 | 1 | Anonymous | tar -xzvf TORQUE.tar.gz |
| 54 | 1 | Anonymous | cd TORQUE |
| 55 | 7 | Paweł Widera | </pre> |
| 56 | 1 | Anonymous | |
| 57 | 1 | Anonymous | Configure for a 64-bit machine with the following compiler options: |
| 58 | 7 | Paweł Widera | <pre> |
| 59 | 1 | Anonymous | export FFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 60 | 1 | Anonymous | export CFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 61 | 1 | Anonymous | export CXXFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 62 | 1 | Anonymous | export LDFLAGS="-L/usr/local/lib -L/usr/local/lib64" |
| 63 | 7 | Paweł Widera | </pre> |
| 64 | 7 | Paweł Widera | *Attention*: For Intel Xeon processors use _-march=nocona_, for AMD Opteron processors use _-march=opteron_. |
| 65 | 1 | Anonymous | |
| 66 | 1 | Anonymous | Configure, build, and install: |
| 67 | 7 | Paweł Widera | <pre> |
| 68 | 1 | Anonymous | ./configure --prefix=/usr/local --with-spooldir=$TORQUECFG |
| 69 | 1 | Anonymous | make |
| 70 | 1 | Anonymous | make install |
| 71 | 7 | Paweł Widera | </pre> |
| 72 | 7 | Paweł Widera | If not configured otherwise, binaries are installed in _/usr/local/bin_ and _/usr/local/sbin_. |
| 73 | 1 | Anonymous | |
| 74 | 7 | Paweł Widera | Initialise/configure the queuing system's server daemon (_pbs_server_): |
| 75 | 7 | Paweł Widera | <pre> |
| 76 | 1 | Anonymous | pbs_server -t create |
| 77 | 7 | Paweł Widera | </pre> |
| 78 | 1 | Anonymous | |
| 79 | 1 | Anonymous | Set the PBS operators and managers (each must be a valid user name): |
| 80 | 7 | Paweł Widera | <pre> |
| 81 | 1 | Anonymous | qmgr |
| 82 | 1 | Anonymous | > set server_name = master01.procksi.local |
| 83 | 1 | Anonymous | > set server scheduling = true |
| 84 | 1 | Anonymous | > set server operators = "root@master01.procksi.local,procksi@master01.procksi.local" |
| 85 | 1 | Anonymous | > set server managers = "root@master01.procksi.local,procksi@master01.procksi.local" |
| 86 | 7 | Paweł Widera | </pre> |
| 87 | 1 | Anonymous | |
| 88 | 7 | Paweł Widera | Allow only _procksi_ and _root_ to submit jobs into the queue: |
| 89 | 7 | Paweł Widera | <pre> |
| 90 | 1 | Anonymous | > set server acl_users = "root,procksi" |
| 91 | 2 | Anonymous | > set server acl_user_enable = true |
| 92 | 7 | Paweł Widera | </pre> |
| 93 | 1 | Anonymous | |
| 94 | 1 | Anonymous | Set the sender address for email sent by PBS: |
| 95 | 7 | Paweł Widera | <pre> |
| 96 | 1 | Anonymous | > set server mail_from = pbs@procksi.net |
| 97 | 7 | Paweł Widera | </pre> |
| 98 | 1 | Anonymous | |
| 99 | 1 | Anonymous | Allow submissions from slave hosts (only): |
| 100 | 7 | Paweł Widera | *ATTENTION: NEEDS TO BE CHECKED. DOES NOT WORK PROPERLY YET!* |
| 101 | 7 | Paweł Widera | <pre> |
| 102 | 7 | Paweł Widera | <pre> |
| 103 | 1 | Anonymous | > set server allow_node_submit = true |
| 104 | 1 | Anonymous | > set server submit_hosts = master01.procksi.local |
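| 105 | 1 | Anonymous | > set server submit_hosts += slave01.procksi.local |
| 106 | 1 | Anonymous | > set server submit_hosts += slave02.procksi.local |
| 107 | 1 | Anonymous | > set server submit_hosts += slave03.procksi.local |
| 108 | 1 | Anonymous | > set server submit_hosts += slave04.procksi.local |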
| 109 | 7 | Paweł Widera | </pre> |
| 110 | 1 | Anonymous | |
| 111 | 1 | Anonymous | |
| 112 | 1 | Anonymous | Restrict nodes that can access the PBS server: |
| 113 | 7 | Paweł Widera | <pre> |
| 114 | 1 | Anonymous | > set server acl_hosts = master01.procksi.local |
| 115 | 2 | Anonymous | > set server acl_hosts += slave01.procksi.local |
| 116 | 1 | Anonymous | > set server acl_hosts += slave02.procksi.local |
| 117 | 1 | Anonymous | > set server acl_hosts += slave03.procksi.local |
| 118 | 1 | Anonymous | > set server acl_hosts += slave04.procksi.local |
| 119 | 1 | Anonymous | > set server acl_host_enable = true |
| 120 | 7 | Paweł Widera | </pre> |
| 121 | 1 | Anonymous | |
| 122 | 7 | Paweł Widera | Then set the following in _torque.cfg_ |
| 123 | 1 | Anonymous | to use the internal interface: |
| 124 | 7 | Paweł Widera | <pre> |
| 125 | 1 | Anonymous | SERVERHOST master01.procksi.local |
| 126 | 1 | Anonymous | ALLOWCOMPUTEHOSTSUBMIT true |
| 127 | 7 | Paweł Widera | </pre> |
| 128 | 7 | Paweł Widera | </pre> |
| 129 | 1 | Anonymous | |
| 130 | 1 | Anonymous | Configure default node to be used (see below): |
| 131 | 7 | Paweł Widera | <pre> |
| 132 | 1 | Anonymous | > set server default_node = slave |
| 133 | 7 | Paweł Widera | </pre> |
| 134 | 1 | Anonymous | |
| 135 | 1 | Anonymous | |
| 136 | 7 | Paweł Widera | Set the default queue to _batch_: |
| 137 | 7 | Paweł Widera | <pre> |
| 138 | 1 | Anonymous | > set server default_queue=batch |
| 139 | 7 | Paweł Widera | </pre> |
| 140 | 1 | Anonymous | |
| 141 | 7 | Paweł Widera | Configure the main queue _batch_: |
| 142 | 7 | Paweł Widera | <pre> |
| 143 | 1 | Anonymous | > create queue batch queue_type=execution |
| 144 | 1 | Anonymous | > set queue batch started=true |
| 145 | 1 | Anonymous | > set queue batch enabled=true |
| 146 | 1 | Anonymous | > set queue batch resources_default.nodes=1 |
| 147 | 7 | Paweł Widera | </pre> |
| 148 | 1 | Anonymous | |
| 149 | 7 | Paweł Widera | Configure queue _test_ accordingly. |
| 150 | 1 | Anonymous | |
| 151 | 7 | Paweł Widera | Specify all compute nodes to be used by creating/editing _$TORQUECFG/server_priv/nodes_. This may include the machine on which _pbs_server_ runs. If a compute node has more than one processor, add _np=X_ after its name, where X is the number of processors. Add node attributes so that a subset of nodes can be requested at submission. |
| 152 | 7 | Paweł Widera | <pre> |
| 153 | 1 | Anonymous | master01.procksi.local np=1 procksi master xeon |
| 154 | 1 | Anonymous | slave01.procksi.local np=2 procksi slave xeon |
| 155 | 1 | Anonymous | slave02.procksi.local np=2 procksi slave xeon |
| 156 | 1 | Anonymous | slave03.procksi.local np=4 procksi slave opteron |
| 157 | 1 | Anonymous | slave04.procksi.local np=4 procksi slave opteron |
| 158 | 7 | Paweł Widera | </pre> |
| 159 | 1 | Anonymous | |
| 160 | 7 | Paweł Widera | Although the master node (_master01_) also has two processors, we allow only one of them to be used by the queuing system; the other handles all frontend communication and I/O. (Make sure that hyperthreading is disabled on the head node and all compute nodes!) |
| 161 | 1 | Anonymous | |
| 162 | 1 | Anonymous | Request that a job run on specific nodes (at submission): |
| 163 | 7 | Paweł Widera | * Run on any compute node: |
| 164 | 7 | Paweł Widera | <pre> |
| 165 | 1 | Anonymous | qsub -q batch -l nodes=1:procksi |
| 166 | 7 | Paweł Widera | </pre> |
| 167 | 7 | Paweł Widera | * Run on any slave node: |
| 168 | 7 | Paweł Widera | <pre> |
| 169 | 1 | Anonymous | qsub -q batch -l nodes=1:slave |
| 170 | 7 | Paweł Widera | </pre> |
| 171 | 7 | Paweł Widera | * Run on master node: |
| 172 | 7 | Paweł Widera | <pre> |
| 173 | 1 | Anonymous | qsub -q batch -l nodes=1:master |
| 174 | 7 | Paweł Widera | </pre> |
| 175 | 1 | Anonymous | |
| 176 | 1 | Anonymous | |
| 177 | 1 | Anonymous | |
| 178 | 1 | Anonymous | |
| 179 | 7 | Paweł Widera | |
| 180 | 7 | Paweł Widera | h3. Setup and Configuration on the Slave Nodes |
| 181 | 7 | Paweł Widera | |
| 182 | 1 | Anonymous | Extract and build the TORQUE distribution on each slave node. Configure the monitor and clients to use secure file transfer (scp). |
| 183 | 7 | Paweł Widera | <pre> |
| 184 | 1 | Anonymous | export TORQUECFG=/var/spool/torque |
| 185 | 1 | Anonymous | tar -xzvf TORQUE.tar.gz |
| 186 | 1 | Anonymous | cd TORQUE |
| 187 | 7 | Paweł Widera | </pre> |
| 188 | 1 | Anonymous | |
| 189 | 1 | Anonymous | Configure for a 64-bit machine with the following compiler options: |
| 190 | 7 | Paweł Widera | <pre> |
| 191 | 1 | Anonymous | export FFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 192 | 1 | Anonymous | export CFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 193 | 1 | Anonymous | export CXXFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 194 | 1 | Anonymous | export LDFLAGS="-L/usr/local/lib -L/usr/local/lib64" |
| 195 | 7 | Paweł Widera | </pre> |
| 196 | 7 | Paweł Widera | *Attention*: For Intel Xeon processors use _-march=nocona_, for AMD Opteron processors use _-march=opteron_. |
| 197 | 1 | Anonymous | |
| 198 | 1 | Anonymous | Configure, build, and install: |
| 199 | 7 | Paweł Widera | <pre> |
| 200 | 1 | Anonymous | ./configure --prefix=/usr/local --with-spooldir=$TORQUECFG --disable-server --enable-mom --enable-clients --with-default-server=master01.procksi.local |
| 201 | 1 | Anonymous | make |
| 202 | 1 | Anonymous | make install |
| 203 | 7 | Paweł Widera | </pre> |
| 204 | 1 | Anonymous | |
| 205 | 7 | Paweł Widera | Configure the compute nodes by creating/editing _$TORQUECFG/mom_priv/config_. The _$pbsserver_ line specifies the PBS server, _$loglevel_ sets the logging verbosity, _$restricted_ specifies hosts that can be trusted to access mom services as non-root, and _$usecp_ allows copying data via NFS without using scp. |
| 206 | 7 | Paweł Widera | <pre> |
| 207 | 1 | Anonymous | $pbsserver master01.procksi.local |
| 208 | 1 | Anonymous | $loglevel 255 |
| 209 | 1 | Anonymous | $restricted master01.procksi.local |
| 210 | 1 | Anonymous | $usecp master01.procksi.local:/home/procksi /home/procksi |
| 211 | 7 | Paweł Widera | </pre> |
| 212 | 1 | Anonymous | |
| 213 | 1 | Anonymous | Start the queuing system (manually) in the correct order: |
| 214 | 7 | Paweł Widera | * Start the mom: |
| 215 | 7 | Paweł Widera | <pre> |
| 216 | 1 | Anonymous | /usr/local/sbin/pbs_mom |
| 217 | 7 | Paweł Widera | </pre> |
| 218 | 7 | Paweł Widera | * Kill the server: |
| 219 | 7 | Paweł Widera | <pre> |
| 220 | 1 | Anonymous | /usr/local/bin/qterm -t quick |
| 221 | 7 | Paweł Widera | </pre> |
| 222 | 7 | Paweł Widera | * Start the server: |
| 223 | 7 | Paweł Widera | <pre> |
| 224 | 1 | Anonymous | /usr/local/sbin/pbs_server |
| 225 | 7 | Paweł Widera | </pre> |
| 226 | 7 | Paweł Widera | * Start the scheduler: |
| 227 | 7 | Paweł Widera | <pre> |
| 228 | 1 | Anonymous | /usr/local/sbin/pbs_sched |
| 229 | 7 | Paweł Widera | </pre> |
| 230 | 1 | Anonymous | |
| 231 | 7 | Paweł Widera | If you want to use MAUI as the final scheduler, remember to stop _pbs_sched_ after testing the TORQUE installation. |
| 232 | 1 | Anonymous | |
| 233 | 1 | Anonymous | |
| 234 | 1 | Anonymous | Check that all nodes are properly configured and reporting correctly: |
| 235 | 7 | Paweł Widera | <pre> |
| 236 | 1 | Anonymous | qstat -q |
| 237 | 1 | Anonymous | pbsnodes -a |
| 238 | 7 | Paweł Widera | </pre> |
| 239 | 1 | Anonymous | |
| 240 | 1 | Anonymous | |
| 241 | 7 | Paweł Widera | |
| 242 | 7 | Paweł Widera | h3. Prologue and Epilogue Scripts |
| 243 | 7 | Paweł Widera | |
| 244 | 1 | Anonymous | Get [repos:Externals/procksi_pbs.tgz] from the repository and untar it: |
| 245 | 7 | Paweł Widera | <pre> |
| 246 | 1 | Anonymous | tar -xvzf procksi_pbs.tgz |
| 247 | 7 | Paweł Widera | </pre> |
| 248 | 1 | Anonymous | |
| 249 | 7 | Paweł Widera | The _prologue_ script is executed just before the submitted job starts. Here, it generates a unique temp directory for each job in _/scratch_. |
| 250 | 1 | Anonymous | It must be installed on each NODE (master, slave): |
| 251 | 7 | Paweł Widera | <pre> |
| 252 | 1 | Anonymous | cp ./pbs/NODE/var/spool/torque/mom/priv/prologue $TORQUECFG/mom_priv |
| 253 | 1 | Anonymous | chmod 500 $TORQUECFG/mom_priv/prologue |
| 254 | 7 | Paweł Widera | </pre> |
| 255 | 1 | Anonymous | |
| 256 | 7 | Paweł Widera | The _epilogue_ script is executed right after the submitted job has ended. Here, it deletes the job's temp directory from _/scratch_. It must be installed on each NODE (master, slave): |
| 257 | 7 | Paweł Widera | <pre> |
| 258 | 1 | Anonymous | cp ./pbs/NODE/var/spool/torque/mom/priv/epilogue $TORQUECFG/mom_priv |
| 259 | 1 | Anonymous | chmod 500 $TORQUECFG/mom_priv/epilogue |
| 260 | 7 | Paweł Widera | </pre> |
| 261 | 1 | Anonymous | |
| 262 | 1 | Anonymous | |
| 263 | 1 | Anonymous | |
| 264 | 7 | Paweł Widera | h2. MAUI |
| 265 | 7 | Paweł Widera | |
| 266 | 7 | Paweł Widera | |
| 267 | 7 | Paweł Widera | |
| 268 | 7 | Paweł Widera | h3. Register new services |
| 269 | 7 | Paweł Widera | |
| 270 | 7 | Paweł Widera | Edit _/etc/services_ and add at the end: |
| 271 | 7 | Paweł Widera | <pre> |
| 272 | 1 | Anonymous | # PBS/MAUI services |
| 273 | 1 | Anonymous | pbs_maui 42559/tcp # pbs scheduler (maui) |
| 274 | 1 | Anonymous | pbs_maui 42559/udp # pbs scheduler (maui) |
| 275 | 7 | Paweł Widera | </pre> |
| 276 | 1 | Anonymous | |
| 277 | 1 | Anonymous | |
| 278 | 7 | Paweł Widera | |
| 279 | 7 | Paweł Widera | h3. Setup and Configuration on the Head Node |
| 280 | 7 | Paweł Widera | |
| 281 | 1 | Anonymous | Extract and build the MAUI distribution. |
| 282 | 7 | Paweł Widera | <pre> |
| 283 | 1 | Anonymous | export MAUIDIR=/var/spool/maui |
| 284 | 1 | Anonymous | tar -xzvf MAUI.tar.gz |
| 285 | 1 | Anonymous | cd MAUI |
| 286 | 7 | Paweł Widera | </pre> |
| 287 | 1 | Anonymous | |
| 288 | 1 | Anonymous | Configure for a 64-bit machine with the following compiler options: |
| 289 | 7 | Paweł Widera | <pre> |
| 290 | 1 | Anonymous | export FFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 291 | 1 | Anonymous | export CFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 292 | 1 | Anonymous | export CXXFLAGS="-m64 -march=[Add Architecture] -O3 -fPIC" |
| 293 | 1 | Anonymous | export LDFLAGS="-L/usr/local/lib -L/usr/local/lib64" |
| 294 | 7 | Paweł Widera | </pre> |
| 295 | 7 | Paweł Widera | *Attention*: For Intel Xeon processors use _-march=nocona_, for AMD Opteron processors use _-march=opteron_. |
| 296 | 1 | Anonymous | |
| 297 | 1 | Anonymous | Configure, build, and install: |
| 298 | 7 | Paweł Widera | <pre> |
| 299 | 1 | Anonymous | ./configure --with-pbs=$TORQUECFG --with-spooldir=$MAUIDIR |
| 300 | 5 | Paweł Widera | make |
| 301 | 5 | Paweł Widera | make install |
| 302 | 7 | Paweł Widera | </pre> |
| 303 | 5 | Paweł Widera | |
| 304 | 7 | Paweł Widera | Fine-tune MAUI in _$MAUIDIR/maui.cfg_: |
| 305 | 7 | Paweł Widera | <pre> |
| 306 | 1 | Anonymous | SERVERHOST master01.procksi.local |
| 307 | 5 | Paweł Widera | |
| 308 | 1 | Anonymous | # primary admin must be first in list |
| 309 | 1 | Anonymous | ADMIN1 procksi root |
| 311 | 1 | Anonymous | |
| 312 | 1 | Anonymous | # Resource Manager Definition |
| 313 | 1 | Anonymous | RMCFG[MASTER01.PROCKSI.LOCAL] TYPE=PBS PORT=15001 EPORT=15004 |
| 314 | 1 | Anonymous | # EPORT can alternatively be 15017 (still to be tried) |
| 318 | 1 | Anonymous | |
| 319 | 3 | Anonymous | SERVERPORT 42559 |
| 320 | 1 | Anonymous | SERVERMODE NORMAL |
| 321 | 3 | Anonymous | |
| 322 | 1 | Anonymous | # Node Allocation: |
| 323 | 1 | Anonymous | # JOBCOUNT number of jobs currently running on node |
| 324 | 3 | Anonymous | # LOAD current 1 minute load average |
| 325 | 3 | Anonymous | # AMEM real memory currently available to batch jobs |
| 326 | 3 | Anonymous | # APROCS processors currently available to batch jobs |
| 327 | 3 | Anonymous | # PREF node meets job specific resource preferences |
| 328 | 3 | Anonymous | |
| 329 | 3 | Anonymous | NODEALLOCATIONPOLICY PRIORITY |
| 330 | 3 | Anonymous | NODECFG[DEFAULT] PRIORITYF='-JOBCOUNT - 2*LOAD + 0.5*AMEM + 0.25*APROCS + PREF' |
| 331 | 7 | Paweł Widera | </pre> |
| 332 | 3 | Anonymous | |
| 333 | 3 | Anonymous | |
| 334 | 1 | Anonymous | Start the MAUI scheduler manually. Make sure that _pbs_sched_ is no longer running. |
| 335 | 3 | Anonymous | |
| 336 | 7 | Paweł Widera | * Start the scheduler: |
| 337 | 7 | Paweł Widera | <pre> |
| 338 | 1 | Anonymous | /usr/local/sbin/maui |
| 339 | 7 | Paweł Widera | </pre> |
| 340 | 1 | Anonymous | |
| 341 | 1 | Anonymous | |
| 342 | 1 | Anonymous | Get [repos:Externals/Cluster/procksi_pbs.tgz] from the repository and untar it: |
| 343 | 7 | Paweł Widera | <pre> |
| 344 | 1 | Anonymous | tar -xvzf procksi_pbs.tgz |
| 345 | 7 | Paweł Widera | </pre> |
| 346 | 3 | Anonymous | |
| 347 | 3 | Anonymous | Make the entire queuing system (Torque + Maui) start at bootup: |
| 348 | 7 | Paweł Widera | <pre> |
| 349 | 1 | Anonymous | cp ./pbs/master/etc/init.d/pbs_* /etc/init.d/ |
| 350 | 6 | Paweł Widera | /sbin/chkconfig --add pbs_mom |
| 351 | 6 | Paweł Widera | /sbin/chkconfig --add pbs_maui |
| 352 | 6 | Paweł Widera | /sbin/chkconfig --add pbs_server |
| 353 | 6 | Paweł Widera | /sbin/chkconfig pbs_mom on |
| 354 | 6 | Paweł Widera | /sbin/chkconfig pbs_maui on |
| 355 | 6 | Paweł Widera | /sbin/chkconfig pbs_server on |
| 356 | 7 | Paweł Widera | </pre> |
| 357 | 6 | Paweł Widera | |
| 358 | 7 | Paweł Widera | If you want to use the simple scheduler that comes with PBS Torque, then substitute _pbs_maui_ with _pbs_sched_. |
| 359 | 6 | Paweł Widera | |
| 360 | 6 | Paweł Widera | |
| 361 | 7 | Paweł Widera | |
| 362 | 7 | Paweł Widera | h3. Setup and Configuration on the Slave Nodes |
| 363 | 7 | Paweł Widera | |
| 364 | 6 | Paweł Widera | Get [repos:Externals/Cluster/procksi_pbs.tgz] from the repository and untar it: |
| 365 | 7 | Paweł Widera | <pre> |
| 366 | 6 | Paweł Widera | tar -xvzf procksi_pbs.tgz |
| 367 | 7 | Paweł Widera | </pre> |
| 368 | 6 | Paweł Widera | |
| 369 | 1 | Anonymous | Make the entire queuing system start at bootup: |
| 370 | 7 | Paweł Widera | <pre> |
| 371 | 1 | Anonymous | cp ./pbs/slave/etc/init.d/pbs_mom /etc/init.d/ |
| 372 | 1 | Anonymous | /sbin/chkconfig --add pbs_mom |
| 373 | 1 | Anonymous | /sbin/chkconfig pbs_mom on |
| 374 | 7 | Paweł Widera | </pre> |
| 375 | 1 | Anonymous | |
| 376 | 1 | Anonymous | |
| 377 | 7 | Paweł Widera | h3. Monitoring Grid Status |
| 378 | 7 | Paweł Widera | |
| 379 | 7 | Paweł Widera | |
| 380 | 7 | Paweł Widera | * Display queue information (active/idle jobs): |
| 381 | 7 | Paweł Widera | <pre> |
| 382 | 1 | Anonymous | showq |
| 383 | 7 | Paweł Widera | </pre> |
| 384 | 7 | Paweł Widera | * Display current and historical scheduling statistics: |
| 385 | 7 | Paweł Widera | <pre> |
| 386 | 1 | Anonymous | showstats -v |
| 387 | 7 | Paweł Widera | </pre> |
| 388 | 7 | Paweł Widera | * Display job state and resource information: |
| 389 | 7 | Paweł Widera | <pre> |
| 390 | 1 | Anonymous | checkjob <JOB_ID> |
| 391 | 7 | Paweł Widera | </pre> |
| 392 | 7 | Paweł Widera | * Display node state and resource information: |
| 393 | 7 | Paweł Widera | <pre> |
| 394 | 1 | Anonymous | checknode <NODE_NAME> |
| 395 | 7 | Paweł Widera | </pre> |
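
The status commands above can be combined into a small helper. The following is only a minimal sketch and not part of the ProCKSI setup: the function names @node_names@ and @grid_status@ are hypothetical, and it assumes that the node list lives in _$TORQUECFG/server_priv/nodes_ (as described in the TORQUE section) and that the Maui client commands are on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with TORQUE or Maui).

# Print the host name (first field) of every entry in a TORQUE nodes file,
# skipping blank lines and lines that start with '#'.
node_names() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Show the queue, then the state of every node listed in the nodes file.
# Assumption: the nodes file is $TORQUECFG/server_priv/nodes unless a
# different path is passed as the first argument.
grid_status() {
    nodes_file=${1:-${TORQUECFG:-/var/spool/torque}/server_priv/nodes}
    for cmd in showq checknode; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "grid_status: $cmd not found in PATH" >&2
            return 1
        fi
    done
    showq
    for node in $(node_names "$nodes_file"); do
        checknode "$node"
    done
}
```

After sourcing the file, calling @grid_status@ should print the @showq@ summary followed by one @checknode@ report per configured node. @node_names@ can also be used on its own to list the hosts in any nodes file.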