ClusterMonitoring » History » Version 2

Anonymous, 09/17/2007 02:38 PM

= Cluster Monitoring =
The cluster's resources and performance need to be monitored constantly, and its users need to be tracked.

We assume the following configuration:
 ||!StorMan    || 2.12_B928 ||
 ||Ganglia     || 3.0.4     ||
 ||!JobMonarch || 0.2       ||

== !StorMan ==
!StorMan is the ''Dell RAID Storage Manager'' (RSM) for RAID systems.

 * Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node:
  {{{
  tar -xzf RSM-2.12_B928_Linux.tgz
  }}}

 * Install the RSM, ignoring its dependencies. Enter at the command line of the master node:
  {{{
  rpm -Uv --nodeps StorMan-2.12.i386.rpm
  }}}

 * Make sure that the RSM starts at boot time. Enter at the command line of the master node:
  {{{
  cd /usr/StorMan
  cp ./stor_agent /etc/init.d/rsd
  /sbin/chkconfig --add rsd
  /sbin/chkconfig rsd on
  }}}

 * Start the RSM. Enter at the command line of the master node:
  {{{
  /sbin/service rsd start
  }}}
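
To confirm that the service was registered for boot, the output of ''/sbin/chkconfig --list rsd'' can be inspected. The following is a minimal Python sketch and not part of the original setup: the helper name ''enabled_runlevels'' and the sample line are illustrative assumptions, and the runlevels that are actually switched on depend on the header of the init script.

```python
def enabled_runlevels(chkconfig_line):
    """Parse one line of `chkconfig --list <service>` output.

    Returns the service name and the sorted runlevels marked "on".
    """
    fields = chkconfig_line.split()
    service, states = fields[0], fields[1:]
    levels = []
    for state in states:
        # Each state has the form "<runlevel>:<on|off>".
        level, _, flag = state.partition(":")
        if flag == "on":
            levels.append(int(level))
    return service, sorted(levels)


# Illustrative sample line (assumption); use the real command output instead.
sample = "rsd  0:off 1:off 2:on 3:on 4:on 5:on 6:off"
print(enabled_runlevels(sample))
```

An empty runlevel list would suggest that the ''chkconfig rsd on'' step did not take effect.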

== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''

 * Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
 * Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
 * Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the master node.

=== Ganglia Monitoring Daemon ===
 * Configure, build and install Ganglia on each slave node (only with ''gmond''):
  {{{
  ./configure --prefix=/usr/local
  }}}
  and on the master node (with ''gmond'' and ''gmetad''):
  {{{
  ./configure --prefix=/usr/local --with-gmetad
  }}}

 * Initialise the configuration file for ''gmond'':
  {{{
  gmond --default_config > /etc/gmond.conf
  }}}

 * Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
  * Set the name of the cluster:
  {{{
  cluster {
    name = "ProCKSI"
  }
  }}}
  * Set the IP address and port for multicast data exchange:
  {{{
  udp_send_channel {
    mcast_join = 239.2.11.71
    port = 8649
  }
  udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8649
    bind = 239.2.11.71
  }
  }}}

 * Copy the start-up script for ''gmond'':
  {{{
  cp ./gmond/gmond.init /etc/init.d/gmond
  }}}

 * Add a route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify ''/etc/init.d/gmond'' so that the route is added before the daemon is started:
  {{{
   #Add multicast route to internal interface
   /sbin/route add -host 239.2.11.71 dev eth0
   daemon $GMOND
  }}}
  and removed again when the daemon is stopped:
  {{{
   #Remove multicast route to internal interface
   /sbin/route delete -host 239.2.11.71 dev eth0
   killproc gmond
  }}}
 * Make the Ganglia Monitoring Daemon start at boot time:
  {{{
   /sbin/chkconfig  gmond  on
  }}}
 * Start the Ganglia Monitoring Daemon:
  {{{
   /sbin/service  gmond  start
  }}}

=== Ganglia Meta Daemon ===
 * Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.

 * Make the Ganglia Meta Daemon start at boot time:
  {{{
   /sbin/chkconfig  --add gmetad
   /sbin/chkconfig  gmetad  on
  }}}
 * Start the Ganglia Meta Daemon:
  {{{
   /sbin/service  gmetad  start
  }}}

 * If the pie charts do not show up in the web frontend, install the ''php-gd'' package.
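
Once the daemons are running, a quick way to see which nodes are reporting is to read the XML report that ''gmond'' sends to any TCP client. The sketch below is illustrative only. It assumes that the ''tcp_accept_channel'' of the default configuration is still enabled on port 8649 and that the report uses ''CLUSTER'' and ''HOST'' elements with a ''NAME'' attribute; the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML report gmond writes to a TCP client before closing.

    The port is an assumption based on the default tcp_accept_channel.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # gmond closes the connection after the report
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_data):
    """Return {cluster name: sorted host names} from a gmond XML report."""
    root = ET.fromstring(xml_data)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling ''hosts_by_cluster(read_gmond_xml())'' on the master node should list every node under the cluster name set in ''/etc/gmond.conf''; a missing node suggests that its multicast traffic is not arriving.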

=== Further Customisation ===
In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'':
 * '''header.php'''
 {{{
  if (!$physical) {
   $context_ranges[]="10 minutes";
   $context_ranges[]="20 minutes";
   $context_ranges[]="30 minutes";
   $context_ranges[]="1 hour";
   $context_ranges[]="2 hours";
   $context_ranges[]="4 hours";
   $context_ranges[]="8 hours";
   $context_ranges[]="12 hours";
   $context_ranges[]="1 day";
   $context_ranges[]="2 days";
   $context_ranges[]="week";
   $context_ranges[]="month";
   $context_ranges[]="year";
 }}}

 * '''get_context.php'''
 {{{
  switch ($range) {
   case "10 minutes":   $start = -600; break;
   case "20 minutes":   $start = -1200; break;
   case "30 minutes":   $start = -1800; break;
   case "1 hour":       $start = -3600; break;
   case "2 hours":      $start = -7200; break;
   case "4 hours":      $start = -14400; break;
   case "8 hours":      $start = -28800; break;
   case "12 hours":     $start = -43200; break;
   case "1 day":        $start = -86400; break;
   case "2 days":       $start = -172800; break;
   case "week":         $start = -604800; break;
   case "month":        $start = -2419200; break;
   case "year":         $start = -31449600; break;
 }}}
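
Each label added to ''header.php'' needs a matching ''case'' in ''get_context.php'' whose offset is the negative number of seconds in that interval. As a cross-check of the arithmetic, the following Python sketch recomputes the offsets from the labels. It is only an illustration; the names ''UNIT_SECONDS'', ''FIXED_RANGES'' and ''range_to_start'' are hypothetical and do not exist in Ganglia.

```python
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}

# Offsets of the unit-only labels, copied from the get_context.php excerpt.
# There, "month" corresponds to 28 days and "year" to 52 weeks.
FIXED_RANGES = {"week": -604800, "month": -2419200, "year": -31449600}


def range_to_start(label):
    """Convert a label such as "20 minutes" into a negative offset in seconds."""
    if label in FIXED_RANGES:
        return FIXED_RANGES[label]
    count, unit = label.split()
    # Strip the plural "s" so that "hours" and "hour" share one entry.
    return -int(count) * UNIT_SECONDS[unit.rstrip("s")]


for label in ("10 minutes", "8 hours", "2 days", "month"):
    print(label, range_to_start(label))
```

If a further interval is added, for example "6 hours", the same rule gives the value to use for ''$start''.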

== !JobMonarch ==
!JobMonarch is an add-on to Ganglia that provides PBS job monitoring through the web browser.

See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.

'''Attention''': !JobMonarch does not work properly yet.

== Domain Usage Monitoring ==
All HTML documents must contain the following code in order to be tracked correctly.

 {{{
<!-- Site Meter -->
	<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
	</script>
	<noscript>
		<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
			<img src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
				alt="Site Meter" border="0"/>
		</a>
	</noscript>

<!-- Copyright (c)2006 Site Meter -->
 }}}
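
Because every HTML document needs the snippet above, it can be useful to scan the document tree for files that lack it. The following Python sketch shows one possible check and is not part of the original setup: the function names are hypothetical, the marker string is taken from the ''script'' tag above, and the directory to scan is left to the caller.

```python
from pathlib import Path

# Substring of the counter script URL from the snippet above.
TRACKER_MARKER = "sitemeter.com/js/counter.js?site=s18procksi"


def has_tracker(html_text):
    """Return True if the HTML text references the Site Meter counter script."""
    return TRACKER_MARKER in html_text


def untracked_files(root):
    """Return the *.html files below `root` that lack the tracking snippet."""
    return sorted(
        path
        for path in Path(root).rglob("*.html")
        if not has_tracker(path.read_text(encoding="utf-8", errors="replace"))
    )
```

Calling ''untracked_files'' with the web server's document root returns the pages that still need the snippet; pages generated dynamically are not covered by this file-based check.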