ClusterMonitoring » History » Version 4

Anonymous, 01/08/2008 10:42 AM
More RAID Monitoring

= Cluster Monitoring =
The cluster resources and performance need to be constantly monitored, and the users need to be tracked.

We assume the following configuration:
 ||!StorMan    || 2.12_B928 ||
 ||Ganglia     || 3.0.4     ||
 ||!JobMonarch || 0.2       ||

== !StorMan ==
!StorMan is the ''Dell RAID Storage Manager'' (RSM) for RAID systems. The agent keeps ''/dev/aac0'' open and listens on ports 34571, 34572 and 34573.

 * Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node:
  {{{
  tar -xzf RSM-2.12_B928_Linux.tgz
  }}}

 * Install the RSM, ignoring the dependencies. Enter at the command line of the master node:
  {{{
  rpm -Uv --nodeps StorMan-2.12.i386.rpm
  }}}

 * Make sure that the RSM starts at boot time. Enter at the command line of the master node:
  {{{
  cd /usr/StorMan
  cp ./stor_agent /etc/init.d/rsd
  /sbin/chkconfig --add rsd
  /sbin/chkconfig rsd on
  }}}

 * Start the RSM. Enter at the command line of the master node:
  {{{
  /sbin/service rsd start
  }}}
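
Once the agent is started, it may be worth confirming that it really listens on the three ports named above. The following is only a sketch: it assumes the ports are TCP ports and that the socket listing comes from `netstat -ltn`; the function name `missing_ports` is made up for this example.

```shell
#!/bin/sh
# Print every expected StorMan agent port that does NOT appear in a listing
# of listening sockets read from standard input (netstat -ltn format).
# Assumption: the agent uses TCP; the page only says "ports".
missing_ports() {
    listing=$(cat)
    for port in 34571 34572 34573; do
        # Look for ":PORT" followed by whitespace (the local address column).
        if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
            echo "$port"
        fi
    done
}

# Typical use on the master node:
#   netstat -ltn | missing_ports
```

An empty output means all three ports were found in the listing.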

== Further RAID Monitoring ==

=== Command Line Interface (CLI) for Dell's RAID Controller ===
The manual can be found at [http://support.euro.dell.com/support/edocs/storage/CS6CH/en/ug/dell_ceg.htm].


=== Comments by William Armitage ===
On procksi0 I have dropped the RAID tools into /usr/local/afa:

{{{
root@master01:/usr/local/afa# ls -l /usr/local/afa
total 3032
-rwxr--r-- 1 wja  procksi 1893976 Nov 29 11:51 afacli
-rw-r--r-- 1 root root          0 Nov 29 11:58 cfg.log
-rw-r--r-- 1 wja  procksi     572 Nov 29 11:51 getcfg.afa
-rw-r--r-- 1 root root        165 Dec 19 12:34 i2
-rw-r--r-- 1 root root       2050 Dec 19 12:36 i2.log
-rw-r--r-- 1 root root        159 Dec 11 18:11 i2.orig
-rw-r--r-- 1 root root        325 Dec 19 12:34 i3
-rw-r--r-- 1 root root    1153256 Dec 19 12:38 i3.log
-rw-r--r-- 1 root root         98 Nov 29 14:47 input
-rwxr--r-- 1 wja  procksi     595 Nov 29 11:51 MAKEDEV.afa
-rw-r--r-- 1 root root       1152 Nov 30 12:39 output
-rw-r--r-- 1 root root       1152 Nov 29 14:48 output.0
-rw-r--r-- 1 root root       1152 Nov 29 15:23 output.1
-rw-r--r-- 1 root root       1152 Nov 29 18:04 output.2
}}}

afacli is from http://linux.dell.com/storage.shtml under the section AACRAID > Management Utility > afa-apps-snmp.2807420-A04.tar.gz; untar it and unpack afaapps-4.1-0.i386.rpm to get the binary.

[http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R85529&formatcnt=1&fileid=112003]

If you don't have /dev/afa0, create it with:
{{{
  cd /dev
  /usr/local/afa/MAKEDEV.afa afa0
}}}
The device node disappeared after the reboot and had to be recreated.
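
Because the device node does not survive a reboot, the recreation step could be wrapped in a small function and called from a boot-time script. This is a sketch under assumptions: the function name `ensure_afa_device` and the idea of calling it at boot are not from the original notes; only the two commands inside it are.

```shell
#!/bin/sh
# Recreate the afa device node if it is missing.
# Arguments (all optional, defaults taken from the notes above):
#   $1  directory holding the device node   (default /dev)
#   $2  name of the device node             (default afa0)
#   $3  path to the MAKEDEV.afa script      (default /usr/local/afa/MAKEDEV.afa)
ensure_afa_device() {
    dev_dir=${1:-/dev}
    name=${2:-afa0}
    maker=${3:-/usr/local/afa/MAKEDEV.afa}

    # Nothing to do if the node is already there.
    if [ -e "$dev_dir/$name" ]; then
        return 0
    fi

    # Same two steps as above, run in a subshell so that the caller's
    # working directory is left untouched.
    ( cd "$dev_dir" && "$maker" "$name" )
}

# Typical use, for example from a boot script (hypothetical placement):
#   ensure_afa_device
```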

afacli is described as a bad port of a DOS program.
Used interactively at the command line it does weird things to the terminal, so feed it scripts instead.
It echoes to standard output in addition to any logging that is set, but it uses escape
codes that write to the alternate screen in a colour xterm and then immediately
switches back at the end.
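
If the echoed output is captured rather than a log file, the escape codes can be filtered out before further processing. The sketch below removes CSI sequences (ESC, `[`, parameters, one letter), which covers the usual xterm alternate-screen switches; whether afacli emits other kinds of sequences is not known here, and the name `strip_escapes` is made up for this example.

```shell
#!/bin/sh
# Remove terminal CSI escape sequences from standard input, for example
# ESC[?47h / ESC[?47l (switch to and from the xterm alternate screen).
strip_escapes() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# Typical use on a captured output file (file name is an assumption):
#   strip_escapes < captured.txt
```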

The "input" script comes from
[http://linux.dell.com/files/aacraid/nagios/check_raid_pl.txt][[br]]
{{{output log "output"}}}

The more detailed script "i2" comes from
[http://www.techno-obscura.com/~delgado/notes/sles9-NagiosAfacli.html][[br]]
{{{output i2.out}}}

"i3" dumps the controller logs. It is based on
[http://threebit.net/mail-archive/centos/msg02033.html][[br]]
{{{output i3.out}}}


== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''

 * Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
 * Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
 * Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the master node.

=== Ganglia Monitoring Daemon ===
 * Configure, build and install Ganglia on each slave node (only with ''gmond''):
  {{{
  ./configure --prefix=/usr/local
  }}}
  and on the master node (with ''gmond'' and ''gmetad''):
  {{{
  ./configure --prefix=/usr/local --with-gmetad
  }}}

 * Initialise the configuration file for ''gmond'':
  {{{
  gmond --default_config > /etc/gmond.conf
  }}}

 * Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
  * Set the name of the cluster:
  {{{
  cluster {
    name = "ProCKSI"
  }
  }}}
  * Set the IP address and port for multicast data exchange:
  {{{
  udp_send_channel {
    mcast_join = 239.2.11.71
    port = 8649
  }
  udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8649
    bind = 239.2.11.71
  }
  }}}
 
 * Copy the start-up script for ''gmond'':
  {{{
  cp ./gmond/gmond.init /etc/init.d/gmond
  }}}

 * Add an additional route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify ''/etc/init.d/gmond'' so that the route is added before the daemon is started:
  {{{
   #Add multicast route to internal interface
   /sbin/route add -host 239.2.11.71 dev eth0
   daemon $GMOND
  }}}
  and removed again before the daemon is stopped:
  {{{
   #Remove multicast route to internal interface
   /sbin/route delete -host 239.2.11.71 dev eth0
   killproc gmond
  }}}
 * Make the Ganglia Monitoring Daemon start at bootup:
  {{{
   /sbin/chkconfig  gmond  on
  }}}
 * Start the Ganglia Monitoring Daemon:
  {{{
   /sbin/service  gmond  start
  }}}
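
After the daemon has been started, the multicast set-up above can be cross-checked: every `mcast_join` address in the configuration file should have a host route on the internal interface. The following sketch assumes the routing table is listed with `/sbin/route -n`; the function names `mcast_groups` and `has_mcast_route` are made up for this example.

```shell
#!/bin/sh
# Print each distinct mcast_join address configured in a gmond.conf file.
mcast_groups() {
    sed -n 's/^[[:space:]]*mcast_join[[:space:]]*=[[:space:]]*\([0-9.]*\).*/\1/p' "$1" | sort -u
}

# Succeed if the routing table read from standard input (route -n format:
# destination in the first column, interface in the last) contains a route
# for address $1 on interface $2 (default eth0).
has_mcast_route() {
    awk -v addr="$1" -v dev="${2:-eth0}" \
        '$1 == addr && $NF == dev { found = 1 } END { exit !found }'
}

# Typical use:
#   for group in $(mcast_groups /etc/gmond.conf); do
#       /sbin/route -n | has_mcast_route "$group" eth0 \
#           || echo "no route for $group on eth0"
#   done
```

With the configuration shown above, `mcast_groups` should print a single address, since the send and receive channels join the same group.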

=== Ganglia Meta Daemon ===
 * Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.

 * Make the Ganglia Meta Daemon start at bootup:
  {{{
   /sbin/chkconfig  --add gmetad
   /sbin/chkconfig  gmetad  on
  }}}
 * Start the Ganglia Meta Daemon:
  {{{
   /sbin/service  gmetad  start
  }}}
 
 * If the pie chart diagrams do not show up in the web frontend, install the ''php-gd'' package.
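
With both daemons running, a quick way to see whether the nodes are reporting is to fetch the XML dump and count the hosts in it. This is a sketch under assumptions: it relies on gmond answering on its default TCP port 8649 (gmetad on 8651) and on `nc` being installed; neither is stated in this page, and the name `count_hosts` is made up for this example.

```shell
#!/bin/sh
# Count the <HOST ...> elements in a Ganglia XML dump read from
# standard input. Closing tags (</HOST>) and <HOSTS ...> summary
# elements are not counted.
count_hosts() {
    grep -o '<HOST ' | wc -l | tr -d ' '
}

# Typical use on the master node (ports are the assumed defaults):
#   nc localhost 8649 | count_hosts    # ask gmond
#   nc localhost 8651 | count_hosts    # ask gmetad
```

The number printed should match the number of nodes running gmond.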


=== Further Customisation ===
In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'':
 * '''header.php'''
 {{{
  if (!$physical) {
   $context_ranges[]="10 minutes";
   $context_ranges[]="20 minutes";
   $context_ranges[]="30 minutes";
   $context_ranges[]="1 hour";
   $context_ranges[]="2 hours";
   $context_ranges[]="4 hours";
   $context_ranges[]="8 hours";
   $context_ranges[]="12 hours";
   $context_ranges[]="1 day";
   $context_ranges[]="2 days";
   $context_ranges[]="week";
   $context_ranges[]="month";
   $context_ranges[]="year";
 }}}

 * '''get_context.php'''
 {{{
  switch ($range) {
   case "10 minutes":   $start = -600; break;
   case "20 minutes":   $start = -1200; break;
   case "30 minutes":   $start = -1800; break;
   case "1 hour":       $start = -3600; break;
   case "2 hours":      $start = -7200; break;
   case "4 hours":      $start = -14400; break;
   case "8 hours":      $start = -28800; break;
   case "12 hours":     $start = -43200; break;
   case "1 day":        $start = -86400; break;
   case "2 days":       $start = -172800; break;
   case "week":         $start = -604800; break;
   case "month":        $start = -2419200; break;
   case "year":         $start = -31449600; break;
 }}}
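
The `$start` values above are the interval lengths in seconds, negated. When adding a further interval, the value can be computed rather than typed by hand. The helper below is only an illustration; the name `range_seconds` is made up for this example.

```shell
#!/bin/sh
# Print the length of an interval in seconds, e.g. "range_seconds 20 minutes".
range_seconds() {
    count=$1
    case $2 in
        minute|minutes) unit=60 ;;
        hour|hours)     unit=3600 ;;
        day|days)       unit=86400 ;;
        week|weeks)     unit=604800 ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
    echo $((count * unit))
}

# Typical use:
#   range_seconds 6 hours
```

Checked against the table, "month" (2419200) corresponds to 28 days and "year" (31449600) to 52 weeks, that is 364 days.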


== !JobMonarch ==
!JobMonarch is an add-on to Ganglia which provides PBS job monitoring through the web browser.

See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.

'''Attention''': !JobMonarch does not work properly yet.


== Domain Usage Monitoring ==
All HTML documents must contain the following code in order to be tracked correctly.

 {{{
<!-- Site Meter -->
	<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
	</script>
	<noscript>
		<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
			<img	src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
    				alt="Site Meter" border="0"/>
		</a>
	</noscript>

<!-- Copyright (c)2006 Site Meter -->
 }}}
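
Since every HTML document has to carry this snippet, it can help to list the files that lack it. The sketch below simply searches each file for the counter script URL; the function name `missing_sitemeter` is made up for this example, and the location of the HTML files is left to the caller.

```shell
#!/bin/sh
# Print the name of every given file that does not reference the
# Site Meter counter script.
missing_sitemeter() {
    for file in "$@"; do
        # -F: treat the pattern as a fixed string, not a regular expression.
        if ! grep -qF 'sitemeter.com/js/counter.js' "$file"; then
            echo "$file"
        fi
    done
}

# Typical use, run inside the directory holding the HTML documents:
#   missing_sitemeter *.html
```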