ClusterMonitoring » History » Version 2
Anonymous, 09/17/2007 02:38 PM
1 | 1 | Anonymous | = Cluster Monitoring = |
---|---|---|---|
2 | 1 | Anonymous | The cluster resources and performance need to be monitored constantly, and the users need to be tracked. |
3 | 1 | Anonymous | |
4 | 1 | Anonymous | We assume the following configuration: |
5 | 2 | Anonymous | ||!StorMan || 2.12_B928 |
6 | 1 | Anonymous | ||Ganglia || 3.0.4 |
7 | 1 | Anonymous | ||!JobMonarch || 0.2 |
8 | 2 | Anonymous | |
9 | 2 | Anonymous | |
10 | 2 | Anonymous | == !StorMan == |
11 | 2 | Anonymous | !StorMan is the ''DELL RAID Storage Manager'' (RSM) for RAID systems. |
12 | 2 | Anonymous | |
13 | 2 | Anonymous | * Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node: |
14 | 2 | Anonymous | {{{ |
15 | 2 | Anonymous | tar -xzf RSM-2.12_B928_Linux.tgz |
16 | 2 | Anonymous | }}} |
17 | 2 | Anonymous | |
18 | 2 | Anonymous | * Install the RSM ignoring the dependencies. Enter at the command line of the master node: |
19 | 2 | Anonymous | {{{ |
20 | 2 | Anonymous | rpm -Uv --nodeps StorMan-2.12.i386.rpm |
21 | 2 | Anonymous | }}} |
22 | 2 | Anonymous | |
23 | 2 | Anonymous | * Make sure that the RSM starts at boot time. Enter at the command line of the master node: |
24 | 2 | Anonymous | {{{ |
25 | 2 | Anonymous | cd /usr/StorMan |
26 | 2 | Anonymous | cp ./stor_agent /etc/init.d/rsd |
27 | 2 | Anonymous | /sbin/chkconfig --add rsd |
28 | 2 | Anonymous | /sbin/chkconfig rsd on |
29 | 2 | Anonymous | }}} |
30 | 2 | Anonymous | |
31 | 2 | Anonymous | * Start the RSM. Enter at the command line of the master node: |
32 | 2 | Anonymous | {{{ |
33 | 2 | Anonymous | /sbin/service rsd start |
34 | 2 | Anonymous | }}} |
35 | 2 | Anonymous | |
36 | 2 | Anonymous | |
37 | 1 | Anonymous | |
38 | 1 | Anonymous | == Ganglia == |
39 | 1 | Anonymous | ''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.'' |
40 | 1 | Anonymous | |
41 | 1 | Anonymous | * Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net]. |
42 | 1 | Anonymous | * Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''. |
43 | 1 | Anonymous | * Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the head node. |
44 | 1 | Anonymous | |
45 | 1 | Anonymous | === Ganglia Monitoring Daemon === |
46 | 1 | Anonymous | * Configure, build and install Ganglia on each slave node (only with ''gmond''): |
47 | 1 | Anonymous | {{{ |
48 | 1 | Anonymous | ./configure --prefix=/usr/local |
49 | 1 | Anonymous | }}} |
50 | 1 | Anonymous | and on the master node (with ''gmond'' and ''gmetad''): |
51 | 1 | Anonymous | {{{ |
52 | 1 | Anonymous | ./configure --prefix=/usr/local --with-gmetad |
53 | 1 | Anonymous | }}} |
54 | 1 | Anonymous | |
55 | 1 | Anonymous | * Initialise the configuration file for ''gmond'': |
56 | 1 | Anonymous | {{{ |
57 | 1 | Anonymous | gmond --default_config > /etc/gmond.conf |
58 | 1 | Anonymous | }}} |
59 | 1 | Anonymous | |
60 | 1 | Anonymous | * Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'': |
61 | 1 | Anonymous | * Set the name of the cluster: |
62 | 1 | Anonymous | {{{ |
63 | 1 | Anonymous | cluster { |
64 | 1 | Anonymous | name = "ProCKSI" |
65 | 1 | Anonymous | } |
66 | 1 | Anonymous | }}} |
67 | 1 | Anonymous | * Set the IP address and port for multicast data exchange: |
68 | 1 | Anonymous | {{{ |
69 | 1 | Anonymous | udp_send_channel { |
70 | 1 | Anonymous | mcast_join = 239.2.11.71 |
71 | 1 | Anonymous | port = 8649 |
72 | 1 | Anonymous | } |
73 | 1 | Anonymous | udp_recv_channel { |
74 | 1 | Anonymous | mcast_join = 239.2.11.71 |
75 | 1 | Anonymous | port = 8649 |
76 | 1 | Anonymous | bind = 239.2.11.71 |
77 | 1 | Anonymous | } |
78 | 1 | Anonymous | }}} |
79 | 1 | Anonymous | |
80 | 1 | Anonymous | * Copy the start-up script for ''gmond'': |
81 | 1 | Anonymous | {{{ |
82 | 1 | Anonymous | cp ./gmond/gmond.init /etc/init.d/gmond |
83 | 1 | Anonymous | }}} |
84 | 1 | Anonymous | |
85 | 1 | Anonymous | * Add a route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify the start and stop sections of ''/etc/init.d/gmond'': |
86 | 1 | Anonymous | {{{ |
87 | 1 | Anonymous | #Add multicast route to internal interface |
88 | 1 | Anonymous | /sbin/route add -host 239.2.11.71 dev eth0 |
89 | 1 | Anonymous | daemon $GMOND |
90 | 1 | Anonymous | }}} |
91 | 1 | Anonymous | {{{ |
92 | 1 | Anonymous | #Remove multicast route to internal interface |
93 | 1 | Anonymous | /sbin/route delete -host 239.2.11.71 dev eth0 |
94 | 1 | Anonymous | killproc gmond |
95 | 1 | Anonymous | }}} |
96 | 1 | Anonymous | * Make the Ganglia Monitoring Daemon start at bootup: |
97 | 1 | Anonymous | {{{ |
98 | 1 | Anonymous | /sbin/chkconfig gmond on |
99 | 1 | Anonymous | }}} |
100 | 1 | Anonymous | * Start the Ganglia Monitoring Daemon: |
101 | 1 | Anonymous | {{{ |
102 | 1 | Anonymous | /sbin/service gmond start |
103 | 1 | Anonymous | }}} |
104 | 1 | Anonymous | |
105 | 1 | Anonymous | === Ganglia Meta Daemon === |
106 | 1 | Anonymous | * Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node. |
107 | 1 | Anonymous | |
108 | 1 | Anonymous | * Make the Ganglia Meta Daemon start at bootup: |
109 | 1 | Anonymous | {{{ |
110 | 1 | Anonymous | /sbin/chkconfig --add gmetad |
111 | 1 | Anonymous | /sbin/chkconfig gmetad on |
112 | 1 | Anonymous | }}} |
113 | 1 | Anonymous | * Start the Ganglia Meta Daemon: |
114 | 1 | Anonymous | {{{ |
115 | 1 | Anonymous | /sbin/service gmetad start |
116 | 1 | Anonymous | }}} |
117 | 1 | Anonymous | |
118 | 1 | Anonymous | * If the pie charts do not show up in the web frontend, install the ''php-gd'' package. |
119 | 1 | Anonymous | |
120 | 1 | Anonymous | |
121 | 1 | Anonymous | === Further Customisation === |
122 | 1 | Anonymous | In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'': |
123 | 1 | Anonymous | * '''header.php''' |
124 | 1 | Anonymous | {{{ |
125 | 1 | Anonymous | if (!$physical) { |
126 | 1 | Anonymous | $context_ranges[]="10 minutes"; |
127 | 1 | Anonymous | $context_ranges[]="20 minutes"; |
128 | 1 | Anonymous | $context_ranges[]="30 minutes"; |
129 | 1 | Anonymous | $context_ranges[]="1 hour"; |
130 | 1 | Anonymous | $context_ranges[]="2 hours"; |
131 | 1 | Anonymous | $context_ranges[]="4 hours"; |
132 | 1 | Anonymous | $context_ranges[]="8 hours"; |
133 | 1 | Anonymous | $context_ranges[]="12 hours"; |
134 | 1 | Anonymous | $context_ranges[]="1 day"; |
135 | 1 | Anonymous | $context_ranges[]="2 days"; |
136 | 1 | Anonymous | $context_ranges[]="week"; |
137 | 1 | Anonymous | $context_ranges[]="month"; |
138 | 1 | Anonymous | $context_ranges[]="year"; |
139 | 1 | Anonymous | }}} |
140 | 1 | Anonymous | |
141 | 1 | Anonymous | * '''get_context.php''' |
142 | 1 | Anonymous | {{{ |
143 | 1 | Anonymous | switch ($range) { |
144 | 1 | Anonymous | case "10 minutes": $start = -600; break; |
145 | 1 | Anonymous | case "20 minutes": $start = -1200; break; |
146 | 1 | Anonymous | case "30 minutes": $start = -1800; break; |
147 | 1 | Anonymous | case "1 hour": $start = -3600; break; |
148 | 1 | Anonymous | case "2 hours": $start = -7200; break; |
149 | 1 | Anonymous | case "4 hours": $start = -14400; break; |
150 | 1 | Anonymous | case "8 hours": $start = -28800; break; |
151 | 1 | Anonymous | case "12 hours": $start = -43200; break; |
152 | 1 | Anonymous | case "1 day": $start = -86400; break; |
153 | 1 | Anonymous | case "2 days": $start = -172800; break; |
154 | 1 | Anonymous | case "week": $start = -604800; break; |
155 | 1 | Anonymous | case "month": $start = -2419200; break; |
156 | 1 | Anonymous | case "year": $start = -31449600; break; |
157 | 1 | Anonymous | }}} |
158 | 1 | Anonymous | |
159 | 1 | Anonymous | |
160 | 1 | Anonymous | == !JobMonarch == |
161 | 1 | Anonymous | !JobMonarch is an add-on to Ganglia which provides PBS job monitoring through the web browser. |
162 | 1 | Anonymous | |
163 | 1 | Anonymous | See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation. |
164 | 1 | Anonymous | |
165 | 1 | Anonymous | '''Attention''': !JobMonarch does not work properly yet. |
166 | 1 | Anonymous | |
167 | 1 | Anonymous | |
168 | 1 | Anonymous | == Domain Usage Monitoring == |
169 | 1 | Anonymous | All HTML documents must contain the following code in order to be tracked correctly. |
170 | 1 | Anonymous | |
171 | 1 | Anonymous | {{{ |
172 | 1 | Anonymous | <!-- Site Meter --> |
173 | 1 | Anonymous | <script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi"> |
174 | 1 | Anonymous | </script> |
175 | 1 | Anonymous | <noscript> |
176 | 1 | Anonymous | <a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top"> |
177 | 1 | Anonymous | <img src="http://s18.sitemeter.com/meter.asp?site=s18procksi" |
178 | 1 | Anonymous | alt="Site Meter" border="0"/> |
179 | 1 | Anonymous | </a> |
180 | 1 | Anonymous | </noscript> |
181 | 1 | Anonymous | |
182 | 1 | Anonymous | <!-- Copyright (c)2006 Site Meter --> |
183 | 1 | Anonymous | }}} |
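
Because every HTML document has to carry the Site Meter snippet in order to be tracked, it can help to check a document tree for files that lack it. The following is only a minimal sketch and not part of the original setup: the function name `find_untracked` and the directory argument are assumptions made for illustration, and the check merely looks for the counter script URL shown above.

```shell
#!/bin/sh
# Sketch: list HTML files that lack the Site Meter counter script.
# The function name and its directory argument are illustrative assumptions.

# Fixed string taken from the tracking snippet shown above.
SITE_METER_MARKER='sitemeter.com/js/counter.js?site=s18procksi'

# Print every *.html file below the given directory that does not
# contain the marker. grep -L lists files without a match; -F treats
# the marker as a fixed string rather than a regular expression.
find_untracked() {
    find "$1" -type f -name '*.html' -exec grep -L -F "$SITE_METER_MARKER" {} +
}
```

Calling `find_untracked` with the web server's document root (the actual path depends on the installation) prints one file name per line for every page that still needs the snippet. An empty result means all pages contain the counter script.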