ClusterMonitoring » History » Version 1

Anonymous, 09/14/2007 10:27 AM

= Cluster Monitoring =
The cluster's resources and performance need to be monitored constantly, and its users need to be tracked.

We assume the following configuration:
 ||Ganglia     || 3.0.4 ||
 ||!JobMonarch || 0.2   ||

== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''

 * Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
 * Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
 * Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the head node.

=== Ganglia Monitoring Daemon ===
 * Configure, build and install Ganglia on each slave node (''gmond'' only):
  {{{
  ./configure --prefix=/usr/local
  make && make install
  }}}
  and on the master node (''gmond'' and ''gmetad''):
  {{{
  ./configure --prefix=/usr/local --with-gmetad
  make && make install
  }}}

 * Initialise the configuration file for ''gmond'':
  {{{
  gmond --default_config > /etc/gmond.conf
  }}}

 * Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
  * Set the name of the cluster:
  {{{
  cluster {
    name = "ProCKSI"
  }
  }}}
  * Set the IP address and port for multicast data exchange:
  {{{
  udp_send_channel {
    mcast_join = 239.2.11.71
    port = 8649
  }
  udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8649
    bind = 239.2.11.71
  }
  }}}

 * Copy the start-up script for ''gmond'':
  {{{
  cp ./gmond/gmond.init /etc/init.d/gmond
  }}}

 * Add a route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify ''/etc/init.d/gmond'' so that the route is added before the daemon is started:
  {{{
   #Add multicast route to internal interface
   /sbin/route add -host 239.2.11.71 dev eth0
   daemon $GMOND
  }}}
  and removed again when the daemon is stopped:
  {{{
   #Remove multicast route to internal interface
   /sbin/route delete -host 239.2.11.71 dev eth0
   killproc gmond
  }}}
 * Make the Ganglia Monitoring Daemon start at bootup.
  {{{
   /sbin/chkconfig  gmond  on
  }}}
 * Start the Ganglia Monitoring Daemon:
  {{{
   /sbin/service  gmond  start
  }}}
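Once ''gmond'' is running, a quick sanity check is to read the XML report it serves over TCP and see which nodes have been heard via multicast. The following is only a minimal sketch: it assumes that the TCP accept channel on port 8649 from the generated default configuration has been left unchanged, that `nc` (netcat) is installed, and that each host element starts with `<HOST NAME="...">` on its own line. The helper names `count_hosts` and `list_hosts` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a gmond XML report read from stdin.

# Count the <HOST ...> elements (one per node gmond knows about).
count_hosts() {
    grep -c '<HOST '
}

# Print the NAME attribute of every <HOST ...> element, one per line.
# Assumes NAME is the first attribute of the element.
list_hosts() {
    sed -n 's/.*<HOST NAME="\([^"]*\)".*/\1/p'
}
```

A possible use on any node is `nc localhost 8649 | list_hosts`. If only the local node is listed, the multicast route on the internal interface is the first thing to check.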

=== Ganglia Meta Daemon ===
 * Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.

 * Make the Ganglia Meta Daemon start at bootup:
  {{{
   /sbin/chkconfig  --add gmetad
   /sbin/chkconfig  gmetad  on
  }}}
 * Start the Ganglia Meta Daemon:
  {{{
   /sbin/service  gmetad  start
  }}}

 * If the pie charts do not show up in the web frontend, install the ''php-gd'' package.
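The configuration of ''gmetad'' is not spelled out above. As a rough sketch, the two `gmetad.conf` directives that matter most for this set-up are `data_source` (which ''gmond'' to poll) and `rrd_rootdir` (where the databases are kept). The helper `gmetad_fragment` below is hypothetical and only prints the lines; polling `localhost:8649` and the location of `gmetad.conf` on your system are assumptions to verify before use.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal gmetad.conf fragment for one cluster.
#   $1 = cluster name
#   $2 = host:port of a gmond to poll (assumed: localhost:8649)
#   $3 = directory holding the RRD databases
gmetad_fragment() {
    printf 'data_source "%s" %s\n' "$1" "$2"
    printf 'rrd_rootdir "%s"\n' "$3"
}
```

For the installation described here, `gmetad_fragment "ProCKSI" localhost:8649 /usr/local/ganglia/rrds` prints the two lines to review and merge into the configuration file by hand.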

=== Further Customisation ===
To offer more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'' (only the modified lines are shown):
 * '''header.php'''
 {{{
  if (!$physical) {
   $context_ranges[]="10 minutes";
   $context_ranges[]="20 minutes";
   $context_ranges[]="30 minutes";
   $context_ranges[]="1 hour";
   $context_ranges[]="2 hours";
   $context_ranges[]="4 hours";
   $context_ranges[]="8 hours";
   $context_ranges[]="12 hours";
   $context_ranges[]="1 day";
   $context_ranges[]="2 days";
   $context_ranges[]="week";
   $context_ranges[]="month";
   $context_ranges[]="year";
 }}}

 * '''get_context.php'''
 {{{
  switch ($range) {
   case "10 minutes":   $start = -600; break;
   case "20 minutes":   $start = -1200; break;
   case "30 minutes":   $start = -1800; break;
   case "1 hour":       $start = -3600; break;
   case "2 hours":      $start = -7200; break;
   case "4 hours":      $start = -14400; break;
   case "8 hours":      $start = -28800; break;
   case "12 hours":     $start = -43200; break;
   case "1 day":        $start = -86400; break;
   case "2 days":       $start = -172800; break;
   case "week":         $start = -604800; break;
   case "month":        $start = -2419200; break;
   case "year":         $start = -31449600; break;
 }}}
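The `$start` values are negative offsets in seconds from the current time. When adding further ranges, the arithmetic can be checked with a small script. This is only a sketch: `range_to_start` is a hypothetical helper, and it reproduces the convention visible in the table above, where "month" is 28 days (4 weeks) and "year" is 364 days (52 weeks).

```shell
#!/bin/sh
# Hypothetical helper: convert a range label such as "20 minutes" or "week"
# into the negative offset in seconds used for $start.
range_to_start() {
    # Split "N unit" into count and unit; a bare unit means a count of 1.
    case "$1" in
        *" "*) count=${1%% *}; unit=${1#* } ;;
        *)     count=1;        unit=$1 ;;
    esac
    case "$unit" in
        minute|minutes) secs=60 ;;
        hour|hours)     secs=3600 ;;
        day|days)       secs=86400 ;;
        week)           secs=$((7 * 86400)) ;;
        month)          secs=$((28 * 86400)) ;;   # 4 weeks, as in the table
        year)           secs=$((364 * 86400)) ;;  # 52 weeks, as in the table
        *) echo "unknown range: $1" >&2; return 1 ;;
    esac
    echo "-$((count * secs))"
}
```

For example, `range_to_start "6 hours"` gives the value to use for a new `case "6 hours":` entry, which also needs a matching `$context_ranges[]` line in ''header.php''.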

== !JobMonarch ==
!JobMonarch is an add-on to Ganglia that provides PBS job monitoring through the web browser.

See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.

'''Attention''': The !JobMonarch installation does not work properly yet.

== Domain Usage Monitoring ==
Every HTML page of the site must contain the following code in order to be tracked correctly.

 {{{
<!-- Site Meter -->
	<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
	</script>
	<noscript>
		<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
			<img src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
				alt="Site Meter" border="0"/>
		</a>
	</noscript>

<!-- Copyright (c)2006 Site Meter -->
 }}}