
Version 2 (Anonymous, 09/17/2007 02:38 PM) → Version 3/11 (Anonymous, 09/17/2007 07:42 PM)

= Cluster Monitoring =
The cluster's resources and performance need to be monitored constantly, and its users need to be tracked.

We assume the following configuration:
||!StorMan||2.12_B928||
||Ganglia||3.0.4||
||!JobMonarch||0.2||

== !StorMan ==
!StorMan is the ''DELL RAID Storage Manager'' (RSM) for RAID systems. The agent keeps ''/dev/aac0'' open and listens on ports 34571, 34572 and 34573.

* Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node:
{{{
tar -xzf RSM-2.12_B928_Linux.tgz
}}}

* Install the RSM ignoring the dependencies. Enter at the command line of the master node:
{{{
rpm -Uv --nodeps StorMan-2.12.i386.rpm
}}}

* Make sure that the RSM starts at boot time. Enter at the command line of the master node:
{{{
cd /usr/StorMan
cp ./stor_agent /etc/init.d/rsd
/sbin/chkconfig --add rsd
/sbin/chkconfig rsd on
}}}

* Start the RSM. Enter at the command line of the master node:
{{{
/sbin/service rsd start
}}}
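
To confirm that the agent came up, one can check that the three ports listed above are in a listening state. The following is only a minimal sketch and not part of the original procedure: the helper name `missing_ports` is hypothetical, and it assumes that the ports are TCP ports and that `netstat -tln` style output arrives on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print every expected RSM port that does NOT
# appear as a listening socket in the netstat-style text on stdin.
# Assumption: the agent uses TCP for ports 34571, 34572 and 34573.
missing_ports() {
    listing=$(cat)
    for port in 34571 34572 34573; do
        # Look for ":PORT" followed by whitespace (local address column).
        if ! printf '%s\n' "$listing" | grep -Eq ":$port[[:space:]]"; then
            echo "$port"
        fi
    done
}
# Example call on the master node:
#   netstat -tln | missing_ports
```

If the call prints nothing, all three ports were found; any port number it prints is one the agent is apparently not listening on.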



== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''

* Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
* Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
* Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the master node.

=== Ganglia Monitoring Daemon ===
* Configure, build and install Ganglia on each slave node (only with ''gmond''):
{{{
./configure --prefix=/usr/local
}}}
and on the master node (with ''gmond'' and ''gmetad''):
{{{
./configure --prefix=/usr/local --with-gmetad
}}}

* Initialise the configuration file for ''gmond'':
{{{
gmond --default_config > /etc/gmond.conf
}}}

* Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
* Set the name of the cluster:
{{{
cluster {
  name = "ProCKSI"
}
}}}
* Set the IP address and port for multicast data exchange:
{{{
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
}}}
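
The send and receive channels only talk to each other if both join the same multicast group. As a hedged sketch (the helper name `mcast_consistent` is hypothetical and not part of Ganglia), the following function checks that all `mcast_join` lines in a configuration file name one and the same address:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every "mcast_join = ADDRESS" line
# in the given gmond.conf names exactly one distinct multicast address.
mcast_consistent() {
    awk '
        $1 == "mcast_join" && $2 == "=" { seen[$3] = 1 }
        END {
            n = 0
            for (addr in seen) n++
            exit (n == 1 ? 0 : 1)
        }
    ' "$1"
}
# Example call:
#   mcast_consistent /etc/gmond.conf && echo "multicast group is consistent"
```

The check is deliberately simple: it assumes one setting per line in the `name = value` form shown above and does not compare the ports.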

* Copy the start-up script for ''gmond'' from the source directory:
{{{
cp ./gmond/gmond.init /etc/init.d/gmond
}}}

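* Add an additional route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify ''/etc/init.d/gmond'': in the start section, add the route before the daemon is launched (first block); in the stop section, remove it before the daemon is killed (second block):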
{{{
#Add multicast route to internal interface
/sbin/route add -host 239.2.11.71 dev eth0
daemon $GMOND
}}}
{{{
#Remove multicast route to internal interface
/sbin/route delete -host 239.2.11.71 dev eth0
killproc gmond
}}}
* Make the Ganglia Monitoring Daemon start at boot time:
{{{
/sbin/chkconfig gmond on
}}}
* Start the Ganglia Monitoring Daemon:
{{{
/sbin/service gmond start
}}}
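
Once ''gmond'' is running, it can be queried for the cluster state as XML. The following sketch counts the nodes that a daemon currently knows about. It rests on assumptions that are not stated above: that the default `tcp_accept_channel` on port 8649 from the generated configuration is still in place, and that `nc` is installed. The helper name `count_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the <HOST ...> elements in a gmond XML dump
# read from standard input, i.e. the number of nodes gmond knows about.
count_hosts() {
    grep -o '<HOST ' | wc -l | tr -d ' '
}
# Example call (assumes the default tcp_accept_channel on port 8649):
#   nc localhost 8649 | count_hosts
```

If the multicast route is set up correctly, the number printed on any node should approach the number of nodes running ''gmond''; a node that only ever sees itself suggests that multicast traffic is leaving through the wrong interface.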

=== Ganglia Meta Daemon ===
* Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.

* Make the Ganglia Meta Daemon start at boot time:
{{{
/sbin/chkconfig --add gmetad
/sbin/chkconfig gmetad on
}}}
* Start the Ganglia Meta Daemon:
{{{
/sbin/service gmetad start
}}}

* If the pie charts do not show up in the web frontend, install the ''php-gd'' package.


=== Further Customisation ===
In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'':
* '''header.php'''
{{{
if (!$physical) {
$context_ranges[]="10 minutes";
$context_ranges[]="20 minutes";
$context_ranges[]="30 minutes";
$context_ranges[]="1 hour";
$context_ranges[]="2 hours";
$context_ranges[]="4 hours";
$context_ranges[]="8 hours";
$context_ranges[]="12 hours";
$context_ranges[]="1 day";
$context_ranges[]="2 days";
$context_ranges[]="week";
$context_ranges[]="month";
$context_ranges[]="year";
}}}

* '''get_context.php'''
{{{
switch ($range) {
case "10 minutes": $start = -600; break;
case "20 minutes": $start = -1200; break;
case "30 minutes": $start = -1800; break;
case "1 hour": $start = -3600; break;
case "2 hours": $start = -7200; break;
case "4 hours": $start = -14400; break;
case "8 hours": $start = -28800; break;
case "12 hours": $start = -43200; break;
case "1 day": $start = -86400; break;
case "2 days": $start = -172800; break;
case "week": $start = -604800; break;
case "month": $start = -2419200; break;
case "year": $start = -31449600; break;
}}}
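
Each `$start` value above is simply the length of the range in seconds, negated. The sketch below reproduces that arithmetic so that further ranges can be added consistently; the function name `range_to_start` is hypothetical and not part of Ganglia. Note that the "month" value above corresponds to 4 weeks and the "year" value to 52 weeks.

```shell
#!/bin/sh
# Hypothetical helper: convert "<count> <unit>" into the negative offset
# in seconds that get_context.php assigns to $start.
range_to_start() {
    count=$1
    unit=$2
    case "$unit" in
        minute|minutes) factor=60 ;;
        hour|hours)     factor=3600 ;;
        day|days)       factor=86400 ;;
        week|weeks)     factor=604800 ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $(( -1 * count * factor ))
}
# Example calls:
#   range_to_start 20 minutes
#   range_to_start 4 weeks
```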

== !JobMonarch ==
!JobMonarch is an add-on to Ganglia which provides PBS job monitoring through the web browser.

See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.

'''Attention''': !JobMonarch does not work properly yet.

== Domain Usage Monitoring ==
All HTML documents must contain the following code in order to be tracked correctly.

{{{
<!-- Site Meter -->
<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
</script>
<noscript>
<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
<img src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
alt="Site Meter" border="0"/>
</a>
</noscript>

<!-- Copyright (c)2006 Site Meter -->
}}}
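
Since every HTML document needs the snippet, it can be useful to list the pages that lack it. The following is a minimal sketch under stated assumptions: the helper name `untracked_pages` is hypothetical, the pages are static `.html` files below one directory, and a page counts as tracked if it references the counter script.

```shell
#!/bin/sh
# Hypothetical helper: list the .html files below a directory that do
# NOT reference the Site Meter counter script.
untracked_pages() {
    # grep -L prints the names of files without a match; -F treats the
    # pattern as a fixed string rather than a regular expression.
    find "$1" -type f -name '*.html' \
        -exec grep -LF 'sitemeter.com/js/counter.js' {} +
}
# Example call (the document root is a placeholder):
#   untracked_pages /path/to/document/root
```

Pages that are generated dynamically are not covered by this check and have to be verified in the templates that produce them.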