Anonymous, 09/17/2007 07:42 PM
= Cluster Monitoring =
The cluster resources and performance need to be constantly monitored, and the users need to be tracked.
We assume the following configuration:
||!StorMan||2.12_B928||
||Ganglia||3.0.4||
||!JobMonarch||0.2||
== !StorMan ==
!StorMan is the ''DELL RAID Storage Manager'' (RSM) for RAID systems. The agent has ''/dev/aac0'' open and listens on ports 34571, 34572, 34573.
* Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node:
{{{
tar -xzf RSM-2.12_B928_Linux.tgz
}}}
* Install the RSM, ignoring its dependencies. Enter at the command line of the master node:
{{{
rpm -Uv --nodeps StorMan-2.12.i386.rpm
}}}
* Make sure that the RSM starts at boot time. Enter at the command line of the master node:
{{{
cd /usr/StorMan
cp ./stor_agent /etc/init.d/rsd
/sbin/chkconfig --add rsd
/sbin/chkconfig rsd on
}}}
* Start the RSM. Enter at the command line of the master node:
{{{
/sbin/service rsd start
}}}
== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''
* Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
* Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
* Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the head node.
=== Ganglia Monitoring Daemon ===
* Configure, build and install Ganglia on each slave node (only with ''gmond''):
{{{
./configure --prefix=/usr/local
}}}
and on the master node (with ''gmond'' and ''gmetad''):
{{{
./configure --prefix=/usr/local --with-gmetad
}}}
- Initialise the configuration file for ''gmond'':
{{{
gmond --default_config > /etc/gmond.conf
}}}
- Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
* Set the name of the cluster:
{{{
cluster {
name = "ProCKSI"
}
}}}
* Set the IP address and port for multicast data exchange:
{{{
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
}
}}}
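The send and receive channels only see each other if they join the same multicast group on the same port. As a quick sanity check, the following sketch extracts both settings from a gmond.conf-style file and compares them. The function names `channels` and `check_channels` are made up for this example, and the parsing assumes one setting per line as in the snippet above; it is not a full parser for the gmond configuration syntax.

```shell
#!/bin/sh
# Sketch only: "channels" and "check_channels" are hypothetical helper names.
# Assumes one "key = value" setting per line, as in the configuration above.

# Print "<channel> <mcast_join> <port>" for every UDP channel block in file $1.
channels() {
    awk '
        /^[ \t]*udp_(send|recv)_channel/ { ch = $1; sub(/[{].*/, "", ch); next }
        ch != "" && $1 == "mcast_join"   { addr = $3 }
        ch != "" && $1 == "port"         { port = $3 }
        ch != "" && /[}]/                { print ch, addr, port; ch = ""; addr = ""; port = "" }
    ' "$1"
}

# Succeed only if all channels use one and the same address/port pair.
check_channels() {
    channels "$1" | awk '
        { seen[$2 " " $3] = 1 }
        END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }
    '
}
```

For example, `check_channels /etc/gmond.conf && echo "channels agree"` could be run on each node after editing the file.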
- Copy the start-up script for ''gmond'':
{{{
cp ./gmond/gmond.init /etc/init.d/gmond
}}}
- Add a route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify ''/etc/init.d/gmond'' so that the route is added before the daemon is started and removed before it is stopped:
{{{
#Add multicast route to internal interface
/sbin/route add -host 239.2.11.71 dev eth0
daemon $GMOND
}}}
{{{
#Remove multicast route to internal interface
/sbin/route delete -host 239.2.11.71 dev eth0
killproc gmond
}}}
- Make the Ganglia Monitoring Daemon start at boot time:
{{{
/sbin/chkconfig gmond on
}}}
- Start the Ganglia Monitoring Daemon:
{{{
/sbin/service gmond start
}}}
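Once the daemon is running, a stock gmond configuration also answers TCP connections with an XML dump of the cluster state. This relies on the default `tcp_accept_channel` on port 8649, which is an assumption here because the configuration above does not set it. The hypothetical helper below lists the host names found in such a dump, which gives a quick way to see whether all nodes have reported in.

```shell
#!/bin/sh
# Sketch only: "list_hosts" is a hypothetical helper name.
# Reads a gmond XML dump on stdin and prints the NAME of each <HOST> element,
# assuming one element per line with NAME as its first attribute.
list_hosts() {
    sed -n 's/.*<HOST NAME="\([^"]*\)".*/\1/p'
}
```

On the master node this might be used as `nc localhost 8649 | list_hosts`, assuming `nc` is installed.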
=== Ganglia Meta Daemon ===
* Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.
- Make the Ganglia Meta Daemon start at bootup.
{{{
/sbin/chkconfig --add gmetad
/sbin/chkconfig gmetad on
}}}
- Start the Ganglia Meta Daemon:
{{{
/sbin/service gmetad start
}}}
- If the pie charts in the web frontend do not show up, install the ''php-gd'' package.
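The configuration step for gmetad is not spelled out above. As a minimal sketch, the function below prints the two gmetad.conf settings this setup most likely needs: a `data_source` line pointing at the gmond on the master node, and `rrd_rootdir` set to the database directory named earlier. The function name `gmetad_conf` and the assumption that gmond is reachable on localhost:8649 are illustrative; adjust both settings to your installation.

```shell
#!/bin/sh
# Sketch only: "gmetad_conf" is a hypothetical helper name.
# Prints a minimal gmetad.conf fragment for the cluster name given as $1.
# Assumptions: gmond answers on localhost:8649 (its default TCP port) and the
# round-robin databases live in /usr/local/ganglia/rrds as described above.
gmetad_conf() {
    cat <<EOF
data_source "$1" localhost:8649
rrd_rootdir "/usr/local/ganglia/rrds"
EOF
}
```

Running `gmetad_conf ProCKSI` prints the fragment so that it can be reviewed before it is merged into your gmetad.conf.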
=== Further Customisation ===
In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'':
* '''header.php'''
{{{
if (!$physical) {
$context_ranges[]="10 minutes";
$context_ranges[]="20 minutes";
$context_ranges[]="30 minutes";
$context_ranges[]="1 hour";
$context_ranges[]="2 hours";
$context_ranges[]="4 hours";
$context_ranges[]="8 hours";
$context_ranges[]="12 hours";
$context_ranges[]="1 day";
$context_ranges[]="2 days";
$context_ranges[]="week";
$context_ranges[]="month";
$context_ranges[]="year";
}}}
* '''get_context.php'''
{{{
switch ($range) {
case "10 minutes": $start = -600; break;
case "20 minutes": $start = -1200; break;
case "30 minutes": $start = -1800; break;
case "1 hour": $start = -3600; break;
case "2 hours": $start = -7200; break;
case "4 hours": $start = -14400; break;
case "8 hours": $start = -28800; break;
case "12 hours": $start = -43200; break;
case "1 day": $start = -86400; break;
case "2 days": $start = -172800; break;
case "week": $start = -604800; break;
case "month": $start = -2419200; break;
case "year": $start = -31449600; break;
}}}
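Each `$start` value is the length of the range in seconds, negated. The sketch below (the helper name `range_start` is made up for this example) reproduces the arithmetic, which may help when adding further ranges. Note that the values in the listing treat a month as 28 days and a year as 52 weeks.

```shell
#!/bin/sh
# Sketch only: "range_start" is a hypothetical helper name.
# Usage: range_start COUNT UNIT   (UNIT: minutes, hours, days or weeks)
# Prints the negative offset in seconds used as $start in get_context.php.
range_start() {
    case "$2" in
        minute*) unit=60 ;;
        hour*)   unit=3600 ;;
        day*)    unit=86400 ;;
        week*)   unit=604800 ;;
        *)       echo "unknown unit: $2" >&2; return 1 ;;
    esac
    echo "-$(( $1 * unit ))"
}

range_start 10 minutes   # -600
range_start 28 days      # -2419200, the "month" value above
range_start 52 weeks     # -31449600, the "year" value above
```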
== !JobMonarch ==
!JobMonarch is an add-on to Ganglia that provides PBS job monitoring through the web browser.
See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.
'''Attention''': Does not work properly yet.
== Domain Usage Monitoring ==
All HTML documents must contain the following code in order to be tracked correctly.
{{{
<!-- Site Meter -->
<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
</script>
<noscript>
<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
<img src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
alt="Site Meter" border="0"/>
</a>
</noscript>
<!-- Copyright (c)2006 Site Meter -->
}}}
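To find pages that are missing the tracking code, one option is to search the document root for HTML files that do not reference the counter script. The helper name `untracked_pages` is made up for this example, and `grep -L` (list files without a match) is a common extension found in GNU and BSD grep rather than a POSIX option.

```shell
#!/bin/sh
# Sketch only: "untracked_pages" is a hypothetical helper name.
# Prints every *.html file below directory $1 that does not contain the
# Site Meter counter script. -F matches the string literally; -L lists
# files without a match (GNU/BSD grep extension).
untracked_pages() {
    find "$1" -type f -name '*.html' -exec grep -LF 'sitemeter.com/js/counter.js' {} +
}
```

Call it with the directory that holds the site's HTML files; an empty result means every page found carries the snippet.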