Anonymous, 09/17/2007 07:42 PM


= Cluster Monitoring =
The cluster's resources and performance need to be monitored constantly, and its users need to be tracked.

We assume the following configuration:
||!StorMan || 2.12_B928 ||
||Ganglia || 3.0.4 ||
||!JobMonarch || 0.2 ||

== !StorMan ==
!StorMan is the ''DELL RAID Storage Manager'' (RSM) for RAID systems. The agent keeps ''/dev/aac0'' open and listens on ports 34571, 34572 and 34573.
 * Download [repos:Externals/Cluster/RSM-2.12_B928_Linux.tgz] from the repository and unpack it. Enter at the command line of the master node: {{{
tar -xzf RSM-2.12_B928_Linux.tgz
}}}
 * Install the RSM, ignoring its dependencies. Enter at the command line of the master node: {{{
rpm -Uv --nodeps StorMan-2.12.i386.rpm
}}}
 * Make sure that the RSM starts at boot time. Enter at the command line of the master node: {{{
cd /usr/StorMan
cp ./stor_agent /etc/init.d/rsd
/sbin/chkconfig --add rsd
/sbin/chkconfig rsd on
}}}
 * Start the RSM. Enter at the command line of the master node: {{{
/sbin/service rsd start
}}}

== Ganglia ==
''Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.''
 * Download the latest release of the ''Ganglia Monitoring Core'' from [http://ganglia.sourceforge.net/ http://ganglia.sourceforge.net].
 * Install Ganglia into ''/usr/local/ganglia'', its web frontend into ''/usr/local/ganglia/html/'', and its databases into ''/usr/local/ganglia/rrds/''.
 * Install the ''Ganglia Monitoring Daemon'' (gmond) on each node, and the ''Ganglia Meta Daemon'' (gmetad) on the master node.

=== Ganglia Monitoring Daemon ===
 * Configure, build and install Ganglia on each slave node (only with ''gmond''): {{{
./configure --prefix=/usr/local
}}}
and on the master node (with ''gmond'' and ''gmetad''): {{{
./configure --prefix=/usr/local --with-gmetad
}}}

  • Initialise the configuration file for ''gmond'': {{{
    gmond --default_config > /etc/gmond.conf
    }}}
  • Configure the ''Ganglia Monitoring Daemon'' in ''/etc/gmond.conf'':
    * Set the name of the cluster: {{{
    cluster {
    name = "ProCKSI"
    }
    }}}
    * Set the IP address and port for multicast data exchange: {{{
    udp_send_channel {
    mcast_join = 239.2.11.71
    port = 8649
    }
    udp_recv_channel {
    mcast_join = 239.2.11.71
    port = 8649
    bind = 239.2.11.71
    }
    }}}
  • Copy the start-up script for ''gmond'': {{{
    cp ./gmond/gmond.init /etc/init.d/gmond
    }}}
  • Add an additional route so that multicast data is exchanged via the ''internal'' interface (''eth0''). Modify the start and stop sections of ''/etc/init.d/gmond'': {{{
    #Add multicast route to internal interface
    /sbin/route add -host 239.2.11.71 dev eth0
    daemon $GMOND
    }}}
    {{{
    #Remove multicast route to internal interface
    /sbin/route delete -host 239.2.11.71 dev eth0
    killproc gmond
    }}}
  • Make the Ganglia Monitoring Daemon start at bootup. {{{
    /sbin/chkconfig gmond on
    }}}
  • Start the Ganglia Monitoring Daemon: {{{
    /sbin/service gmond start
    }}}
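
The send and receive channels configured above have to agree on the multicast group and port, otherwise the nodes will not see each other's metrics. The following is a minimal sketch of a consistency check, assuming the layout shown above (one ''key = value'' setting per line). The function name `gmond_channel` is hypothetical and not part of Ganglia.

```shell
#!/bin/sh
# Print "<mcast_join>:<port>" for one channel block of a gmond.conf file.
# Arguments: <block-name> <config-file>
# Assumes one "key = value" setting per line, as in the snippets above.
gmond_channel() {
    awk -v block="$1" '
        $1 == block                  { inside = 1; next }  # e.g. "udp_send_channel {"
        inside && $1 == "}"          { inside = 0 }        # end of the block
        inside && $1 == "mcast_join" { group = $3 }
        inside && $1 == "port"       { port = $3 }
        END { print group ":" port }
    ' "$2"
}

# Possible use on a node (not executed here):
#   send=$(gmond_channel udp_send_channel /etc/gmond.conf)
#   recv=$(gmond_channel udp_recv_channel /etc/gmond.conf)
#   [ "$send" = "$recv" ] || echo "channel mismatch: $send vs $recv"
```

If a file contains several blocks of the same name, this sketch only reports the last one.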

=== Ganglia Meta Daemon ===
 * Install and configure the ''Ganglia Meta Daemon'' (gmetad) on the master node.

  • Make the Ganglia Meta Daemon start at bootup. {{{
    /sbin/chkconfig --add gmetad
    /sbin/chkconfig gmetad on
    }}}
  • Start the Ganglia Meta Daemon: {{{
    /sbin/service gmetad start
    }}}
  • If the pie charts do not show up, install the ''php-gd'' package.
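
Whether the GD extension is available can be checked before looking for other causes of missing pie charts. The sketch below assumes that the PHP command-line binary is installed and uses the same modules as the web server, which is not guaranteed. The helper name `has_gd` is hypothetical.

```shell
#!/bin/sh
# Succeeds if a module list (one name per line, as printed by "php -m")
# read from standard input contains the GD extension.
has_gd() {
    grep -qix 'gd'
}

# Possible use on the master node (not executed here):
#   php -m | has_gd || echo "GD missing: install the php-gd package"
```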

=== Further Customisation ===
In order to display more fine-grained time intervals, edit the following files in ''/usr/local/ganglia/html/'':
  • '''header.php''' {{{
if (!$physical) {
$context_ranges[]="10 minutes";
$context_ranges[]="20 minutes";
$context_ranges[]="30 minutes";
$context_ranges[]="1 hour";
$context_ranges[]="2 hours";
$context_ranges[]="4 hours";
$context_ranges[]="8 hours";
$context_ranges[]="12 hours";
$context_ranges[]="1 day";
$context_ranges[]="2 days";
$context_ranges[]="week";
$context_ranges[]="month";
$context_ranges[]="year";
}}}

  • '''get_context.php''' {{{
    switch ($range) {
    case "10 minutes": $start = -600; break;
    case "20 minutes": $start = -1200; break;
    case "30 minutes": $start = -1800; break;
    case "1 hour": $start = -3600; break;
    case "2 hours": $start = -7200; break;
    case "4 hours": $start = -14400; break;
    case "8 hours": $start = -28800; break;
    case "12 hours": $start = -43200; break;
    case "1 day": $start = -86400; break;
    case "2 days": $start = -172800; break;
    case "week": $start = -604800; break;
    case "month": $start = -2419200; break;
    case "year": $start = -31449600; break;
    }}}
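
Each `$start` value above is the length of the range in seconds, negated. The sketch below recomputes the offsets so that new ranges can be added without manual arithmetic. It follows the listing in treating a month as 28 days and a year as 364 days (52 weeks). The function name `range_start` is hypothetical.

```shell
#!/bin/sh
# Convert a range label such as "2 hours" into the negative offset in
# seconds used by the switch statement above.
range_start() {
    set -- $1                      # split "2 hours" into count and unit
    if [ $# -eq 1 ]; then          # a bare unit such as "week" means one of it
        set -- 1 "$1"
    fi
    case $2 in
        minute|minutes) unit=60 ;;
        hour|hours)     unit=3600 ;;
        day|days)       unit=86400 ;;
        week)           unit=604800 ;;     # 7 days
        month)          unit=2419200 ;;    # 28 days, as in the listing
        year)           unit=31449600 ;;   # 364 days, as in the listing
        *) echo "unknown range: $*" >&2; return 1 ;;
    esac
    echo "-$(( $1 * unit ))"
}

# Example: range_start "4 hours" prints -14400
```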
== !JobMonarch ==
!JobMonarch is an add-on to Ganglia which provides PBS job monitoring through the web browser.

See [http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation http://subtrac.rc.sara.nl/oss/jobmonarch/wiki/Documentation] for information on requirements, configuration and installation.

'''Attention''': Does not work properly yet.

== Domain Usage Monitoring ==
All HTML documents must contain the following code in order to be tracked correctly. {{{
<!-- Site Meter -->
<script type="text/javascript" src="http://s18.sitemeter.com/js/counter.js?site=s18procksi">
</script>
<noscript>
<a href="http://s18.sitemeter.com/stats.asp?site=s18procksi" target="_top">
<img src="http://s18.sitemeter.com/meter.asp?site=s18procksi"
alt="Site Meter" border="0"/>
</a>
</noscript>

<!-- Copyright (c)2006 Site Meter -->
}}}
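
Since every HTML document has to carry this snippet, it can help to list the pages that lack it. The following is a minimal sketch that relies on GNU grep (`-r`, `-L`, `--include`). The function name `untracked_pages` and the document root in the usage comment are assumptions.

```shell
#!/bin/sh
# List every *.html file below a directory that does not reference the
# Site Meter counter script.
untracked_pages() {
    grep -rLF --include='*.html' 'sitemeter.com/js/counter.js' "$1"
}

# Possible use (not executed here; the path is an assumption):
#   untracked_pages /var/www/html
```

The check only looks for the counter script URL, so a page with a truncated or altered snippet may still pass.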