Version 3 (Anonymous, 08/19/2008 05:57 PM) → Version 4/5 (Anonymous, 08/20/2008 01:37 PM)

= Monit: Monitoring of Services =
This installation guide describes how to set up ''independent'' instances of [http://www.tildeslash.com/monit/ Monit] on the master node and on each slave node.[[br]]
In the future, [http://www.tildeslash.com/mmonit/ M|Monit] should be considered, as it allows single-point administration and monitoring from the master node.

You can find more about Monit at [http://mon.wiki.kernel.org/].

== Installation ==
* Add the DAG repository on the ''master node'' and ''slave nodes''. Enter at the command line as ''root'':
{{{
wget http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm
rpm -Uvh rpmforge-release-0.3.6-1.el5.rf.x86_64.rpm
}}}

* Install Monit on the ''master node'' and ''slave nodes''. Enter at the command line as ''root'':
{{{
yum install monit
}}}

== Configuration ==

=== Master node ===
On the master node, the following services will be monitored:[[br]]
''apache'', ''cron'', ''devices'' (/ & /home), ''mysql'', ''nfs'' (/home_nfs), ''ntp'', ''pbs_mom'', ''pbs_sched'', ''pbs_server'', ''postfix'', ''ssh'', ''system'', ''ypbind'', ''yppasswd'', ''ypserv'',[[br]]
and whether all ''slaves'' are reachable (''ping'').[[br]]
Currently, the monitoring of ''pbs_maui'' is switched off in favour of ''pbs_sched''.

* Download the [source:Externals/Cluster/procksi_monit.tgz configuration files] from the repository and extract the files. Enter at the command line:
{{{
tar -xvzf procksi_monit.tgz
}}}

* Copy the files in ''./monit/master'' to the appropriate directories (''/etc/'', ''/etc/monit.d/'', ''/home/procksi/monit/'').

 * Change the ownership of the Monit token file. Enter at the command line:
{{{
chown -R procksi:procksi_dev /home/procksi/monit/token
}}}

* Edit the Apache configuration file ''/etc/httpd/conf/httpd.conf'':
{{{
#General Aliases for Monitoring and Testing
Alias /monit/ "/home/procksi/monit/"
Alias /ganglia/ "/usr/local/ganglia/html/"
Alias /trees/ "/home/procksi/trees/"

#Conditional Logging: Don't log Ganglia and Monit requests
SetEnvIf Request_URI "ganglia" dontlog
SetEnvIf Request_URI "^\/monit\/token$" dontlog
}}}

* Restart the Apache server. Enter at the command line as ''root'':
{{{
/sbin/service httpd restart
}}}

* Make the Monit daemon start at bootup. Enter at the command line as ''root'':
{{{
/sbin/chkconfig monit on
}}}

* Start the Monit daemon. Enter at the command line as ''root'':
{{{
/sbin/service monit start
}}}

=== Slave nodes ===
On the slave nodes, the following services will be monitored:[[br]]
''devices'' (/ and /scratch), ''nfs'' (/home), ''ntp'', ''pbs_mom'', ''ssh'', ''system'', ''ypbind''

* Download the [source:Externals/Cluster/procksi_monit.tgz configuration files] from the repository and extract the files. Enter at the command line:
{{{
tar -xvzf procksi_monit.tgz
}}}

 * Copy the files in ''./monit/slave'' to the appropriate directories (''/etc/'', ''/etc/monit.d/'').

* Edit ''/etc/monit.d/system'' and set the correct host name for each slave node.

* Make the Monit daemon start at bootup. Enter at the command line as ''root'':
{{{
/sbin/chkconfig monit on
}}}

* Start the Monit daemon. Enter at the command line as ''root'':
{{{
/sbin/service monit start
}}}

== Online Monitoring ==

The status of each monitored service, process, file, etc. is available through Monit's integrated web server on port 2812, from ''localhost'' and from selected machines. The username and password can be found on the secret [[wiki:secretAuthentication authentication]] page.

|| master01 || [http://procksi0.cs.nott.ac.uk:2812] ||
|| slave01 || [http://procksi1.cs.nott.ac.uk:2812] ||
|| slave02 || [http://procksi2.cs.nott.ac.uk:2812] ||
|| slave03 || [http://procksi3.cs.nott.ac.uk:2812] ||
|| slave04 || [http://procksi4.cs.nott.ac.uk:2812] ||
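
The web interfaces listed above can also be probed from the command line. This is a rough sketch using `curl`; the function names and the `MONIT_AUTH` environment variable (expected to hold `user:password`) are assumptions made for illustration, and the host names are taken from the table.

```shell
#!/bin/sh
# Hypothetical helper: build the Monit web interface URL for a host.
monit_url() {
    printf 'http://%s:2812\n' "$1"
}

# Hypothetical helper: print the HTTP status code returned by one node.
# MONIT_AUTH (user:password) is an assumed environment variable.
probe_monit() {
    url=$(monit_url "$1")
    if [ -n "${MONIT_AUTH:-}" ]; then
        curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' -u "$MONIT_AUTH" "$url"
    else
        curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$url"
    fi
}

# Probe the master (procksi0) and the four slaves (procksi1 to procksi4).
probe_all() {
    for n in 0 1 2 3 4; do
        host="procksi${n}.cs.nott.ac.uk"
        printf '%s %s\n' "$host" "$(probe_monit "$host")"
    done
}
```

Running `probe_all` from a machine that is allowed to connect prints one line per node with its host name and HTTP status code.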

== Offline Monitoring ==

Monit sends alerts to "procksi@cs.nott.ac.uk" when services become unavailable, are restarted, or similar events occur.
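
To see where the alert recipient is configured, the control files can be searched for Monit's global `set alert` statement. This is only a sketch: it assumes the address is set with `set alert` rather than with per-service `alert` lines, and the function name `list_alert_lines` is hypothetical. The files to search are passed as arguments because this guide does not name them.

```shell
#!/bin/sh
# Hypothetical helper: list every "set alert" statement in the given files.
# /dev/null is appended so that grep always prints the file name,
# even when only one file is given.
list_alert_lines() {
    grep -n -E '^[[:space:]]*set[[:space:]]+alert[[:space:]]' "$@" /dev/null
}
```

A typical call might be `list_alert_lines /etc/monit.conf /etc/monit.d/*`, where ''/etc/monit.conf'' is an assumed location for the main control file.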