The PE1950 server has an integrated LSI RAID controller, and the standard MPT modules shipped with RHEL 4.x/5.x recognize this hardware, so installation is trivial (a non-issue).
However, actually monitoring the health of the RAID array isn't quite so trivial. Dell's Linux tools (the OMSA suite) do poll the integrated RAID card for health, but in some circumstances you may not want to install OMSA just to monitor internal RAID health. (In my case, I didn't want to install OMSA on all 25 compute nodes in a compute cluster just to poll the health of the OS hardware RAID mirrors.)
Some digging with Google located a third-party open source tool, "mpt-status", which uses the mptctl kernel module to generate an easily human-readable report on the status of the internal RAID.
This output can then be used by a basic script, called by cron (say, every night), to send notification of any non-optimal conditions.
- Get yourself a copy of the mpt-status RPM (or compile from source). It is available at http://www.drugphish.ch/~ratz/mpt-status/ or, for the RPM specifically, http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm
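For example, the RPM can be fetched straight from the command line (assuming wget is available on your system):

wget http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm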
- Install the RPM on your system: "rpm --install mpt-status-1.2.0_RC7-3.i386.rpm"
- Try calling it to see what happens. If it complains that /dev/mptctl is missing, you may need to "mknod" - follow the suggestion it prints. If it complains that the mptctl module is not loaded, then load the module.
A capture of an example of this sequence is shown below:
[root@box nov-13-08-mptsas-status]# ls -la
total 204
drwxr-xr-x 3 root root   4096 Nov 13 08:40 .
drwxr-xr-x 9 root root   4096 Nov 13 09:08 ..
drwxr-xr-x 6  501  501   4096 Nov 13 08:37 mpt-status-1.2.0
-rw-r--r-- 1 root root  27986 Jun 30  2006 mpt-status-1.2.0_RC7-3.i386.rpm
-rw-r--r-- 1 root root 153600 Nov  5  2006 mpt-status-1.2.0.tar
-rw-r--r-- 1 root root     82 Nov 13 08:40 README.txt
-rw-r--r-- 1 root root     65 Nov 13 08:33 src-url
[root@box nov-13-08-mptsas-status]# rpm --install mpt-status-1.2.0_RC7-3.i386.rpm
[root@box nov-13-08-mptsas-status]# mpt-status
open /dev/mptctl: No such file or directory
  Try: mknod /dev/mptctl c 10 220
  Make sure mptctl is loaded into the kernel
[root@box nov-13-08-mptsas-status]# mknod /dev/mptctl c 10 220
[root@box nov-13-08-mptsas-status]# mpt-status
open /dev/mptctl: No such device
  Are you sure your controller is supported by mptlinux?
  Make sure mptctl is loaded into the kernel
[root@box nov-13-08-mptsas-status]# modprobe mptctl
[root@box nov-13-08-mptsas-status]# mpt-status
ioc0 vol_id 0 type IM, 2 phy, 465 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 9 ATA ST3500320NS MA07, 465 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 ATA ST3500320NS MA07, 465 GB, state ONLINE, flags NONE
- Now that it works, ensure the module is loaded each time your system reboots by adding a line reading "modprobe mptctl" to /etc/rc.modules.
- Note that /etc/rc.modules needs the execute bit set (i.e., chmod 700 on that file) in order for it to run at boot; otherwise this modprobe does not actually happen when the system boots. For example:
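A minimal sketch of these two steps (note that /etc/rc.modules may not exist yet, in which case the first command creates it):

echo "modprobe mptctl" >> /etc/rc.modules
chmod 700 /etc/rc.modules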
- Provided below is a trivial script you could call from cron at regular intervals (nightly, say) to poll the health of the disks and notify you in case things need attention.
#!/bin/bash
#
# Small script called by cron nightly to poll the health
# of the local OS mirror RAID arrays, and notify if things are amiss.
#
# TDC Nov-13-08
#
################################################################
HOSTNAME=`hostname`
mpt-status > /tmp/head-node-raid-health-check-temporary-file
#
# First confirm we have all drives reporting back some kind of status, else throw an error.
#
# Note there is 1 node with 2 x ST3500 HDDs, for 2 drives in total expected to report.
DRIVES=`grep -c ST3500 /tmp/head-node-raid-health-check-temporary-file`
#
# We also expect 1 count of "OPTIMAL" returned, one per RAID set / one per system.
OPTIMAL=`grep -c OPTIMAL /tmp/head-node-raid-health-check-temporary-file`
#
# Remove the temp file, so that it is not present for next time.
rm /tmp/head-node-raid-health-check-temporary-file
#
#
# Now, we do some logic to test that all is well in Denmark.
# (Check both the drive count and the OPTIMAL count, as promised above.)
if [[ $DRIVES == "2" && $OPTIMAL == "1" ]]
then
    EXIT_PAINLESSLY="true"
else
    echo "$HOSTNAME reports $DRIVES of 2 expected HDDs reporting on raid health, with $OPTIMAL of 1 raid sets reporting optimal health - PLEASE VERIFY IMMEDIATELY" | mail -s "Possible RAID Errors on $HOSTNAME" systems
fi
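To hook the script into cron, an entry along the following lines would do; the script path /usr/local/sbin/check-raid-health.sh and the 02:30 run time here are just assumptions - adjust to suit your setup:

# Hypothetical entry for /etc/cron.d/raid-check: run the health check nightly at 02:30
30 2 * * * root /usr/local/sbin/check-raid-health.sh

(Files in /etc/cron.d take a user field - "root" above - unlike a personal crontab edited with "crontab -e".)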