CONTEXT

It is always a good idea to have some OS-level monitoring scripts that check RAID hardware health and send annoying 'nag messages' to sysadmin staff when unhappy things happen, such as a disk failure or a degraded array. Better to hear about it early than late; it is typically less work in the long run.

Fortunately there are lots of well-documented tools and solutions out there to do this. I'm just summarizing some content here for my own reference, and it may possibly be of use to others.

In this specific case,

  • Dell T300 server with PERC 6/i hardware RAID controller
  • Proxmox VE 1.6 / Debian Etch 64-bit based OS
  • using a 3rd-party binary repo for the apps discussed here, as per directions from the URL: http://hwraid.le-vert.net/wiki/DebianPackages

Following the hints at http://hwraid.le-vert.net/wiki/DebianPackages, we add an entry to /etc/apt/sources.list as described there:


---paste---
Please add deb http://hwraid.le-vert.net/distrib branch main to /etc/apt/sources.list to access all packages.

distrib can be either debian or ubuntu. 
branch can be etch, lenny, squeeze and sid for debian, or hardy, intrepid and jaunty for ubuntu.

In example, for current Debian stable release (Lenny): 
deb http://hwraid.le-vert.net/debian lenny main

Theses packages are available for amd64 and i386 arch
---endpaste---
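
The step quoted above can be sketched as a small shell function. This is only a minimal sketch, not something from the hwraid site: the function name add_hwraid_repo and its three arguments are my own invention, and it assumes you want the line appended only when it is not already there.

```shell
#!/bin/sh
# add_hwraid_repo FILE DISTRIB BRANCH   (hypothetical helper name)
# Append the hwraid repository line to an APT sources file, but only if
# that exact line is not already present, so running it twice is harmless.
add_hwraid_repo() {
    sources_file=$1      # e.g. /etc/apt/sources.list
    distrib=$2           # debian or ubuntu
    branch=$3            # e.g. lenny
    line="deb http://hwraid.le-vert.net/$distrib $branch main"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$line" "$sources_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$sources_file"
    fi
}

# Typical use, as root, followed by a refresh of the package index:
#   add_hwraid_repo /etc/apt/sources.list debian lenny
#   apt-get update
```

The apt-get update afterwards matters: without it, apt does not yet know about the packages in the newly added repo.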

Then install the tools that will be of use to me:


apt-get install megacli
apt-get install megactl
apt-get install megaraid-status

Here is a sample of the output you can get from these tools:

proxmox:~# megaraidsas-status 
-- Arrays informations --
-- ID | Type | Size | Status
a0d0 | RAID 6 | 97GiB | optimal
a0d1 | RAID 6 | 3627GiB | optimal

-- Disks informations
-- ID | Model | Status | Warnings
a0e32s0 | ATA Hitachi HDS72202 1863GiB | online
a0e32s1 | ATA Hitachi HDS72202 1863GiB | online
a0e32s2 | ATA Hitachi HDS72202 1863GiB | online
a0e32s3 | ATA Hitachi HDS72202 1863GiB | online

proxmox:~# megasasctl
a0       PERC 6/i Adapter         encl:1 ldrv:2  batt:good
a0d0        97GiB RAID 6   1x4  optimal
a0d1      3627GiB RAID 6   1x4  optimal
a0e32s0    1863GiB  a0d0+ online  
a0e32s1    1863GiB  a0d0+ online  
a0e32s2    1863GiB  a0d0+ online  
a0e32s3    1863GiB  a0d0+ online  

proxmox:~# 
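
Since the whole point is to get nagged when something breaks, here is a minimal sketch of a check built on the megasasctl output shown above. The function name check_megasasctl is made up for illustration, and the parsing assumes the line layout seen in my output: the status is the last word on each array and disk line, and the adapter line carries a batt: field. Other controllers or states (a hot spare, for example) may print something different and would be flagged by this sketch.

```shell
#!/bin/sh
# check_megasasctl (hypothetical name): reads megasasctl output on stdin,
# prints every line that looks unhealthy, and returns 1 if it found any.
check_megasasctl() {
    awk '
        # Adapter line, e.g. "a0  PERC 6/i Adapter  encl:1 ldrv:2  batt:good"
        $1 ~ /^a[0-9]+$/ && /batt:/ && !/batt:good/ { print "BATTERY: " $0; bad = 1 }
        # Logical drive line, e.g. "a0d0  97GiB RAID 6  1x4  optimal"
        $1 ~ /^a[0-9]+d[0-9]+$/ && $NF != "optimal" { print "ARRAY: " $0; bad = 1 }
        # Physical disk line, e.g. "a0e32s0  1863GiB  a0d0+ online"
        $1 ~ /^a[0-9]+e[0-9]+s[0-9]+$/ && $NF != "online" { print "DISK: " $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Possible cron usage, assuming a working "mail" command on the box:
#   out=$(megasasctl | check_megasasctl) || \
#       printf '%s\n' "$out" | mail -s "RAID problem on $(hostname)" root
```

Nothing is printed and nothing is mailed while everything reads optimal/online, which keeps the nagging limited to the cases that deserve it.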

Note that the MegaRAID CLI (megacli) makes an absurd (nice, really) level of detail available. For reference, some of that output is shown below.

For reference, a nice crib sheet for the commands: http://tools.rapidsoft.de/perc/perc-cheat-sheet.html

"VIRTUAL DRIVE INFORMATION":

proxmox:/var/lib/vz/template/cache# megacli -LDInfo -Lall -aALL


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:BOOT-VOL
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:97.656 GB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:4
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Virtual Disk: 1 (Target Id: 1)
Name:DATA-VOL
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:3.541 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:4
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 19%, Taken 172 min.
Encryption Type: None

Exit Code: 0x00
proxmox:/var/lib/vz/template/cache# 
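
The -LDInfo output above is easy to boil down to a pass/fail check. A minimal sketch, assuming the "Virtual Disk: N" and "State: ..." line layout shown in my output; the function name check_ldinfo is hypothetical, and anything other than "Optimal" is treated as a problem.

```shell
#!/bin/sh
# check_ldinfo (hypothetical name): reads "megacli -LDInfo -Lall -aALL"
# output on stdin and reports every virtual disk whose State is not Optimal.
# Returns 1 if at least one such disk was found, 0 otherwise.
check_ldinfo() {
    awk '
        # "Virtual Disk: 0 (Target Id: 0)" -> remember the disk number
        /^Virtual Disk:/ { vd = $3 }
        /^State[ \t]*:/ {
            state = $0
            sub(/^State[ \t]*:[ \t]*/, "", state)   # drop the label
            sub(/[ \t\r]+$/, "", state)             # drop trailing whitespace
            if (state != "Optimal") {
                print "virtual disk " vd ": " state
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage:
#   megacli -LDInfo -Lall -aALL | check_ldinfo || echo "array needs attention"
```

Note that a background initialization in progress, as on DATA-VOL above, still reports State: Optimal, so this check stays quiet during it.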

ALSO INTERESTING: INFO FROM THE UNDERLYING DRIVES THEMSELVES:

proxmox:/var/lib/vz/template/cache# megacli -PDList -aALL      

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000000000000
Connected Port Number: 0(path0) 
Inquiry Data:       JK11A5YAKBNJPXHitachi HDS722020ALA330                 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000001000000
Connected Port Number: 1(path0) 
Inquiry Data:       JK11A5YAKDB20XHitachi HDS722020ALA330                 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000002000000
Connected Port Number: 2(path0) 
Inquiry Data:       JK11A5YAKE38UXHitachi HDS722020ALA330                 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000003000000
Connected Port Number: 3(path0) 
Inquiry Data:       JK11A5YAKBSL4XHitachi HDS722020ALA330                 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device


Exit Code: 0x00
proxmox:/var/lib/vz/template/cache#  
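
The per-drive counters above are the interesting part for early warning. Here is a minimal sketch that flags any drive with a non-zero error counter or a firmware state that does not start with "Online". The function name check_pdlist and the zero thresholds are my own assumptions; in practice a small "Other Error Count" may be harmless, so tune to taste.

```shell
#!/bin/sh
# check_pdlist (hypothetical name): reads "megacli -PDList -aALL" output on
# stdin and prints one line per suspicious finding, identified by
# enclosure:slot. Returns 1 if anything was flagged, 0 otherwise.
check_pdlist() {
    awk -F': *' '
        /^Enclosure Device ID:/ { encl = $2 + 0 }
        /^Slot Number:/         { slot = $2 + 0 }
        # Any of the three counters above zero is reported
        /^(Media Error|Other Error|Predictive Failure) Count:/ {
            if ($2 + 0 > 0) {
                printf "disk %d:%d %s = %d\n", encl, slot, $1, $2
                bad = 1
            }
        }
        /^Firmware state:/ {
            state = $2
            sub(/[ \t\r]+$/, "", state)
            if (state !~ /^Online/) {
                printf "disk %d:%d firmware state = %s\n", encl, slot, state
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage:
#   megacli -PDList -aALL | check_pdlist || echo "check the drives"
```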

ALSO -- BBU STATUS:

proxmox:/var/lib/vz/template/cache# megacli -AdpBbuCmd -aALL

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 4010 mV
Current: 0 mA
Temperature: 22 C

BBU Firmware Status:

  Charging Status              : None
  Voltage                      : OK
  Temperature                  : OK
  Learn Cycle Requested        : No
  Learn Cycle Active           : No
  Learn Cycle Status           : OK
  Learn Cycle Timeout          : No
  I2c Errors Detected          : No
  Battery Pack Missing         : No
  Battery Replacement required : No
  Remaining Capacity Low       : No
  Periodic Learn Required      : No

Battery state: 

GasGuageStatus:
  Fully Discharged        : No
  Fully Charged           : Yes
  Discharging             : Yes
  Initialized             : Yes
  Remaining Time Alarm    : No
  Remaining Capacity Alarm: No
  Discharge Terminated    : No
  Over Temperature        : No
  Charging Terminated     : No
  Over Charged            : No

Relative State of Charge: 98 %
Charger Status: Complete
Remaining Capacity: 1229 mAh
Full Charge Capacity: 1250 mAh
isSOHGood: Yes

BBU Capacity Info for Adapter: 0

Relative State of Charge: 98 %
Absolute State of charge: 65 %
Remaining Capacity: 1229 mAh
Full Charge Capacity: 1250 mAh
Run time to empty: 65535 Min
Average time to empty: 65535 Min
Average Time to full: 65535 Min
Cycle Count: 22
Max Error: 0 %
Remaining Capacity Alarm: 190 mAh
Remaining Time Alarm: 10 Min


BBU Design Info for Adapter: 0

Date of Manufacture: 07/10, 2008
Design Capacity: 1900 mAh
Design Voltage: 3700 mV
Specification Info: 49
Serial Number: 1683
Pack Stat Configuration: 0xe4bc
Manufacture Name: SANYO
Device Name: DLU8735
Device Chemistry: LION
Battery FRU: N/A


BBU Properties for Adapter: 0

Auto Learn Period: 7776000 Sec
Next Learn time: 342974714 Sec 
Learn Delay Interval:0 Hours
Auto-Learn Mode: Enabled

Exit Code: 0x00
proxmox:/var/lib/vz/template/cache#
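
The BBU dump is long, but only a handful of its lines really matter for a nag script. A minimal sketch follows, assuming the field names exactly as they appear in my output above. The function name check_bbu and the choice of which fields count as alarming are my own; note that some "Yes" values (Discharging, Fully Charged) are perfectly normal, so the check names its fields explicitly instead of looking for any "Yes".

```shell
#!/bin/sh
# check_bbu (hypothetical name): reads "megacli -AdpBbuCmd -aALL" output on
# stdin and prints the fields that suggest the battery needs attention.
# Returns 1 if any such field was found, 0 otherwise.
check_bbu() {
    awk -F'[ \t]*:[ \t]*' '
        {
            key = $1; val = $2
            sub(/^[ \t]+/, "", key)      # strip the indentation before the name
            sub(/[ \t\r]+$/, "", val)    # strip trailing whitespace
        }
        (key == "Battery Pack Missing" ||
         key == "Battery Replacement required" ||
         key == "Remaining Capacity Low") && val == "Yes" {
            print key ": " val; bad = 1
        }
        # isSOHGood is the state-of-health flag; anything but Yes is reported
        key == "isSOHGood" && val != "Yes" { print key ": " val; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage:
#   megacli -AdpBbuCmd -aALL | check_bbu || echo "BBU needs attention"
```

A bad BBU is worth nagging about on this box in particular, because the cache policy shown earlier ("No Write Cache if Bad BBU") means the controller drops out of WriteBack mode when the battery is unhealthy.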