CONTEXT
It is always a good idea to have some OS-level monitoring scripts that check RAID hardware health and send annoying 'nag messages' to the systems admin staff in the event of unhappy things such as disk failure, RAID health failure, etc. Better early than never; it is typically less work in the long run.
Fortunately there are lots of well-documented tools and solutions out there to do this. I'm just summarizing some content here for my own reference, and possibly for use by others.
In this specific case,
- Dell T300 server with PERC 6/i hardware RAID controller
- ProxmoxVE 1.6 / Debian Etch 64-bit based OS
- using a 3rd-party binary repo for the apps discussed here, as per the directions at: http://hwraid.le-vert.net/wiki/DebianPackages
Following the hints from that page, we add a sources.list entry thus:
---paste---
Please add
    deb http://hwraid.le-vert.net/distrib branch main
to /etc/apt/sources.list to access all packages.
distrib can be either debian or ubuntu.
branch can be etch, lenny, squeeze and sid for debian, or hardy, intrepid and jaunty for ubuntu.
In example, for current Debian stable release (Lenny):
    deb http://hwraid.le-vert.net/debian lenny main
Theses packages are available for amd64 and i386 arch
---endpaste---
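The rule quoted above is simple enough to capture in a few lines of Python. This is only a minimal sketch: the function name `hwraid_sources_line` is my own invention, and the branch lists are just the ones quoted in the paste, which the repository may well have changed since.

```python
# Branch names exactly as listed in the quoted instructions; the repository
# may have added or dropped branches since this was written.
BRANCHES = {
    "debian": ("etch", "lenny", "squeeze", "sid"),
    "ubuntu": ("hardy", "intrepid", "jaunty"),
}


def hwraid_sources_line(distrib, branch):
    """Return the sources.list entry for a distrib/branch pair (hypothetical helper)."""
    if distrib not in BRANCHES:
        raise ValueError("unknown distrib: %r" % distrib)
    if branch not in BRANCHES[distrib]:
        raise ValueError("unknown branch %r for %s" % (branch, distrib))
    return "deb http://hwraid.le-vert.net/%s %s main" % (distrib, branch)


# The Lenny example from the quoted text.
print(hwraid_sources_line("debian", "lenny"))
```

The returned line is what would be appended to /etc/apt/sources.list, followed by the usual apt-get update.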
Then install the tools that will be of use to me:
apt-get install megacli
apt-get install megactl
apt-get install megaraid-status
Some of the output you can get from these tools:
proxmox:~# megaraidsas-status
-- Arrays informations --
-- ID | Type | Size | Status
a0d0 | RAID 6 | 97GiB | optimal
a0d1 | RAID 6 | 3627GiB | optimal

-- Disks informations
-- ID | Model | Status | Warnings
a0e32s0 | ATA Hitachi HDS72202 1863GiB | online
a0e32s1 | ATA Hitachi HDS72202 1863GiB | online
a0e32s2 | ATA Hitachi HDS72202 1863GiB | online
a0e32s3 | ATA Hitachi HDS72202 1863GiB | online

proxmox:~# megasasctl
a0 PERC 6/i Adapter encl:1 ldrv:2 batt:good
a0d0 97GiB RAID 6 1x4 optimal
a0d1 3627GiB RAID 6 1x4 optimal
a0e32s0 1863GiB a0d0+ online
a0e32s1 1863GiB a0d0+ online
a0e32s2 1863GiB a0d0+ online
a0e32s3 1863GiB a0d0+ online
proxmox:~#
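Since the whole point is a nag script, here is a rough Python sketch of how the megasasctl output above could be checked. It assumes the line layout seen in my sample (IDs like a0, a0d0, a0e32s0, and the words batt:good / optimal / online for healthy items); the function name `find_problems` is made up, and layouts I have not seen (hot spares, adapters with no battery) would simply be reported as problems.

```python
import re

# megasasctl output copied from the healthy system shown in this post.
SAMPLE = """\
a0 PERC 6/i Adapter encl:1 ldrv:2 batt:good
a0d0 97GiB RAID 6 1x4 optimal
a0d1 3627GiB RAID 6 1x4 optimal
a0e32s0 1863GiB a0d0+ online
a0e32s1 1863GiB a0d0+ online
a0e32s2 1863GiB a0d0+ online
a0e32s3 1863GiB a0d0+ online
"""

ADAPTER = re.compile(r"^a\d+$")       # e.g. a0
ARRAY = re.compile(r"^a\d+d\d+$")     # e.g. a0d1
DISK = re.compile(r"^a\d+e\d+s\d+$")  # e.g. a0e32s3


def find_problems(text):
    """Return every adapter/array/disk line that lacks its 'healthy' keyword."""
    problems = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        ident = tokens[0]
        if ADAPTER.match(ident):
            ok = "batt:good" in tokens
        elif ARRAY.match(ident):
            ok = "optimal" in tokens
        elif DISK.match(ident):
            ok = "online" in tokens
        else:
            continue  # shell prompts, blank lines, anything unrecognised
        if not ok:
            problems.append(" ".join(tokens))
    return problems


print(find_problems(SAMPLE))  # prints [] for the healthy sample
```

In real use the text would come from running megasasctl (for example via subprocess from a cron job), and a non-empty list would be mailed to whoever needs nagging.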
Note that the MegaRAID CLI has absurd (nice, really) levels of detail available. For reference, some of that is shown below.
For Reference: A nice CRIB SHEET FOR COMMANDS: http://tools.rapidsoft.de/perc/perc-cheat-sheet.html
"VIRTUAL DRIVE INFORMATION":
proxmox:/var/lib/vz/template/cache# megacli -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:BOOT-VOL
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:97.656 GB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:4
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None

Virtual Disk: 1 (Target Id: 1)
Name:DATA-VOL
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:3.541 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:4
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 19%, Taken 172 min.
Encryption Type: None

Exit Code: 0x00
proxmox:/var/lib/vz/template/cache#
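For monitoring purposes the only fields I really care about in that listing are Name and State. A minimal Python sketch for pulling those out follows. The function name `virtual_disk_states` is my own, the sample is an abbreviated copy of the output above, and accepting "Virtual Drive:" as well as "Virtual Disk:" is an assumption about other megacli versions that I have not verified here.

```python
import re

# Abbreviated copy of the `megacli -LDInfo -Lall -aALL` output from this post.
SAMPLE = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:BOOT-VOL
Size:97.656 GB
State: Optimal
Virtual Disk: 1 (Target Id: 1)
Name:DATA-VOL
Size:3.541 TB
State: Optimal
Exit Code: 0x00
"""

VD_HEADER = re.compile(r"Virtual (?:Disk|Drive)\s*:\s*(\d+)")
VD_FIELD = re.compile(r"(Name|State)\s*:\s*(.*)$")


def virtual_disk_states(text):
    """Map virtual disk number -> {'name': ..., 'state': ...}."""
    disks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = VD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            disks[current] = {"name": "", "state": "unknown"}
            continue
        if current is None:
            continue  # still in the adapter preamble
        field = VD_FIELD.match(line)
        if field:
            disks[current][field.group(1).lower()] = field.group(2).strip()
    return disks


def not_optimal(disks):
    """Return only the virtual disks whose state is not 'Optimal'."""
    return {n: d for n, d in disks.items() if d["state"] != "Optimal"}


print(not_optimal(virtual_disk_states(SAMPLE)))
```

Anything returned by `not_optimal` is a candidate for a nag message.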
ALSO INTERESTING: INFO FROM THE UNDERLYING DRIVES THEMSELVES:
proxmox:/var/lib/vz/template/cache# megacli -PDList -aALL

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000000000000
Connected Port Number: 0(path0)
Inquiry Data: JK11A5YAKBNJPXHitachi HDS722020ALA330 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000001000000
Connected Port Number: 1(path0)
Inquiry Data: JK11A5YAKDB20XHitachi HDS722020ALA330 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000002000000
Connected Port Number: 2(path0)
Inquiry Data: JK11A5YAKE38UXHitachi HDS722020ALA330 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000003000000
Connected Port Number: 3(path0)
Inquiry Data: JK11A5YAKBSL4XHitachi HDS722020ALA330 JKAOA3EA
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Exit Code: 0x00
proxmox:/var/lib/vz/template/cache#
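The per-drive error counters and the firmware state in this listing are the bits worth watching. Below is a small Python sketch that splits the listing into one record per drive and flags anything suspicious. The helper names `parse_pdlist` and `drive_warnings` are mine, the sample is one drive from the output above with some fields trimmed, and I am assuming each drive block starts with an "Enclosure Device ID" line, as it does on my system.

```python
# One drive from the `megacli -PDList -aALL` output in this post (trimmed).
SAMPLE = """\
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
PD Type: SATA
Firmware state: Online
Media Type: Hard Disk Device
Exit Code: 0x00
"""

COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")


def parse_pdlist(text):
    """Return a list of dicts, one per physical drive, keyed by field name."""
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon, e.g. "Adapter #0"
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumed to be the first line of every drive block.
            current = {key: value}
            drives.append(current)
        elif current is not None:
            current[key] = value
    return drives


def drive_warnings(drive):
    """Return human-readable warnings for one parsed drive record."""
    warnings = []
    for name in COUNTERS:
        if drive.get(name, "0") != "0":
            warnings.append("%s = %s" % (name, drive[name]))
    state = drive.get("Firmware state", "unknown")
    if not state.startswith("Online"):
        warnings.append("Firmware state = %s" % state)
    return warnings


for d in parse_pdlist(SAMPLE):
    print(d["Slot Number"], drive_warnings(d))
```

Note that the trailing "Exit Code" line ends up attached to the last drive's record, which is harmless for these checks.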
ALSO -- BBU STATUS:
proxmox:/var/lib/vz/template/cache# megacli -AdpBbuCmd -aALL

BBU status for Adapter: 0
BatteryType: BBU
Voltage: 4010 mV
Current: 0 mA
Temperature: 22 C

BBU Firmware Status:
  Charging Status : None
  Voltage : OK
  Temperature : OK
  Learn Cycle Requested : No
  Learn Cycle Active : No
  Learn Cycle Status : OK
  Learn Cycle Timeout : No
  I2c Errors Detected : No
  Battery Pack Missing : No
  Battery Replacement required : No
  Remaining Capacity Low : No
  Periodic Learn Required : No

Battery state:

GasGuageStatus:
  Fully Discharged : No
  Fully Charged : Yes
  Discharging : Yes
  Initialized : Yes
  Remaining Time Alarm : No
  Remaining Capacity Alarm: No
  Discharge Terminated : No
  Over Temperature : No
  Charging Terminated : No
  Over Charged : No

Relative State of Charge: 98 %
Charger Status: Complete
Remaining Capacity: 1229 mAh
Full Charge Capacity: 1250 mAh
isSOHGood: Yes

BBU Capacity Info for Adapter: 0
Relative State of Charge: 98 %
Absolute State of charge: 65 %
Remaining Capacity: 1229 mAh
Full Charge Capacity: 1250 mAh
Run time to empty: 65535 Min
Average time to empty: 65535 Min
Average Time to full: 65535 Min
Cycle Count: 22
Max Error: 0 %
Remaining Capacity Alarm: 190 mAh
Remaining Time Alarm: 10 Min

BBU Design Info for Adapter: 0
Date of Manufacture: 07/10, 2008
Design Capacity: 1900 mAh
Design Voltage: 3700 mV
Specification Info: 49
Serial Number: 1683
Pack Stat Configuration: 0xe4bc
Manufacture Name: SANYO
Device Name: DLU8735
Device Chemistry: LION
Battery FRU: N/A

BBU Properties for Adapter: 0
Auto Learn Period: 7776000 Sec
Next Learn time: 342974714 Sec
Learn Delay Interval:0 Hours
Auto-Learn Mode: Enabled

Exit Code: 0x00
proxmox:/var/lib/vz/template/cache#
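A battery check can be sketched the same way. The Python below looks at just three fields from the output above that seem unambiguous to me (isSOHGood, Battery Pack Missing, Battery Replacement required); which fields matter is my own judgment rather than anything from Dell or LSI, and the function name `bbu_problems` is made up. Several keys (Voltage, Temperature, Remaining Capacity Alarm) appear more than once in the output with different meanings, so the sketch deliberately ignores them.

```python
# A few lines from the `megacli -AdpBbuCmd -aALL` output in this post.
SAMPLE = """\
BBU status for Adapter: 0
BatteryType: BBU
Voltage: 4010 mV
  Voltage : OK
  Battery Pack Missing : No
  Battery Replacement required : No
  Remaining Capacity Low : No
Relative State of Charge: 98 %
isSOHGood: Yes
Exit Code: 0x00
"""

# Field -> value expected on a healthy battery (my own choice of fields).
EXPECTED = {
    "isSOHGood": "Yes",
    "Battery Pack Missing": "No",
    "Battery Replacement required": "No",
}


def bbu_problems(text):
    """Return a list of messages for checked fields that look unhealthy."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence; duplicated keys are not checked anyway.
            fields.setdefault(key.strip(), value.strip())
    problems = []
    for key, wanted in EXPECTED.items():
        actual = fields.get(key)  # None if the field is missing entirely
        if actual != wanted:
            problems.append("%s: %s (expected %s)" % (key, actual, wanted))
    return problems


print(bbu_problems(SAMPLE))  # prints [] for the healthy sample
```

A missing field is reported as a problem too, which seems the safer default for a nag script.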