Assorted-Reference.OVH-sw-raid-recovery-linux-topics History
May 29, 2020, at 01:17 PM
- Added lines 10-11:
- note it is recommended to do 'soft reboots' from inside the host, and NOT by doing 'reboot' via the admin panel of OVH
- note also that you will need to flip the server back to booting from hard disk if it was set to 'boot from net rescue mode'
May 29, 2020, at 11:38 AM
- Added lines 23-230:
- Before starting anything much, here is what we see, logged in as root to the SSH rescue environment on the server.
- The old SDB drive has this disk layout, as hinted by a cfdisk output capture:
Disk: /dev/sdb
Size: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Label: gpt, identifier: CA0C7679-A38B-498A-93D1-CFF2BFCD4171

    Device          Start          End     Sectors     Size  Type
>>  /dev/sdb1          40         2048        2009  1004.5K  BIOS boot
    Free space       4096         4095           0       0B
    /dev/sdb2        4096     41945087    41940992      20G  Linux RAID
    /dev/sdb3    41945088     44040191     2095104    1023M  Linux swap
    /dev/sdb4    44040192   3907018751  3862978560     1.8T  Linux RAID
    Free space 3907018752   3907029134       10383     5.1M

and RAID HINT: current status of cat /proc/mdstat shows us:

root@rescue:/etc/mdadm# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sdb2[1]
      20970432 blocks [2/1] [_U]

md4 : active raid1 sdb4[1]
      1931489216 blocks [2/1] [_U]
      bitmap: 0/15 pages [0KB], 65536KB chunk
- We are confident SDB is the good old disk and SDA is the new empty disk. You can check with cfdisk /dev/sda to validate if you wish.
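If you want a second, non-interactive confirmation of which disk is which, the md superblocks tell the story. A minimal sketch, assuming the rescue image ships lsblk and mdadm (the mdadm commands used later imply it does):

# the surviving disk shows partitions and raid metadata, the fresh disk should be blank
lsblk /dev/sda /dev/sdb
mdadm --examine /dev/sdb2    # should print a raid1 superblock for the old disk
mdadm --examine /dev/sda     # should report no md superblock on the new disk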
- Clone the disk layout from SDB to SDA, then randomize the GUIDs on SDA. Thus:
sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda
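To confirm the copy took, you can print both partition tables and compare; a small optional sketch, assuming the same sgdisk from the rescue image:

# print both GPT tables; the partition layout should now match, only the GUIDs differ
sgdisk -p /dev/sdb
sgdisk -p /dev/sda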
- Once that is done, we can add the root slice on SDA into the RAID and let it sync up. Thus:
root@rescue:/etc/mdadm# mdadm --manage /dev/md2 -a /dev/sda2
mdadm: added /dev/sda2

HAVE A LOOK TO CONFIRM SYNC IS UNDERWAY:

root@rescue:/etc/mdadm# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[2] sdb2[1]
      20970432 blocks [2/1] [_U]
      [>....................]  recovery =  2.4% (519488/20970432) finish=1.9min speed=173162K/sec

md4 : active raid1 sdb4[1]
      1931489216 blocks [2/1] [_U]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>
root@rescue:/etc/mdadm#
- Let the sync finish; it takes a few minutes.
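One way to follow the rebuild instead of re-running cat by hand; a minimal sketch, the 10-second interval is an arbitrary choice:

# refresh the raid status every 10 seconds; Ctrl-C once md2 shows [UU]
watch -n 10 cat /proc/mdstat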
- OPTIONAL (?) STEP: manually dd-clone SDB1 onto SDA1 (the tiny BIOS boot partition), thus:
root@rescue:~# dd if=/dev/sdb1 of=/dev/sda1
2009+0 records in
2009+0 records out
1028608 bytes (1.0 MB) copied, 0.0316076 s, 32.5 MB/s
root@rescue:~#
- Confirm raid sync is done
root@rescue:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[0] sdb2[1]
      20970432 blocks [2/2] [UU]

md4 : active raid1 sdb4[1]
      1931489216 blocks [2/1] [_U]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>
root@rescue:~#
- Make sure /mnt exists, then proceed. Hints below:
root@rescue:~# cd /mnt
root@rescue:/mnt# ls -la
total 4
drwxr-xr-x  2 root root 4096 May 12  2015 .
drwxr-xr-x 37 root root  400 May 28 09:00 ..
root@rescue:/mnt# cd

SETUP AND ENTER THE CHROOT ENVIRONMENT:
========================================
root@rescue:~# mount /dev/md2 /mnt
root@rescue:~# mount --rbind /dev /mnt/dev
root@rescue:~# mount --rbind /proc /mnt/proc
root@rescue:~# mount --rbind /sys /mnt/sys
root@rescue:~# chroot /mnt bash
root@rescue:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2         20G  3.8G   15G  21% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G     0   16G   0% /sys/fs/cgroup
root@rescue:/#

INSTALL GRUB ONTO SDA DRIVE
===========================
root@rescue:/# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@rescue:/#
- OK, we are done. Exit the chroot, unmount, reboot, happy days.
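A minimal sketch of that tear-down, assuming the mounts from the chroot step above; umount -R is one way to release the rbind mounts (a lazy umount -l also works if they report busy):

exit                                    # leave the chroot bash shell
umount -R /mnt/dev /mnt/proc /mnt/sys   # release the rbind mounts
umount /mnt                             # unmount the root filesystem from /mnt
# flip the server back to booting from hard disk in the OVH panel first, then:
reboot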
- Final step: add in the last RAID chunk so the data PVE slice can sync up to date. Hints:
Add in the last RAID mirror piece:

mdadm --manage /dev/md4 -a /dev/sda4

thus:

root@ns506XXX:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G   18M  3.2G   1% /run
/dev/md2               20G  3.8G   15G  21% /
tmpfs                  16G   37M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/pve-data  1.8T   45G  1.7T   3% /var/lib/vz
/dev/sdc1             1.8T  396G  1.5T  22% /backups
tmpfs                 3.2G     0  3.2G   0% /run/user/0
/dev/fuse              30M   16K   30M   1% /etc/pve
root@ns506XXX:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdb4[1]
      1931489216 blocks [2/1] [_U]
      bitmap: 3/15 pages [12KB], 65536KB chunk

md2 : active raid1 sdb2[1] sda2[0]
      20970432 blocks [2/2] [UU]

unused devices: <none>
root@ns506XXX:~# mdadm --manage /dev/md4 -a /dev/sda4
mdadm: hot added /dev/sda4
root@ns506XXX:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda4[2] sdb4[1]
      1931489216 blocks [2/1] [_U]
      [>....................]  recovery =  0.0% (448768/1931489216) finish=143.4min speed=224384K/sec
      bitmap: 3/15 pages [12KB], 65536KB chunk

md2 : active raid1 sdb2[1] sda2[0]
      20970432 blocks [2/2] [UU]

unused devices: <none>
root@ns506XXX:~#

Status a few minutes later: make sure the RAID is grinding through the rebuild. Yes, it is. It will take a few hours, OK.

root@ns506XXX:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda4[2] sdb4[1]
      1931489216 blocks [2/1] [_U]
      [>....................]  recovery =  0.8% (16803328/1931489216) finish=302.6min speed=105427K/sec
      bitmap: 3/15 pages [12KB], 65536KB chunk

md2 : active raid1 sdb2[1] sda2[0]
      20970432 blocks [2/2] [UU]

unused devices: <none>
root@ns506XXX:~#
May 29, 2020, at 11:29 AM
- Added lines 1-22:
OVH PROXMOX SOFTWARE RAID - RECOVERY HINTS
- Context piece: you have an OVH (So you Start, Kimsufi, etc.) rental server.
- It runs Proxmox Linux, installed via a template that OVH provided,
- with at least 2 hard drives in the vanilla SW RAID config that the stock OVH template set up.
- One drive - in this case SDA, the first drive - is removed and replaced because it is failing/bad.
- Endgame: after the drive replacement is done, the system won't boot any longer. It sounds like GRUB wasn't installed onto SDB by the template, or the motherboard BIOS is not interested in trying to boot any drive other than SDA? Can't really tell.
- Solution hints are below
Concise hints
- System is booted in rescue mode
- clone disk layout from good SDB to new SDA drive
- add root volume SDA slice into root MD raid device, let it sync up
- optional? Did this as a debug step, not sure if required: use dd to clone SDB1 onto SDA1, the tiny non-RAID slice.
- once synced, set up the chroot environment
- enter chroot environment and install grub
- reboot the server. Happy days, things boot up.
- start the RAID sync for the last large PVE data slice and let that grind for a few hours as a background job. OK. (A consolidated command sketch follows below.)
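A consolidated sketch of the whole sequence from the rescue environment, assuming SDB is the surviving disk, SDA is the blank replacement, md2 is the root array and md4 is the PVE data array, matching the detailed walkthrough above (the single-line chroot invocation is a compressed variant of the interactive bash session shown there):

# clone the partition layout from the good disk and randomize GUIDs on the new one
sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda

# re-mirror the root array and (optionally) copy the tiny BIOS boot partition
mdadm --manage /dev/md2 -a /dev/sda2
dd if=/dev/sdb1 of=/dev/sda1
cat /proc/mdstat                        # wait until md2 shows [UU]

# chroot into the installed system and reinstall grub onto the new disk
mount /dev/md2 /mnt
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
chroot /mnt grub-install /dev/sda
umount -R /mnt
reboot                                  # after flipping the OVH panel back to hard-disk boot

# once booted normally, re-mirror the large data slice in the background
mdadm --manage /dev/md4 -a /dev/sda4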