mdadm: Replacing a faulty drive

First, determine which drive has failed. If the server can no longer see the drive, the failed member can be read from the mdstat file:

cat /proc/mdstat
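
On a server with many arrays it can help to filter the mdstat output rather than read it by eye. The following is a minimal sketch, not part of mdadm: list_degraded_arrays is a hypothetical helper name, and it assumes the usual mdstat layout in which each array line starts with "mdN :" and is followed by a status string such as [UU] or [U_].

```shell
# list_degraded_arrays [FILE]
# Hypothetical helper: print the name of every md array whose member-status
# string (e.g. [U_]) contains an underscore, i.e. at least one member is
# missing or failed. Reads /proc/mdstat unless another file is given.
list_degraded_arrays() {
    awk '
        /^md[^ ]* :/ { array = $1 }               # remember the current array name
        array != "" && match($0, /\[[U_]+\]/) {   # status string such as [U_]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print array
            array = ""                            # only one status string per array
        }
    ' "${1:-/proc/mdstat}"
}
```

Running list_degraded_arrays with no argument reads the live /proc/mdstat; an empty result means no array is reporting a missing member.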

If the drive is still visible but is reporting a lot of errors, the bad drive can be identified from its SMART values:

smartctl -a /dev/sdX
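
The full smartctl report is long, so a small filter over the attribute table can make a failing drive easier to spot. This is only a sketch under stated assumptions: flag_smart_attributes is a hypothetical helper, it expects the ATA attribute table printed by smartctl -A (raw value in the tenth column), and the three attributes it watches are a common choice rather than a definitive list.

```shell
# flag_smart_attributes
# Hypothetical helper: read "smartctl -A" output on stdin and print the name
# and raw value of a few sector-health attributes whose raw value is non-zero.
# Assumes the ATA attribute table layout, where RAW_VALUE is column 10.
flag_smart_attributes() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) print $2, $10   # non-zero raw count is a warning sign
        }
    '
}
```

It would be used as smartctl -A /dev/sdX | flag_smart_attributes; no output means none of the watched counters is above zero, which does not by itself prove the drive is healthy.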

The bad drive must be removed from the array before it is safe to pull the physical drive from the server and replace it.
In the example below, the RAID array is named /dev/md0, and /dev/sdb has failed but is still visible. /dev/md0 is a RAID 1 array made up of /dev/sda1 and /dev/sdb1.

mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1

This must be repeated for every array that /dev/sdb is a part of.
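
When the failing disk has partitions in several arrays, the list of commands can be generated from the mdstat file instead of typed by hand. The sketch below only prints the commands so they can be reviewed before anything is run. print_removal_commands is a hypothetical helper, and it assumes sdX-style device names where a partition is the disk name followed by digits (NVMe-style names such as nvme0n1p1 would need a different match).

```shell
# print_removal_commands DISK [FILE]
# Hypothetical helper: for every md array that has a partition of DISK
# (e.g. sdb) as a member, print the matching --fail and --remove commands.
# Nothing is executed; the output is meant to be reviewed first.
print_removal_commands() {
    awk -v disk="$1" '
        /^md[^ ]* :/ {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[/) continue          # members look like sdb1[1] or sdb1[1](F)
                member = $i
                sub(/\[.*/, "", member)           # strip the [slot] and (F) suffixes
                rest = substr(member, length(disk) + 1)
                if (index(member, disk) == 1 && rest ~ /^[0-9]*$/) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, member
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, member
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

For example, print_removal_commands sdb lists one fail/remove pair per array that contains a /dev/sdb partition.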

You can verify via the /proc/mdstat file that the drive has been removed from the array: the removed partition should no longer be listed as a member, and the array's status should read "[U_]" or similar (U means a member is up, _ means it is down or missing).

cat /proc/mdstat
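
The same check can be scripted, which is useful before telling someone it is safe to pull the disk. This is a minimal sketch: array_has_member is a hypothetical helper that succeeds only if the named partition is still listed as a member of the named array in the mdstat file.

```shell
# array_has_member ARRAY MEMBER [FILE]
# Hypothetical helper: exit 0 if MEMBER (e.g. sdb1) is still listed as a
# member of ARRAY (e.g. md0) in the mdstat file, exit 1 otherwise.
array_has_member() {
    awk -v array="$1" -v member="$2" '
        $1 == array && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[/) continue      # skip words like "active" and "raid1"
                name = $i
                sub(/\[.*/, "", name)         # sdb1[1](F) -> sdb1
                if (name == member) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "${3:-/proc/mdstat}"
}
```

For the example array, array_has_member md0 sdb1 should fail (non-zero exit status) once the removal has gone through, while array_has_member md0 sda1 should still succeed.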

It is now safe to have the disk replaced.

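Once the disk has been replaced, create an identical partition table on it and add the new drive back into the array. In the commands below, /dev/sda is the healthy source disk and /dev/sdb is the new, empty disk; reversing the two would overwrite the healthy disk's partition table.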

sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb1

Again, the --add step must be repeated for every RAID partition on the drive.
The array should then start rebuilding, which can be confirmed by viewing the mdstat file once again:

cat /proc/mdstat
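
To follow the rebuild without reading the whole file each time, the progress figure can be extracted from it. The sketch below is an assumption-laden convenience, not an mdadm feature: rebuild_progress is a hypothetical helper that looks for the "recovery = NN.N%" or "resync = NN.N%" line the kernel prints under an array while it is syncing.

```shell
# rebuild_progress [FILE]
# Hypothetical helper: print "ARRAY PERCENT" for every array that currently
# shows a recovery or resync progress line in the mdstat file.
rebuild_progress() {
    awk '
        /^md[^ ]* :/ { array = $1 }                        # remember the current array
        /(recovery|resync) *=/ && match($0, /[0-9]+\.[0-9]+%/) {
            print array, substr($0, RSTART, RLENGTH)       # e.g. the percentage done
        }
    ' "${1:-/proc/mdstat}"
}
```

Something like watch -n 5 cat /proc/mdstat gives a live view as well; once the rebuild has finished, the progress line disappears and the status returns to "[UU]".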