mdadm: Replacing a faulty drive
First, determine which drive has failed. If the server can no longer see the drive at all, this can be read directly from /proc/mdstat.
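The kernel marks failed array members with an "(F)" suffix in /proc/mdstat. A minimal sketch of spotting them, run against an illustrative sample snippet rather than the live file:

```shell
# Illustrative /proc/mdstat snippet; real output will differ.
# Failed members carry "(F)", and array health shows as [U_].
mdstat_sample='md0 : active raid1 sda1[0] sdb1[1](F)
      976630464 blocks super 1.2 [2/1] [U_]'

# Extract members flagged as failed
echo "$mdstat_sample" | grep -o '[a-z]\+[0-9]\+\[[0-9]\+\](F)'
```

On a live system the same grep can be pointed at /proc/mdstat directly.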
If the drive is still visible but is reporting a lot of errors, the bad drive can be identified from its SMART values:
smartctl -a /dev/sdX
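When reading the smartctl output, nonzero reallocated or pending sector counts are the usual red flags. A sketch filtering those attributes from a sample report (the attribute lines below are illustrative, not from a real disk):

```shell
# Illustrative SMART attribute lines; a healthy disk shows 0 in the RAW_VALUE column
smart_sample='  5 Reallocated_Sector_Ct   0x0033   092   092   036    Pre-fail  Always       -       312
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8'

echo "$smart_sample" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'
```

Against a real disk, pipe `smartctl -a /dev/sdX` into the same grep.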
The bad drive must be removed from the array before it is safe to pull the physical drive from the server and replace it.
In the example below the RAID array is named /dev/md0 and /dev/sdb has failed but is still visible. /dev/md0 is a RAID 1 array made of /dev/sda1 and /dev/sdb1.
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1
This would have to be repeated for every array /dev/sdb is a part of.
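If /dev/sdb has partitions in several arrays, the fail/remove step can be scripted. A sketch that only prints the commands it would run (the array:member pairs are hypothetical; drop the echo to actually execute):

```shell
# Hypothetical array:member pairs; build this list from /proc/mdstat
for pair in /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb2; do
    array=${pair%%:*}    # part before the colon
    member=${pair#*:}    # part after the colon
    echo mdadm --manage "$array" --fail "$member"
    echo mdadm --manage "$array" --remove "$member"
done
```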
Via /proc/mdstat you can verify the drive has been removed from the array: the array status should read "[U_]" or similar (U meaning up, _ meaning down).
It is now safe to have the disk replaced.
Once the disk has been replaced, create an identical partition table on the new drive and add it back into the array.
sfdisk -d /dev/sda | sfdisk /dev/sdb
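The sfdisk dump above is the classic approach for MBR partition tables (recent sfdisk versions also handle GPT). On GPT disks, sgdisk from the gdisk package is a common alternative: it replicates the table and can then randomize the GUIDs so the copy does not collide with the original. A print-only sketch with assumed device names (drop the echo to run for real):

```shell
SRC=/dev/sda   # healthy source disk
DST=/dev/sdb   # new replacement disk

echo sgdisk -R "$DST" "$SRC"   # replicate SRC's partition table onto DST
echo sgdisk -G "$DST"          # give DST's table fresh, unique GUIDs
```

Double-check which disk is source and which is destination before running; replicating in the wrong direction destroys the healthy partition table.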
mdadm --manage /dev/md0 --add /dev/sdb1
Again, this would have to be repeated for every RAID partition that exists.
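As with the removal step, re-adding can be scripted across arrays; again a print-only sketch with hypothetical array:member pairs:

```shell
# Hypothetical array:member pairs; drop the echo to actually run mdadm
for pair in /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb2; do
    echo mdadm --manage "${pair%%:*}" --add "${pair#*:}"
done
```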
The array should now start rebuilding, which can be confirmed by viewing /proc/mdstat once again.
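During a rebuild, /proc/mdstat shows a recovery line with a progress bar and percentage (running `watch -n 5 cat /proc/mdstat` is a handy way to follow it live). A sketch extracting the percentage from a sample snippet:

```shell
# Illustrative mdstat snippet during a rebuild; real values will differ
mdstat_rebuild='md0 : active raid1 sdb1[2] sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (123456789/976630464) finish=98.3min speed=144583K/sec'

echo "$mdstat_rebuild" | grep -o 'recovery = [0-9.]*%'
```

Once recovery completes, the status line should return to "[UU]".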