Replacing a failed disk in a mdadm RAID

December 8, 2013, Thomas Jansson

Introduction

I have a RAID5 with 4 disks (see “Rebuilding and updating my Linux NAS and HTPC server”), and from the daily digest emails of the system I discovered that one of my disks had issues. I found the following in dmesg:

[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]  
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]  
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 8347.731204]         17 00 10 78 
[ 8347.731208] sd 5:0:0:0: [sde]  
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB: 
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.
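To pull the md corrections out of a long kernel log, something like the following could be used. It is only a minimal sketch: `count_corrected_reads` is a hypothetical helper name, and it assumes the "read error corrected" message format shown in the log above.

```shell
# Count md "read error corrected" kernel messages per member device.
# Reads kernel log text on stdin and prints "<device> <count>" lines.
count_corrected_reads() {
    # Keep only the device name that follows " on " in each matching line.
    sed -n 's/.*read error corrected .* on \([a-z0-9]*\)).*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use (may need root to read the kernel log):
#   dmesg | count_corrected_reads
```

A device that keeps showing up here is worth a closer look even while the array still reports itself as healthy.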

Investigating the bad drive

To further investigate the disk in question (/dev/sde) I looked into the S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:

# smartctl -i /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-003BB1
Serial Number:    WD-WCAV5K430328
LU WWN Device Id: 5 0014ee 2afe6f748
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Dec  2 22:09:37 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours – alternatively there is a short, but less thorough self-test that takes around 2 minutes:

smartctl -t long /dev/sde

The output of a self-test can be found with the following command. In my case it was clear that the drive was indeed in trouble.

# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     23574         267040872
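If the self-test log is checked from a script, the failing entries can be filtered out of it. The following is a rough sketch, assuming the column layout shown above; `first_error_lbas` is a made-up helper name and it only looks for the "read failure" status seen in this log.

```shell
# Print the LBA_of_first_error column for every self-test that ended
# with a read failure. Expects `smartctl -l selftest` output on stdin.
first_error_lbas() {
    # Log entries start with "# <number>"; the LBA is the last column.
    awk '/^# *[0-9]+/ && /read failure/ { print $NF }'
}

# Typical use (needs root and a real device):
#   smartctl -l selftest /dev/sde | first_error_lbas
```

No output means no self-test in the log ended with a read failure.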

I ordered a 3TB WD Red disk (made especially for NAS use) to replace it. It is much larger and initially I will not be able to use the full 3TB, but once all the old 1TB disks have eventually failed and I have replaced them all with 3TB disks, I can grow the raid.

Removing the faulty disk

An important part of a RAID setup is the ability to cope with the failure of a disk. The enclosure I have does not support hot-swap and has no separate light for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:

# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328

Luckily the Western Digital disks I have came with a small sticker that shows the serial number on the disk. Knowing the serial number of the faulty disk, I marked it as failed in mdadm and removed it from the raid before shutting down and replacing the disk:

mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1
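The two commands above could be wrapped so that the removal only runs when marking the member as failed succeeded, with a dry-run mode for checking the arguments first. This is just a sketch; the function name `retire_member` and the `DRY_RUN` variable are assumptions for illustration, not part of mdadm.

```shell
# Mark a member as failed and then remove it from an md array.
# With DRY_RUN set to a non-empty value the commands are only printed.
retire_member() {
    array=$1
    member=$2
    run=${DRY_RUN:+echo}
    # --remove is only attempted if --fail succeeded.
    $run mdadm --manage "$array" --fail "$member" &&
        $run mdadm --manage "$array" --remove "$member"
}

# Preview first, then run for real as root:
#   DRY_RUN=1 retire_member /dev/md0 /dev/sde1
#   retire_member /dev/md0 /dev/sde1
```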

Adding the new drive

Having replaced the faulty disk and inserted the new disk I found the serial on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:

# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166
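The serial comparison can also be scripted, so that a destructive step only proceeds when the device really is the disk that was expected. A minimal sketch, assuming the `SerialNo=` field appears as in the hdparm output above; both function names are hypothetical.

```shell
# Extract the SerialNo field from `hdparm -i` output read on stdin.
serial_from_hdparm() {
    sed -n 's/.*SerialNo=\([^ ,]*\).*/\1/p'
}

# Succeed only if the serial found on stdin equals the expected one ($1).
serial_matches() {
    [ "$(serial_from_hdparm)" = "$1" ]
}

# Typical use (needs root and a real device):
#   hdparm -i /dev/sde | serial_matches WD-WMC4N1096166 && echo "right disk"
```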

Partitioning a disk larger than 2TB does not work with an MSDOS partition table, so I needed to use parted (instead of fdisk) to partition the disk correctly with a GPT label. The “-a optimal” option makes parted use the optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.

# parted -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
 
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
 
(parted) quit                                                             
Information: You may need to update /etc/fstab.
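The interactive session above can probably be condensed into a single call using parted's script mode (`-s`). The sketch below is an assumption-laden variant rather than what was actually run: it uses `1MiB` and `100%` as start and end instead of `1 -1`, which avoids passing a negative number on the command line, and the `partition_for_raid` name and `DRY_RUN` variable are made up for illustration.

```shell
# Create a GPT label and one RAID-flagged partition spanning the disk.
# DESTRUCTIVE: replaces the partition table of "$1". Set DRY_RUN to a
# non-empty value to print the command instead of running it.
partition_for_raid() {
    disk=$1
    run=${DRY_RUN:+echo}
    $run parted -s -a optimal "$disk" \
        mklabel gpt \
        mkpart primary 1MiB 100% \
        set 1 raid on
}

# Preview first, then run for real as root:
#   DRY_RUN=1 partition_for_raid /dev/sde
#   partition_for_raid /dev/sde
```

Checking the serial number of the target disk before running this for real is strongly advisable, since the old partition table is gone afterwards.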

Now the disk was ready for inclusion in the raid:

mdadm --manage /dev/md0 --add /dev/sde1

Over the next 3 hours I could monitor the rebuild using the following command:

[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
 
unused devices: <none>
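For a quick look at the rebuild without reading the whole file, the progress line can be reduced to array name, percentage and estimated time remaining. A small sketch, assuming the /proc/mdstat layout shown above; `rebuild_progress` is a hypothetical name.

```shell
# Print "<array> <percent> <time remaining>" for each array that is
# rebuilding or resyncing. Expects /proc/mdstat content on stdin.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }          # remember the current array
        /recovery =|resync =/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i
                if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
            }
            print array, pct, eta
        }
    '
}

# Typical use:
#   rebuild_progress < /proc/mdstat
```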

Monitoring health of the raid

I have several systems in place to monitor the health of my raid (among other things):

logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
mdadm – mdadm will mail me if a disk has completely failed or the raid for some other reason fails. A complete resync is done every week.
smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
munin – graphical and historical monitoring of performance and all stats of the server.
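A very small check in the same spirit, which is not one of the tools listed above, is to look for an underscore in the member status field of /proc/mdstat, for example `[_UUU]`. The sketch below assumes that format; `degraded_arrays` is a hypothetical helper name.

```shell
# Print the name of every md array with a missing member, i.e. whose
# status field such as [_UUU] contains an underscore.
# Expects /proc/mdstat content on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { array = $1 }          # remember the current array
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print array
        }
    '
}

# Typical use, for example from a cron job:
#   degraded_arrays < /proc/mdstat
```

Empty output means every array has all of its members.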


Davide says:

December 9, 2013 at 18:55

Hi Thomas,

Thank you for your post. It seems you detected the failure thanks to the daily reports generated by the smartd tool. If you had not had smartd tests scheduled, how effectively do you think the weekly resync (mdadm) would have spotted the failure (by degrading the raid)?
How necessary do you think it is to run “short” tests every night and even long tests every second week if an mdadm resync is performed every week?

I am asking because I would not want to introduce too much stress on the disks. I use them mainly for storage and data are not accessed very often, so a test setup like yours would actually increase the disk load significantly in my opinion.

Thomas Jansson says:

December 11, 2013 at 20:11
Hi Davide

I actually discovered that the disk was failing from the daily logwatch mail. I would also have seen it if I had been running smartd, but I only started using it after the failure. From my own point of view, I think the strain a short self-test puts on the disk is quite limited. That said, even if it did strain the disk and theoretically reduced its lifetime, I would still prefer to be informed and perhaps exchange disks more often.

mdadm was unaware of the problems the disk had, and I think it would only have caught the problem if the disk had failed completely (become completely unresponsive), at which point it would have been ejected from the raid. smartd and logwatch would, however, catch these errors. The mdadm resync did find errors as well, as you can see from the /var/log/messages excerpt at the top of the article:
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
I hope this can help you.

Kind regards
Thomas
Jim Conner says:

December 24, 2014 at 17:58

Thank you so much for posting this technique. I just replaced my own failed WD-Green with a 3TB WD-Red… and better yet.. the failed disk for me was also SDE – Cut ‘n’ Paste-tastic for me. Perfect. Cheers!!

Marko Turunen says:

April 6, 2015 at 10:11

Great blog post again! Some time ago I added a 3TB drive to a 4x2TB raid array and did not set the partition table to GPT (almost 1TB wasted as unused space). I tried to think of ways to put this space to better use and found your excellent instructions on how to do it.

1. I failed and removed the 3TB drive from my raid array
2. Changed the partition table to GPT and repartitioned with parted (one 2TB partition for the raid and one 1TB partition for other use)
3. Added the new 2TB partition to the array and voilà, problem solved!

tomashook says:

December 19, 2015 at 06:11

Thank you, it worked like a charm. I had this issue:
md0 : active raid5 sdg1[6] sdc1[0] sde1[4] sdf1[5] sdb1[1]
14650667520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UU_UUU]

I didn’t use the part “Removing the faulty disk” because I had removed the disk physically and sdd was not there anymore …

so I just connected the new disk, which got the same alias ==>> /dev/sdd

I used command:
sudo parted /dev/sdd
(parted) mklabel gpt
(parted) mkpart pri 1 -1
(parted) quit
sudo mdadm --manage /dev/md0 --add /dev/sdd1
cat /proc/mdstat

🙂 Voila:
md0 : active raid5 sdd1[7] sdf1[5] sdc1[0] sdg1[6] sde1[4] sdb1[1]
14650667520 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UU_UUU]
[>....................] recovery = 0.6% (18637836/2930133504) finish=2472.3min speed=19626K/sec

thx
Tomas

PaulW says:

April 8, 2016 at 11:41

Thank you, your post was very helpful!!!!

Thomas Jansson says:

September 4, 2016 at 16:14

Thanks, glad to hear! 🙂

Pete says:

May 19, 2016 at 19:13

Great post! I needed some brushing up on my linux raid commands! I’m currently cruising to full raid recovery. The detail used when writing is peerless. Good job!

Thomas Jansson says:

September 4, 2016 at 16:14

Scary stuff, but good to hear that you could use my post. 🙂

Andrew Morgan says:

September 2, 2016 at 22:18

Just used this to replace a faulty disk in my RAID too. Thanks for that. One thing that scared the pants off me was that after physically replacing the disk and formatting, the add command failed as the RAID had not restarted in degraded mode after the reboot. I had to do

mdadm --manage /dev/md0 --run

before I could add the new disk. Resync is underway now though.

Thomas Jansson says:

September 4, 2016 at 16:15

Glad to hear that it is working, but yes, there is definitely something scary about it, no matter how well tested md is.

Stéphan says:

December 22, 2016 at 22:06

Thanks for sharing Thomas. Your explanations are very efficient.

sagar jha says:

June 28, 2017 at 11:30

Hello Thomas, I’m a student pursuing Geophysics at Kurukshetra University, India. I have learned that you are a master in Python, so I need some help.
Please contact me through email.

Proteus Four says:

May 29, 2021 at 17:23

Thanks a lot for your post. It is still useful to me. You saved my life (on the server, of course).

Fonic says:

September 5, 2022 at 07:51

Thanks for the guide! I’d like to point out one mistake, if I may:

Investigating the bad drive

smartctl -i /dev/sde

[…]
This didn’t really tell me anything, […]

This didn’t tell you anything because the command line is insufficient. You need to run:

smartctl --info --health --attributes /dev/sde

to query the device’s health status and list all SMART attributes (which should reveal what’s wrong with the device).

If you like, you can link to my script ‘smart-status.sh’ that I created for this purpose a while ago (GitHub Gist):
https://gist.github.com/fonic/97c32695ea087a0215363f8b3b334d9c
