Replacing a failed disk in a mdadm RAID

Introduction

I have a RAID5 with 4 disks, see Rebuilding and updating my Linux NAS and HTPC server, and from my daily digest emails of the system I discovered that one of my disk had issues. I found the following in dmesg:

[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]  
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]  
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 8347.731204]         17 00 10 78 
[ 8347.731208] sd 5:0:0:0: [sde]  
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB: 
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.

Investigating the bad drive

To further investigate the disk in question (/dev/sde) I looked into the S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:

# smartctl -i /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-003BB1
Serial Number:    WD-WCAV5K430328
LU WWN Device Id: 5 0014ee 2afe6f748
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Dec  2 22:09:37 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours – alternatively there is a short, but less thorough self-test that takes around 2 minutes:

smartctl -t long /dev/sde

The output of a self-test can be found with the following command. In my case it was clear the the drive indeed was in trouble.

# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     23574         267040872

I ordered a 3TB WD RED disk (especially made for NAS operations) to replace it. It is much larger and initially I will not be able to utilize the 3TB, but once all the old 1TB disks eventually fails and I have replaced them all with 3TB disks, I can grow the raid.

Removing the faulty disk

A important part of a RAID setup is the ability to cope with the failure of a faulty disk. The enclosure I have does not support hot-swap and the disk have no separate lights for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:

# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328

and luckily the Western Digital disks I have came with a small sticker which shows the serial on the disk. So now I know the serial number of the faulty disk, so before shutting down and replacing the disk I marked as failed in madam and removed from the raid:

mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1

Adding the new drive

Having replaced the faulty disk and inserted the new disk I found the serial on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:

# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166

Partitioning disk over 2TB does not work with MSDOS filetable so I needed to use parted (instead of fdisk to partition the disk correctly). The “-a optimal” makes parted use the optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.

# parted -s -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
 
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
 
(parted) quit                                                             
Information: You may need to update /etc/fstab.

Now the disk was ready for inclusion in the raid:

mdadm --manage /dev/md0 --add /dev/sde1

Over the next 3 hours I could monitor the rebuild using the following command:

[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
 
unused devices: <none>

Monitoring health of the raid

I have several systems in place to monitor the health of my raid (among other things):

  • logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
  • mdadm – mdadm will mail me if a disk has completely failed or the raid for some other reason fails. A complete resync is done every week.
  • smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
  • munin – graphical and historical monitoring of performance and all stats of the server.
This entry was posted in Articles, Computer. Bookmark the permalink.

2 Responses to Replacing a failed disk in a mdadm RAID

  1. Davide says:

    Hi Thomas,

    Thank you for your post. It seems you detected the failure due to the daily reports generated using the smartd tool. If you did not have smartd tests scheduled, how effectively do you think the weekly resync (mdadm) would have spot the failure (by degrading the raid)?
    How necessary do you see to run “short” tests every night and even long tests every second week if mdadm resync is performed every week?

    I am asking because I would not want to introduce too much stress on the disks. I use them mainly for storage and data are not accessed very often, so a test setup like yours would actually increase the disk load significantly in my opinion.

  2. Hi Davide

    I actually discovered the that the disk was failing from the daily logwatch mail. I would also have seen it if I had been running smartd, but I only started this after the failure. From my own point of view I think the strain on the disk is quite limited for a short self-test. That said, even if it did strain the disk and theoretically reduced the lifetime I would still prefer to be informed and perhaps exchange disks more often.

    mdadm was unaware of the problems that the disk had and I think it would only have caught the problem if the disk completely failed (being completely unresponsive) and ejected it from the raid. Smartd and logwatch would however catch these errors. The mdadm resync did find errors as well – as you can see from the /var/log/messages in the top of the article:

    [ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
    [ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
    [ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)

    I hope this this can help you.

    Kind regards
    Thomas

Leave a Reply