Growing a mdadm RAID by replacing disks

Introduction

As described in my earlier post Replacing a failed disk in a mdadm RAID, I have a 4-disk RAID 5 setup which I initially populated with 1TB WD Green disks (cheap, but not really suited for NAS operation). After a few years I started to fill up the file system, so I wanted to grow the RAID by upgrading to 3TB WD Red disks, which are especially tailored to NAS workloads. Growing the mdadm RAID is done through the following steps:

  • Fail, remove and replace each 1TB disk with a 3TB disk, waiting for the RAID to resync onto the new disk after each replacement.
  • Grow the RAID to use all the space on each of the 3TB disks.
  • Finally, grow the filesystem to use the available space on the RAID device.

The following is similar to my previous article Replacing a failed disk in a mdadm RAID, but I have included it here for completeness.

Removing the old drive

The enclosure I have does not support hot-swap and has no separate activity light for each disk, so I needed a way to find out which of the disks to replace. Finding the serial number of a disk is fairly easy:

# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328

Luckily, the Western Digital disks I have come with a small sticker on the disk showing the serial number. Knowing the serial number of the disk I wanted to replace, I marked it as failed in mdadm and removed it from the array before shutting down and swapping the disk:

mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1
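
To verify that the disk had really left the array before powering off, the array status can be inspected; the failed slot should show up as removed:

mdadm --detail /dev/md0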

Adding the new drive

Having replaced the old disk with the new one, I found the serial number on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:

# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166

Partitioning disks over 2TB does not work with an MSDOS (MBR) partition table, so I needed to use parted with a GPT label instead of fdisk to partition the disk correctly. The “-a optimal” option makes parted use the optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.

# parted -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 2048s 100%
(parted) align-check optimal 1
1 aligned
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
 
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
 
(parted) quit                                                             
Information: You may need to update /etc/fstab.

Now the disk was ready for inclusion in the RAID:

mdadm --manage /dev/md0 --add /dev/sde1

Over the next 3 hours I could monitor the rebuild using the following command:

[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
 
unused devices: <none>

This takes around 3 hours per disk in my case, and it is very important to wait for the array to finish rebuilding after each replacement. After having replaced all 4 disks and letting the RAID resync each time, I could continue.
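
Instead of polling /proc/mdstat by hand, mdadm can also block until the recovery is done, which is handy if you script the process; something like:

mdadm --wait /dev/md0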

Resize the array to the new maximal size

Now all the disks had been replaced with larger 3TB disks, but the RAID device was not using the extra space yet. To instruct mdadm to use all the available space I issued the following commands:

mdadm --grow /dev/md0 --bitmap none
mdadm --grow /dev/md0 --size=max

Now this also takes quite a while to complete – several hours in my case. The RAID is still usable while this is happening.
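
Since the first grow command removed the internal write-intent bitmap (recommended while resizing), it is worth adding it back once the resize has finished, along the lines of:

mdadm --grow /dev/md0 --bitmap internal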

Resize the filesystem

Finally I had to grow the filesystem to use the newly available space on the array. My array is mounted under /home, so I had to unmount the filesystem first:

umount /home

To make sure everything was okay I forced a check of the filesystem before resizing:

fsck.ext4 -f /dev/md0

Finally I started the resizing of the file system. This step is very quick, as the majority of the work is done later, after the filesystem is mounted again, by a process called ext4lazyinit. In my case ext4lazyinit took almost a full day to complete:

resize2fs /dev/md0
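
After the resize the filesystem can be mounted again and the new size verified, for example:

mount /home
df -h /home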

Related posts

http://rainbow.chard.org/2013/01/30/how-to-align-partitions-for-best-performance-using-parted/
http://zackreed.me/articles/69-mdadm-replace-smaller-disks-with-larger-ones


Rotating website backup using rsync over ssh

Introduction

Recently the hosting company for my website started supporting SSH access. This meant I could ditch the insecure FTP transfers and do everything through SFTP and rsync over ssh. Besides making editing of files much easier, this also allowed me to implement a rolling/rotating backup of the website. While it could be argued that such a backup will never be needed, as the hosting company surely has a safe storage solution, I have personally experienced losing data in a server breakdown at a hosting company.

The Python script

Below is a Python script I have written to automate the backup, keeping the last 12 weeks of changes in separate folders with hard links between them. For a website like mine, with a low rate of change, this means the backup does not take up much more space than the size of the website plus the size of the changes (which are small). The script defaults to keeping 12 backup copies, and I run it through cron every week on my home Linux server. It can also be run from the command line with the syntax:

rsync-backup-websites.py user@host:/www/ /home/tjansson/backup/websites/host/

As an example, here is a cron line that runs the script on the first day of every month at 4:05 in the morning:

5 4 1 * * /home/tjansson/bin/rsync-backup-websites.py user@host:/www/ /home/tjansson/backup/websites/host/

On a final note, for this script to work through cron it is assumed that SSH access is set up using keys, and perhaps ssh-agent, for passwordless access to the server.
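
Setting up such passwordless access essentially amounts to the following (assuming OpenSSH on both ends; replace user@host with your own):

ssh-keygen -t rsa
ssh-copy-id user@host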

#!/usr/bin/env python3
import os
import argparse
import shutil

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Rotating backup using rsync over ssh with hard-linked snapshots.')
    parser.add_argument('source',         type=str,             help='The source. Example: user@host:/www/')
    parser.add_argument('backup_path',    type=str,             help='The backup path (must end with /). Example: /home/tjansson/backup/websites/host/')
    parser.add_argument('-c', '--copies', type=int, default=12, help='The maximum number of copies to keep in the rotation. Default=12')
    parser.add_argument('-d', '--debug',  dest='debug', action='store_true', help='Turn on verbose debugging')
    args = parser.parse_args()

    # Folder name template, e.g. /home/tjansson/backup/websites/host/backup0
    folder = '{}backup{}'.format(args.backup_path, '{}')

    # Delete the oldest backup folder
    folder_old = folder.format(args.copies)
    if os.path.isdir(folder_old):
        if args.debug:
            print('Removing the oldest folder: {}'.format(folder_old))
        shutil.rmtree(folder_old)

    # Rotate the remaining backups: backupN -> backupN+1
    if args.debug:
        print('Rotating backups')
    for i in range(args.copies - 1, -1, -1):
        folder_0 = folder.format(i)
        folder_1 = folder.format(i + 1)
        if os.path.isdir(folder_0):
            if args.debug:
                print('mv {} {}'.format(folder_0, folder_1))
            os.rename(folder_0, folder_1)

    # Execute the rsync. Unchanged files are hard-linked to the previous
    # backup via --link-dest, so each snapshot only costs the size of the changes.
    target = folder.format(0)
    link   = folder.format(1)
    if not os.path.isdir(target):
        os.mkdir(target)
    if not os.path.isdir(link):
        cmd = 'rsync -ah --delete -e ssh {source} {target}'.format(source=args.source, target=target)
    else:
        cmd = 'rsync -ah --delete -e ssh --link-dest="{link}" {source} {target}'.format(link=link, source=args.source, target=target)

    if args.debug:
        print('Rsyncing the latest changes')
        print(cmd)
    os.system(cmd)
    # Update the mtime of the newest backup folder to mark the backup time
    os.system('touch {}'.format(target))

Further reading and inspiration for this post

www.mikerubel.org/computers/rsync_snapshots/
en.wikipedia.org/wiki/Backup_rotation_scheme


Migrating a wordpress blog – mysql charset problems and backup script

Introduction

The now-previous hosting company of my wife’s blog suffered major data corruption and completely lost a year’s worth of database entries and files. There was no communication before we discovered the problem ourselves, so we were very unhappy and decided to rebuild the site at another host.

Luckily I had set up WordPress to send me a complete database dump weekly as tar.gz balls, so no database entries were lost. All uploaded images and the like were permanently lost, but reconstructing those is much easier than reconstructing posts and comments.

Charset problems moving the site to another webhotel

After creating a backup of the files left on the old host, I made a local copy on my computer and another copy on the new web host. After the DNS changes had gone through and I had imported the database dump on the new host, the only thing left was to edit wp-config.php with the new database settings… or so I thought. It turned out that all the tables in the database used the charset latin1_swedish_ci, but some of the posts contained UTF-8 characters as well. The result was that all Danish letters and many special characters in English looked garbled on the blog.

After searching the web for hours through variations of simple search-and-replace, which I did not find feasible, I finally found the holy grail: the ‘replace’ command that ships with the MySQL (now MariaDB) project. The following command corrected all entries in the SQL file from the mix of different charsets to a consistent UTF-8 output that rendered beautifully on the website:

replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < database.sql > database_uft8.sql



Replacing a failed disk in a mdadm RAID

Introduction

I have a RAID 5 with 4 disks, see Rebuilding and updating my Linux NAS and HTPC server, and from my daily digest emails from the system I discovered that one of my disks had issues. I found the following in dmesg:

[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]  
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]  
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 8347.731204]         17 00 10 78 
[ 8347.731208] sd 5:0:0:0: [sde]  
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB: 
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.

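Before deciding to replace a disk, it is worth confirming the failure against the disk's SMART data; a quick check with smartmontools might look like this (attribute names can vary by vendor):

smartctl -a /dev/sde | grep -i -e reallocated -e pending
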


Migrate from RAID 6 to RAID 5 with mdadm

Introduction

I have been using a quite secure setup for the last couple of years: a 4-drive RAID 6, which can tolerate two disk failures without any data loss. Recently, though, I have been getting close to the edge of the filesystem and could use some extra space. Since I have both monthly backups to an external hard drive and nightly offsite backups, I am actually not very afraid of data loss on a RAID 5 setup, so I have planned to change my 4-disk RAID 6 to a 4-disk RAID 5 without any spares.

A word of caution: please do not perform any of the actions below before a backup has been made.
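
The core of such a conversion is a single mdadm reshape. As a rough sketch (the backup-file location is just an example, and the exact invocation may differ from what the full post uses):

mdadm --grow /dev/md0 --level=raid5 --raid-devices=4 --backup-file=/root/md0-reshape-backup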

