Building a powerful, cheap and silent Linux NAS and HTPC server

Introduction

I am currently using an old IBM Thinkpad T42 as a server and NAS for my desktop and laptop, sharing files through CIFS/Samba and NFS. I was running out of storage space on the server, which only had a single 250GB 2.5” disk, so I considered either buying a cheap NAS box such as the ReadyNAS Duo with two disks in raid 1 (mirroring) or building a new server from scratch with room for several disks in raid 5.

Updates to the article

I have updated this post several times since I posted it and it now contains the following new sections:

  1. Data scrubbing
  2. Monitoring the server through munin
  3. A HTPC media server
  4. The server was killed by lightning and has now been rebuilt and upgraded, see:
    Rebuilding and updating my Linux NAS and HTPC server

Underpowered off-the-shelf NAS boxes

When I researched the ReadyNAS Duo (1000DKK = 175USD) I found the performance to be less than what I wanted:

  • trustedreviews.com: average read and write speeds of 24.6MB/sec and 17.3MB/sec.
  • smallnetbuilder.com: write speed around 15MB/sec, read speed 35MB/sec for files up to 128MB and less than 10MB/sec for larger files!

Since I do image and video processing on large images on my desktop over NFS, I need better speed than this. So I started to look into more expensive NAS enclosures. Performance-wise I think I would have been perfectly happy with a QNAP TS-459 Pro+ Turbo NAS, but it is silly expensive (6000DKK = 1050USD) without disks. After all, these boxes are nothing but underpowered, single-purpose Linux servers with a web GUI, so surely I can do this better myself. Building my own server would also give me the power to saturate my gigabit network and to run an HLDS server, the LAMP stack, a Squeezebox service and such.

Finding the right case

I started searching the web for an enclosure that was small (as in mini-ITX), had a good build quality, supported more than two internal 3.5” SATA drives (preferably hot-swappable) and finally was somewhat silent cooling-wise.

There aren’t many of those around, so I thought I had hit the jackpot when I found the Chenbro ES34069. It would have been quite ideal if the list of compatible mainboards on its product page had not been so sparse, and the few boards listed were not available in Denmark. I contacted Chenbro to ask whether some specific mainboards would fit and whether they would update the page to cover more recent mini-ITX boards, but it took them three weeks to answer and by that time I had already moved on. For those interested, their answer was that the Intel DG45FC and DH57JG would fit the case.

Still searching the web, I found a silentpcreview article which recommended a build based on a Lian Li PC-Q08B enclosure. That article is the inspiration for my build, adapted to what was available in Denmark.

  • Lian Li PC-Q08B (739DKK = 129USD) – This mini-ITX enclosure has 6 internal 3.5” bays and is rated as very quiet in several reviews. I can now add that it is only really quiet if you undervolt the fans a bit, but once this is done it is very quiet.
  • Zotac G43-ITX (709DKK = 124USD) – This is a socket 775 mini-ITX motherboard with 5 SATA channels, e-SATA and HDMI output. 6 SATA channels would have been better, but 5 is good enough. The HDMI port makes a possible role as media server later on.
  • Core 2 Duo E7500 3 MB (960DKK = 170USD) – This was the cheapest Core 2 Duo socket 775 processor I could find. Since I had plans for an HLDS server and perhaps even virtual machines, I did not want an underpowered Celeron.
  • CoolerMaster Silent Pro M500 (600DKK = 109USD) – This is a somewhat expensive power supply, but it should be silent according to online reviews.
  • Corsair 4GB DDR2 XMS2 PC6400 800MHZ (540DKK = 94USD) – I could do with 2GB, but the motherboard only has 2 slots, so I would rather upgrade now than later.
  • 4 x Green WD10EARS (450DKK = 79USD per disk) – I wanted a raid 5 across three disks plus one spare, leaving me 2TB of usable space. These disks should not use too much power and should be reasonably quiet.
  • One 2.5” hard drive for the OS – I had an old 40GB SATA model I could use, but if I hadn’t had this I would probably have bought a cheap 40GB SSD, just to avoid bottlenecking the server now that all the other components are so powerful.
  • Scythe Shuriken Rev.B SCSK-1100 – The boxed Intel cooler wasn’t too bad, but I wanted something dead silent.

The total price including the disks was 5700DKK (970USD) which is the same price as the QNAP TS-459 Pro+ Turbo NAS without any disks.

Choosing a specialized or normal NAS Linux distribution

I am quite experienced with Linux and use CentOS or Debian for all my server needs. I do this even though I am aware of FreeNAS and Openfiler, because I prefer the flexibility of a distro which is not single-purpose. I won’t go into detail on how to install a Linux machine, but I will say that I used UNetbootin to boot the machine on a Debian squeeze netinstall image.

Setting up software raid on 4K sectors disks

At first I did not think much about the fact that my new disks had 4K sectors (also called Advanced Format), but when I started to create my raid I quickly realized that I had done something wrong. My raid was building at 35MB/s, which wasn’t too impressive. After searching Google for a while I found that I had partitioned my disks wrongly, such that filesystem blocks were split over two physical sectors on the hard drive. I had used fdisk to partition my drives, but the default is apparently to partition in DOS-compatible mode, where the first partition starts at sector 63. Invoking fdisk with the options “-cu” makes fdisk default to non-DOS-compatible mode and start the first partition at sector 2048. After repartitioning with “fdisk -cu /dev/sda” I ended up with this:

root@kelvin:~# fdisk -cul /dev/sda
 
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d5bbd
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1953525167   976761560   fd  Linux raid autodetect

With the new partitioning my rebuild speed rose to 75MB/s. Quite an amazing difference. :)
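The alignment rule can be checked with a little arithmetic: a partition is aligned when its start offset in bytes is a multiple of the 4096-byte physical sector. A minimal sketch, assuming 512-byte logical sectors as in the fdisk output above; the function name is_4k_aligned is made up for illustration:

```shell
# Hypothetical helper: succeeds if a partition starting at the given
# 512-byte logical sector begins on a 4096-byte physical sector boundary.
is_4k_aligned() {
    start_sector=$1
    [ $(( start_sector * 512 % 4096 )) -eq 0 ]
}

# Compare the DOS-compatible default (63) with the "-cu" default (2048).
for s in 63 2048; do
    if is_4k_aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: NOT aligned"
    fi
done
```

Sector 63 starts at byte 32256, which is not a multiple of 4096, while sector 2048 starts at exactly 1MiB.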

To create the raid after having partitioned the four disks (and set their type to “fd”) I used the following command:

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 /dev/sd{a,b,c,d}1

I could then follow the initial build process with either of these commands:

root@kelvin:~# mdadm --detail /dev/md0 
/dev/md0:
        Version : 1.2
  Creation Time : Tue Feb  1 23:24:46 2011
     Raid Level : raid5
     Array Size : 1953520640 (1863.02 GiB 2000.41 GB)
  Used Dev Size : 976760320 (931.51 GiB 1000.20 GB)
   Raid Devices : 3
  Total Devices : 4
    Persistence : Superblock is persistent
 
    Update Time : Tue Feb  1 23:34:14 2011
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
 
         Layout : left-symmetric
     Chunk Size : 512K
 
 Rebuild Status : 4% complete
 
           Name : kelvin:0  (local to host kelvin)
           UUID : 3f335d48:4b43c1d5:6ee5c975:1428eb56
         Events : 2
 
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       4       8       33        2      spare rebuilding   /dev/sdc1
 
       3       8       49        -      spare   /dev/sdd1
root@kelvin:~# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sdc1[4] sdd1[3](S) sdb1[1] sda1[0]
      1953520640 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  4.3% (42161792/976760320) finish=215.1min speed=72409K/sec

After the 3-4 hours it took to initialize the raid I needed to create a file system on it. Again I needed to align the filesystem with the disks, so I used the calculator at http://busybox.net/~aldot/mkfs_stride.html to find the best settings for my filesystem, and the result was:

mkfs.ext4 -b 4096 -E stride=128,stripe-width=256 /dev/md0
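The two -E values follow from the array geometry: stride is the chunk size divided by the filesystem block size, and stripe-width is the stride multiplied by the number of data-bearing disks (three raid 5 devices minus one disk's worth of parity). A sketch of that arithmetic, with variable names chosen here for illustration; it only prints the command and does not run it:

```shell
chunk_kb=512        # mdadm chunk size, from "Chunk Size : 512K"
block_kb=4          # ext4 block size, from "-b 4096"
raid_devices=3      # active devices in the array (the spare does not count)
parity_disks=1      # raid 5 spends one disk's worth of space on parity

stride=$(( chunk_kb / block_kb ))
stripe_width=$(( stride * (raid_devices - parity_disks) ))

# Print the resulting command for inspection only.
echo "mkfs.ext4 -b $(( block_kb * 1024 )) -E stride=$stride,stripe-width=$stripe_width /dev/md0"
```

With the numbers from this build it reproduces stride=128 and stripe-width=256.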

Recovering from a faulty disk

An important part of a RAID setup is the ability to cope with a disk failure. Since I am using RAID 5 plus one spare I can afford to lose one disk: the raid will start rebuilding on the spare, giving me time to order a new disk online. Since the enclosure I have chosen does not support hot-swap and has no separate light for each disk, I need a way to find out which of the disks to replace. If a disk fails I will receive a mail from the mdadm daemon, and by issuing the following command I can find out which disk has failed:

mdadm --detail /dev/md0

Assuming that /dev/sda has failed I can get the serial number of the disk using hdparm:

hdparm -i /dev/sda | grep SerialNo

and luckily the Western Digital disks I have came with a small sticker which shows the serial number. Since I can’t hot-swap, I will then wait until the raid has finished rebuilding on the spare disk, power the server down, replace the faulty disk, power the server up again, partition the new disk as above and finally add it to the array with:

mdadm --add /dev/md0 /dev/sda1
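Picking the failed member out of the mdadm report can be scripted. The sketch below filters the device table for rows marked faulty; the helper name faulty_members is hypothetical, and the sample rows are illustrative text modelled on the table format shown earlier, not captured output from a real failure:

```shell
# Hypothetical helper: print the member devices that "mdadm --detail"
# marks as faulty. Reads the report on stdin.
faulty_members() {
    awk '/faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Illustrative input only (assumed layout, not real captured output).
sample='       0       8        1        -      faulty   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       4       8       33        2      active sync   /dev/sdc1'

printf '%s\n' "$sample" | faulty_members

# On the server this would be used roughly as:
#   mdadm --detail /dev/md0 | faulty_members
#   hdparm -i /dev/sda | grep SerialNo
```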

Data scrubbing

To ensure that a rebuild will run smoothly without any data reading errors it is commonly recommended to do a data scrub/check daily or weekly. To initiate a data scrub of the raid use the following command:

[root@kelvin ~]#  echo check >> /sys/block/md0/md/sync_action

The process can be monitored in the virtual file /proc/mdstat

[root@kelvin ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sda1[0] sdd1[3](S) sdc1[4] sdb1[1]
      1953520640 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      [>....................]  check =  2.6% (26135552/976760320) finish=203.2min speed=77948K/sec
 
unused devices: <none>

I have decided that a weekly data scrub (at 3AM on Sundays) is sufficient and have added this to my crontab:

0 3     * * Sun echo check >> /sys/block/md0/md/sync_action
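A slightly more defensive variant of the cron one-liner would start a scrub only when the array is idle, so that it does not interfere with a rebuild already in progress. This is a sketch rather than the setup described above: the function name start_scrub is invented, and the sysfs directory is passed as a parameter so the logic can be tried on ordinary files first.

```shell
# Hypothetical wrapper around the cron one-liner. The md sysfs directory
# is a parameter; it defaults to the array used in this article.
start_scrub() {
    md_dir=${1:-/sys/block/md0/md}
    state=$(cat "$md_dir/sync_action")
    if [ "$state" = "idle" ]; then
        echo check > "$md_dir/sync_action"
        echo "scrub started"
    else
        echo "array is busy ($state), not starting a scrub"
        return 1
    fi
}

# Dry run against a scratch directory instead of the real array.
demo=$(mktemp -d)
echo idle > "$demo/sync_action"
start_scrub "$demo"
```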

Power consumption

I have a power meter and measured the wattage of the machine while it was running:

  • Idling: 50W
  • Raid disks spun down: 40W
  • Raid disks fully utilized: 60W

This isn’t too bad compared to the much less powerful “QNAP TS-459 Pro”:

  • Idling: 45.1W
  • Rebuilding: 52.2W

See http://www.tomshardware.com/reviews/nas-intel-atom-raid,2824-4.html

I have set the disks to spin down after 10 minutes of inactivity by editing “/etc/hdparm.conf” and adding a section like the following for each disk:

/dev/sda {
	spindown_time = 120
}
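The value 120 is not in minutes: hdparm counts timeouts from 1 to 240 in steps of 5 seconds, so 120 × 5s = 600s = 10 minutes. A small sketch of the conversion, with a made-up function name and covering only that 5-second range (longer timeouts use a different encoding):

```shell
# Hypothetical helper: convert a timeout in minutes to the hdparm -S /
# spindown_time encoding. Only the 1..240 range (5-second steps, up to
# 20 minutes) is handled here.
spindown_value() {
    minutes=$1
    value=$(( minutes * 60 / 5 ))
    if [ "$value" -lt 1 ] || [ "$value" -gt 240 ]; then
        echo "timeout out of range for 5-second steps" >&2
        return 1
    fi
    echo "$value"
}

spindown_value 10
```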

Benchmarking the system

To make a realistic performance test comparable to others found online I shared the raid partition through CIFS/SAMBA and tested the performance using http://www.808.dk/?nastester:

NAS performance tester 1.0 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on drive X: 5 times...
Iteration 1:     97,49 MB/sec
Iteration 2:     107,28 MB/sec
Iteration 3:     109,11 MB/sec
Iteration 4:     110,05 MB/sec
Iteration 5:     117,08 MB/sec
------------------------------
Average (W):     108,2 MB/sec
------------------------------
Running a 400MB file read on drive X: 5 times...
Iteration 1:     116,55 MB/sec
Iteration 2:     112,93 MB/sec
Iteration 3:     108,19 MB/sec
Iteration 4:     117,08 MB/sec
Iteration 5:     111,97 MB/sec
------------------------------
Average (R):     113,34 MB/sec
------------------------------
root@kelvin:~# hdparm -Tt /dev/md0
 
/dev/md0:
 Timing cached reads:   4284 MB in  2.00 seconds = 2142.08 MB/sec
 Timing buffered disk reads: 586 MB in  3.00 seconds = 195.15 MB/sec

Monitoring the server through munin

The mdadm tools provide plenty of monitoring options for the raid.
The quick overview:

[root@kelvin ~] # cat /proc/mdstat

The more detailed overview:

[root@kelvin ~]# mdadm --detail /dev/md0

The nitty-gritty details:

[root@kelvin ~][22:23]# ls /sys/devices/virtual/block/md0/md/
array_size       component_size  dev-sdc1  metadata_version          raid_disks  reshape_position     stripe_cache_size  sync_completed       sync_speed
array_state      degraded        dev-sdd1  mismatch_cnt              rd0         resync_start         suspend_hi         sync_force_parallel  sync_speed_max
bitmap_set_bits  dev-sda1        layout    new_dev                   rd1         safe_mode_delay      suspend_lo         sync_max             sync_speed_min
chunk_size       dev-sdb1        level     preread_bypass_threshold  rd2         stripe_cache_active  sync_action        sync_min

Using the sync_speed and sync_completed virtual files I created a small munin script to monitor the progress. Once I have tweaked this a bit I might upload it to http://exchange.munin-monitoring.org/
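The script itself is not reproduced in the post. As an illustration of the idea only, a munin plugin along these lines could look like the sketch below. It is not the script mentioned above: the function name, graph labels and field names are assumptions, and the sysfs directory is a parameter so it can be tried on plain files. A real plugin would receive "config" as its first argument.

```shell
# Hypothetical munin plugin body; NOT the script mentioned above.
md_sync_plugin() {
    md_dir=$1
    mode=$2
    if [ "$mode" = "config" ]; then
        echo "graph_title md0 sync progress"
        echo "graph_category disk"
        echo "percent.label completed (%)"
        echo "speed.label speed (K/sec)"
        return 0
    fi
    # sync_completed reads "done / total", or "none" when nothing is running.
    read -r done_s slash total_s < "$md_dir/sync_completed"
    if [ "$done_s" = "none" ] || [ -z "$total_s" ]; then
        echo "percent.value U"        # munin's marker for "unknown"
    else
        echo "percent.value $(( done_s * 100 / total_s ))"
    fi
    speed=$(cat "$md_dir/sync_speed")
    if [ "$speed" = "none" ]; then speed=U; fi
    echo "speed.value $speed"
}

# Try it on scratch files holding illustrative numbers.
demo=$(mktemp -d)
echo "26135552 / 976760320" > "$demo/sync_completed"
echo "77948" > "$demo/sync_speed"
md_sync_plugin "$demo"
```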

A HTPC media server

The Zotac motherboard has an HDMI port and the HDMI output was immediately recognized by my Sony KDL-40W5500 LCD television. So now I have placed my server underneath my TV with an HDMI cable to the TV and an audio cable to my stereo.

The great thing about this build compared to an Atom- or ARM-based build, or even a stand-alone QNAP NAS, is that I have more than enough performance to watch YouTube videos, run Grooveshark or watch HD video recorded on my digital camera. In addition I can now browse the web and get the full experience. Since this build has a dual-core Intel E7500@2.93GHz, 4GB DDR2 RAM and an onboard Intel GMA X4500 graphics processor, the setup laughs at Flash-heavy pages and does everything a normal computer would. I am very happy. :)

Conclusion

I am very happy with my build. It is almost completely silent, quite powerful and cheaper than buying something comparable off the shelf.

Sources

http://www.ducea.com/2009/03/08/mdadm-cheat-sheet/
http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html
http://www.ibm.com/developerworks/linux/library/l-4kb-sector-disks/index.html?ca=dgr-lnxw074KB-Disksdth-LX
http://en.gentoo-wiki.com/wiki/RAID/Software#Data_Scrubbing

53 Responses to Building a powerful, cheap and silent Linux NAS and HTPC server

  1. Peter says:

    Way to go, Tjansson – I am almost ashamed of my store-bought NAS :-)

  2. Hehe – on the other hand, you are spared the blue light from the 14cm fan sitting in the front. It is a bit of a tacky look, but it runs fine and quietly after I undervolted it a bit. I can’t quite bring myself to replace it just because of that. :)

  3. Kåre says:

    That looks like a really nice setup!

  4. Jens Munk Hansen says:

    Nice one, Tjansson. I need to replace my soon 5-year-old NAS and would like something that can do real-time transcoding, which is probably a bit unrealistic. I think my plan will be a somewhat home-made setup.

  5. Marty says:

    Nice article, VERY useful

  6. Jens: I can only recommend it. I certainly get much more performance and features than from a ready-made NAS with an Atom processor. :)

  7. augusto says:

    Thanks a lot for sharing your experience!
    I wish I’ll be able to achieve that for my home/biz lan

  8. Chris says:

    Hi

    Thanks a lot for the article. I wonder about one thing though: Where did you install the OS to?

  9. Sorry – I forgot to write that. I installed an old 2.5” 40GB hard disk and put the OS there. I have updated the article to reflect this now.

  10. Greg says:

    Interesting article, thanks, this is something that I want to do and I have been testing FreeNAS 0.7 and 8.x and so far I like the 0.7 version because of the RAID ability (easy to set up) but most importantly, disk encryption. I would not want my NAS server stolen with all my data unencrypted. Since FreeNAS 8 currently doesn’t support encryption, I have been looking at alternatives to FreeNAS 8 and was thinking of going the Linux route, your article is interesting and informative. I might test your steps but add in disk encryption, will have to see if there is a method of mounting the encrypted volumes via a GUI, or I could probably set up SSH and use that to mount.

  11. Pingback: Absence of updates.. « Game Programming & Design

  12. Mattias says:

    This is cool. I stumbled onto your site via the sshfs/automount article. Then I saw this article about a NAS/HTPC and thought “Hey, that’s what I’m building”. And then you’re using the exact same Q08 chassis I’m using! Really reassuring to see you completed the build and it was quiet :) By the way, what are you using to undervolt your fans? My MB has one controlled fan power output, but I guess I’ll need to get something for the exhaust fan as well…
    For the insides I’ve got a Core i5 2500K, 8GB of DDR3 1600MHz and an ASUS P8H67-I motherboard. The board has a SATA 6Gb/s connector so I’m going to “have to” get a fast SSD to make use of that later on, but I’ll start off with an old hard drive. I’m still waiting for my PSU, so I haven’t started my build yet. I also can’t afford much storage right now on my student’s budget, but in a couple of months I’ll be checking back in here to get the details of your RAID setup. It looked sweet.

    Keep up the good work!
    Greetings from Sweden

  13. Sorry Mattias, your post got lost in my todo list. For undervolting the fans I used the classic Zalman Fanmate 2.

    As you can read in my followup article: Rebuilding and updating my Linux NAS and HTPC server I had to rebuild my NAS again since it was hurt by lightning. The funny thing is that I now also decided on a 2500K – which works out great with Debian testing and kernel 3.0.1. XBMC is awesome on this setup. :)

  14. Dave says:

    Excellent article – advice, your benchmarks are quite impressive!!

    Thanks for sharing.
    Dave

  15. alfonso says:

    Hi Thomas,

    quick question: did the thunder kill your hard drives?

    less quick question: what is your backup strategy?

    thank you,

  16. Hi Alfonso

    Luckily the thunder only killed off the mainboard, so I could use the hard disks again.

    My backup strategy consists of 2 measures:
    1) Nightly backup via rsync of everything to a like-minded, Linux-oriented friend’s server.
    2) Monthly rsync-based backup to an external hard drive – I am using rsnapshot for this. This way I have 6 archives, but since it uses hard links it doesn’t take up more space than the diff of the files.

    Kind regards
    Thomas

  17. Dave: Thanks – I was impressed myself that it could so easily saturate a gigabit network. :)

  18. alfonso says:

    Hi Thomas,
    thank you for the answer and THANKS SO MUCH for all the info! your blog is a goldmine!

    reverse engineering http://busybox.net/~aldot/mkfs_stride.html it seems to me you chose chunk size of 512K (RAID = 6, physical disks = 4, number of filesystem blocks = 4).
    I found in this blog http://louwrentius.com/blog/2010/05/linux-raid-level-and-chunk-size-the-benchmarks/ that 64K might be the best choice for RAID 6 with 4 disks. I’ll try and report.
    Also, do you use a bitmap? It seems it really helps when rebuilding an array:
    http://louwrentius.com/blog/2011/12/speeding-up-linux-mdadm-raid-array-rebuild-time-using-bitmaps/

  19. Hi Alfonso

    I am glad that you like it and that you contribute to it as well with our conversation. :) At first my setup was a 3-disk RAID-5 with one spare and only later did I grow it to a 4-disk RAID-6, so I would probably have chosen differently now, though still using http://busybox.net/~aldot/mkfs_stride.html. It is an interesting read, but reformatting again seems too much hassle since I am already able to saturate my gigabit network, which is my measure of success at the moment. If I were to redo the build, 64K seems like the right choice, but my 512K chunks are not really bad.

    Regarding bitmaps I remember I investigated it, but I think I forgot about it, so I have enabled it now. I also remember that I tuned my speed_limit_min, speed_limit_max and stripe_cache_size to get better performance at some point:

    [root@kelvin ~][00:29]# cat /proc/sys/dev/raid/speed_limit_min
    1000
    [root@kelvin ~][00:29]# cat /proc/sys/dev/raid/speed_limit_max
    200000
    [root@kelvin ~][00:29]# cat /sys/block/md0/md/stripe_cache_size
    8192

    See
    http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html
    http://h3x.no/2011/07/09/tuning-ubuntu-mdadm-raid56

  20. alfonso says:

    Since I realized I didn’t correctly align my RAID (4K issue) and I still have my external USB drives with all my data I decided to try, but boy I don’t know how to make that script work… :(
    so I guess I’ll settle for the conclusion of 64k. But I really would like to run the test… I’ll try a bit longer!
    What tool do you use to assess the speed of your RAID?
    I did a check after reading about it on your blog (echo check >> /sys/block/md0/md/sync_action) and the speed was about 17-20 MB/s and it took about 24 hours, and this is what spurred me to action. ;-)

  21. The 4K alignment made a huge difference in my case as well. I usually use hdparm for basic testing:

    [root@kelvin ~][00:34]# hdparm -tT /dev/md0
    /dev/md0:
     Timing cached reads:   19618 MB in  2.00 seconds = 9822.01 MB/sec
     Timing buffered disk reads: 496 MB in  3.01 seconds = 164.95 MB/sec

    But the resync speed from a

    [root@kelvin ~][00:40]# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid6 sde1[0] sdb1[3] sdd1[4] sdc1[1]
          1953520640 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
          [>....................]  check =  0.1% (1664232/976760320) finish=166.0min speed=97896K/sec
          bitmap: 0/8 pages [0KB], 65536KB chunk

    is much more relevant, as it shows the actual speed. To see this on my otherwise healthy raid I issue a check of consistency with:

    echo check >> /sys/block/md0/md/sync_action

    Edit: Wow – you just wrote that. I think I should go to sleep now :)

  22. alfonso says:

    mmmmm

    this is weird. I thought my system was doing very badly:

    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid6 sde1[3] sdd1[2] sdc1[1] sdb1[0]
          3907024768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
          [==>..................]  resync = 12.0% (236086020/1953512384) finish=1690.7min speed=16929K/sec
     
    unused devices: <none>

    but then I typed this:

    # hdparm -tT /dev/md0
     
    /dev/md0:
     Timing cached reads:   5982 MB in  2.00 seconds = 2991.31 MB/sec
     Timing buffered disk reads: 672 MB in  3.00 seconds = 223.77 MB/sec

    so what is it? Very fast or very slow?

    anyway when I was only looking at /proc/mdstat I tried to run a tuning script (which I rewrote from an existing one) and it gave me 10% better performance.
    if you are interested it’s here and you can run it in an “informative” way, where no actual system modification gets executed:
    http://ubuntuforums.org/showpost.php?p=11622143&postcount=22

  23. Davide says:

    Thomas wrote:
    >My backup strategy consists of 2 measures:
    >1) Nightly backup via rsync of everything to like-minded Linux oriented friend’s server.

    Hi Thomas,
    How did you solve the problem of privacy with your friend? Are you encrypting your files in some way?

  24. alfonso says:

    Hi Thomas,

    I created a small benchmark script and did some tests on tuning…
    If you find the time, I’ll be curious to know the results for your system.
    Takes 5 minutes and it’s not destructive.
    http://ubuntuforums.org/showthread.php?p=11627044

    cheers,

  25. Davide: I currently do not use any privacy-enforcing methods as I have complete trust in this friend (who is also family) and do not have anything secret beyond normal privacy. We have however tried out the tool called http://duplicity.nongnu.org/:

    Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

    The disadvantage of this tool, however, is that recovering files is more cumbersome.

    Alfonso:
    It seems that your write performance is higher than mine:

    [root@kelvin tjansson][16:52]# ./speed-test.sh
    Flush the cache before each test
    writing tests:
    10000000000 bytes (10 GB) copied, 122.276 s, 81.8 MB/s
    10000000000 bytes (10 GB) copied, 124.307 s, 80.4 MB/s
    10000000000 bytes (10 GB) copied, 122.709 s, 81.5 MB/s
    reading tests:
    10000000000 bytes (10 GB) copied, 67.7142 s, 148 MB/s
    10000000000 bytes (10 GB) copied, 67.1621 s, 149 MB/s
    10000000000 bytes (10 GB) copied, 69.4491 s, 144 MB/s

  26. alfonso says:

    Thanks,
    I bet with a non-empty file-system results are different. I’ll try as well after I copy all my data!

  27. Alfonso: Following the advice in http://askubuntu.com/questions/19325/improving-mdadm-raid-6-write-speed and http://h3x.no/2011/07/09/tuning-ubuntu-mdadm-raid56 I tuned my /sys/block/md0/md/stripe_cache_size from 256 to 8192:

    [root@kelvin home][19:02]# echo 8192 > /sys/block/md0/md/stripe_cache_size

    and made it permanent by adding it to /etc/rc.local. My new speeds were quite a bit higher:

    [root@kelvin tjansson][19:02]# ./speed-test.sh
    flush the cache before each test
    writing tests:
    10000000000 bytes (10 GB) copied, 76.7549 s, 130 MB/s
    10000000000 bytes (10 GB) copied, 76.0624 s, 131 MB/s
    10000000000 bytes (10 GB) copied, 75.9001 s, 132 MB/s
    reading tests:
    10000000000 bytes (10 GB) copied, 67.3017 s, 149 MB/s
    10000000000 bytes (10 GB) copied, 69.878 s, 143 MB/s
    10000000000 bytes (10 GB) copied, 67.3777 s, 148 MB/s

  28. alfonso says:

    Hi Thomas,
    it’s very interesting that you found that value for stripe_cache_size to be optimal. I wrote a script to find sub-optimal settings and I got the same result.
    Give it a go if you have time, I’d like to hear if it works:

    http://ubuntuforums.org/showthread.php?p=11646960

  29. alfonso says:

    Well,

    good news all around! :) my video driver issue has been fixed (so now I can use 1920x1080i and 1280x720p over HDMI with no problem) and after aligning partitions, using the right chunk size and tuning some parameters, scrubbing the md device got 10 times faster!!!

    # cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md0 : active raid6 sdd1[4] sdc1[1] sde1[3] sdb1[0]
          3907024768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
          [>....................]  check =  0.7% (13775616/1953512384) finish=161.6min speed=200001K/sec
          bitmap: 0/15 pages [0KB], 65536KB chunk

  30. alfonso says:

    Hi Thomas,

    I hope you’re doing fine!

    I just wanted to share something I found out.

    I think redundancy check is set automatically, no need for manual setup of

    0 3 * * Sun echo check >> /sys/block/md0/md/sync_action

    please check if you have a file called /etc/cron.d/mdadm and confirm. It’s set to run once a month, but it could be easily changed.

    the advantage is that (if you need to) /usr/share/mdadm/checkarray allows to cancel or change priority of a scan and it has a simple interface.

    the one thing I’m left wondering is: since it runs in cron, how will I be notified if there is a problem? should I forward all root emails to my email address to receive warnings?

    ok, found it. just add “MAILTO=” in /etc/crontab e.g.:
    MAILTO=myemail@gmail.com

    no need to restart any process. It works as soon as the change is done.

    please note I had already set up the system to send emails since I wanted to receive alarms from mdadm (in /etc/mdadm/mdadm.conf add/edit MAILADDR myemail@gmail.com)

    and in case you get any warning, here are a few hints on what to do:
    http://pbraun.nethence.com/doc/sysutils_linux/mdadm.html

    cheers!

  31. Bill Whaley says:

    Thanks for posting the details of your build. I too was looking at some of the pre-built appliances, but this route sounds much better. This is the way I will go as well. Now to start looking for parts!

  32. Dave says:

    Thanks for all of this, I love your attention to detail!

    I’ve got a travla T 1160 http://www.travla.com/product_d.php?id=0000000013 and a zotac ionitx B http://www.amazon.com/Zotac-IONITX-B-Desktop-Motherboard-Nvidia/dp/B0055N2WQM in my half height dell server rack under my stairs: http://www.daveeasa.com/images/548%20Melba/Man%20Cave/ManCave.JPG

    I have had stability problems with win 7 and my latest WD 3TB drive so I am tempted to switch over to centos. Though I recently updated the bios and it hasn’t crashed since so I’m sitting on the fence for now. It’d be especially nice to have ssh access and to seamlessly install git, subversion, and all of the free apps on apache for sharing files without dealing with the mess of doing that on win 7.

    The space under my stairs has a solid-core door with gaskets and a separate air intake through the lowest two stairs and a temperature-controlled exhaust fan that parallels my dryer duct, so noise isn’t a huge concern. Sometimes I sleep in that space (when guests are in my room) and then I power the server down because of the light as much as the fan noise.

    The biggest problem with the zotac board is that it only has 3 sata ports, so I can’t max out the 4 drives even if I wanted to. Performance seems fine for netflix and decoding local avi’s or listening to music.

    I’m curious if you have any thoughts on what you would do in my shoes. Here are the options I see:

    1a. Jetway JNF99fl-525-lf (6 sata, atom d525) to take over the 1160 +
    1b. Travla T1200 for the zotac board which will serve up htpc and perhaps secondary web/file server and maybe eventually convert the jetway to freenas and have a 3rd board for apache/git/etc but this seems overkill

    2a. Jetway board as above for files
    2b. Travla 2u enclosure for both the zotac and jetway boards
    (not sure what I do with the 1u enclosure with this plan)

    3a. The zotac board you are using or some other higher performance board (seems like I don’t really need the performance though)
    3b. This plan necessitates the 2u enclosure for clearance above the fan/heatshield

    4. Jetway board, move the zotac to a client pc elsewhere in the house (but most of my client pc’s are mac’s now since the hardware is so much nicer).

  33. Hi Dave.

    I am not really current on what has been hot since I did my build, but I can share my more personal experiences. As you probably have read I am using my build as an HTPC, but I am currently considering a fanless graphics card so I can have the HDMI link also serve audio. The motherboard I have used in the build doesn’t support audio through the HDMI.

    Regarding the builds I guess I would try to consolidate as much as I could on one machine – this would save time on administration and electricity. My build replaced two servers. Even though I think you are right that the dual-core Atom processor is probably capable of doing some transcoding and heavier stuff, I would make a build with a little more juice. One benefit of a more powerful processor would be that you could create a basic Linux installation running as a hypervisor and create multiple servers on the same machine if you need the multitude.

    Hmm, this was a bit of a confusing answer, but what I meant to say was that I would go for a single powerful setup rather than several lower performance nodes. Also, remember to check the HDMI capabilities of the board or consider the space for graphics card.

    Kind regards
    Thomas

  34. Ferdinando Ametrano says:

    I would be worried to use WD10EARS in RAID 5 or RAID 6.

    Those HDDs have no TLER (Time Limited Error Recovery), and as such it’s easy for one drive to be dropped from the RAID array. If this happens, the rebuild onto the spare drive puts extra stress on the remaining drives, increasing the chances of another drive failure, with dramatic consequences.

    It happened to me with my QNAP 409 when upgrading in RAID 5 from 1TB disks to 2TB disks.

    I’ve read that RAID 1 is not affected by the lack of TLER because it has no parity calculation, though I’m not sure about the rationale.

    Copy pasted from WD site (via another thread):

    When an error is found on a desktop edition hard drive, the drive will enter into a deep recovery cycle to attempt to repair the error, recover the data from the problematic area, and then reallocate a dedicated area to replace the problematic area. This process can take up to 2 minutes depending on the severity of the issue. Most RAID controllers allow a very short amount of time for a hard drive to recover from an error. If a hard drive takes too long to complete this process, the drive will be dropped from the RAID array. Most RAID controllers allow from 7 to 15 seconds for error recovery before dropping a hard drive from an array. Western Digital does not recommend installing desktop edition hard drives in an enterprise environment (on a RAID controller).

    Western Digital RAID edition hard drives have a feature called TLER (Time Limited Error Recovery) which stops the hard drive from entering into a deep recovery cycle. The hard drive will only spend 7 seconds to attempt to recover. This means that the hard drive will not be dropped from a RAID array. Though TLER is designed for RAID environments, it is fully compatible and will not be detrimental when used in non-RAID environments.

  35. Ferdinando Ametrano: I see your concern and I have tried to find an answer for myself for the last few hours. As far as I now understand, TLER is a concern when using hardware RAID, but with software RAID it is not actually a problem:

    An editor at smallnetbuilder.com asked the manufacturers of consumer NAS boxes that have non-TLER drives on their recommended drive lists about TLER. The answer was the following:

    The responses I received from Synology, QNAP, NETGEAR and Buffalo all indicated that their NAS RAID controllers don’t depend on or even listen to TLER, CCTL, ERC or any other similar error recovery signal from their drives. Instead, their software RAID controllers have their own criteria for drive timeouts, retries and when a drive is finally marked bad.

    So is there any benefit to using TLER / CCTL / ERC drives? Maybe. These features usually come on “Enterprise” grade drives (WD Caviar RE series, Seagate Barracuda ES, ES.2, Samsung Spinpoint F1), which are built to take the constant, hard use of business environments. So investing in these more expensive drives is probably a smart move if your NAS is under constant heavy use. But it will be the more robust drive construction and not TLER / CCTL / ERC that will make your RAID NAS more reliable.

    http://www.smallnetbuilder.com/nas/nas-features/31202-should-you-use-tler-drives-in-your-raid-nas
    Since most of these NAS boxes run some form of Linux and mdadm, I think this goes for any setup using mdadm.

    The following discussion on the storagereview.com forum is also a very good read on the topic:
    http://forums.storagereview.com/index.php/topic/29208-how-to-use-desktop-drives-in-raid-without-tlererccctl/
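
For mdadm on a plain Linux box, two settings are commonly mentioned in this context: the drive's own error recovery limit (SCT ERC, which smartctl can set on drives that support it) and the kernel's per-device command timeout in sysfs. The sketch below only prints the commands rather than running them. The function names, the device name and the 180 second value are assumptions for illustration and are not taken from the sources quoted above; 7 seconds is the TLER figure from the WD text.

```shell
# Hypothetical dry-run helpers: they print commands and change nothing.

# Command that limits the drive's error recovery time.
# $1 = device name such as sda (placeholder), $2 = limit in seconds (default 7)
erc_command() {
    # smartctl expects the read and write limits in tenths of a second
    echo "smartctl -l scterc,$(( ${2:-7} * 10 )),$(( ${2:-7} * 10 )) /dev/$1"
}

# Command that raises the kernel's command timeout for drives without ERC.
# $1 = device name, $2 = timeout in seconds (180 is an assumed example value)
timeout_command() {
    echo "echo ${2:-180} > /sys/block/$1/device/timeout"
}
```

Calling `erc_command sda` prints `smartctl -l scterc,70,70 /dev/sda`. Whether a given desktop drive accepts the setting, and whether it survives a power cycle, would have to be checked per model.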

  36. Great article, Thomas. Just what I needed to build my own NAS.
    I based it on your design, with a few (minor) modifications.
    Mobo: ASUS E45M1-I DELUXE MINI ITX DDR3 AMD BRAZOS
    RAM: 8GB DDR3
    SSD: SAMSUNG 128GB 2.5 INCH 830 SERIES BASIC MZ-7PC128B
    HDD: 4x WESTERN DIGITAL 1000GB 64MB INTELLIPOWER WD10EZRX CAVIAR GREEN
    Case: LIAN LI PC-Q08B (how could I go any different?)

    1st of all, I use an SSD to run the OS on, and created a raid5 set on my non SSD-disks containing the storage, /tmp and /var, without creating RAID enabled partitions.
    That way I don’t encounter your dos 4k sector issue
    But if I’d do it again, I wouldn’t go for the SSD solution anymore…
    my stripe_cache_size is set to 4096, as it was better than 8192
    Here are my speed results:

    [root@behemoth ~]# ./speed-test.sh
    flush the cache before each test
    writing tests:
    10000000000 bytes (10 GB) copied, 49.6605 s, 201 MB/s
    10000000000 bytes (10 GB) copied, 51.5719 s, 194 MB/s
    10000000000 bytes (10 GB) copied, 53.9838 s, 185 MB/s
    reading tests:
    10000000000 bytes (10 GB) copied, 40.6848 s, 246 MB/s
    10000000000 bytes (10 GB) copied, 39.3262 s, 254 MB/s
    10000000000 bytes (10 GB) copied, 39.0654 s, 256 MB/s
    

    I did, however, find some issues with the WD green disks. I saw my Load_Cycle_Count go up too fast.
    Apparently WD Green disks have an issue (with Linux). The heads are parked way too frequently, causing your Load_Cycle_Count to skyrocket.
    I found some details on the issue at hand here
    It boils down to this:

    for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Load_Cycle_Count; done;

    divide by (per drive)

    for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Power_On_Hours; done;

    If you want your drives to live for about 4 years, you’ll have to stay below 9 per hour (assuming the drives have a max lifetime of 300,000 load cycles; some models have twice that much).
    After 210 hours, I had an average of 50 load cycles per hour. My drives were only going to last for about 8 months!

    After fiddling about a bit with different settings, I gave idle3-tools a try. Although not really stable, it did fix my issue.
    My Load_Cycle_Count went down from about 50 per hour to about 0 (yay!)
    The idle3-tools interface is a bit buggy, so you’ll have to set the value, and then verify it. Rinse and repeat until you get to about 30 seconds. I used 2400 to get 25.2 seconds. Don’t disable it. Apparently that doesn’t solve the issue (I haven’t ventured into this yet, as my SAN is in production)
    Ah, and don’t forget: power-cycle your NAS after setting this. A mere reboot doesn’t apply the new hdd firmware settings.
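
The wear estimate in the comment above is plain division, so it can be sketched as a few shell functions. This is only an illustration: the function names are made up here, the 300,000 cycle limit is the figure quoted in the comment, and 8760 is the number of hours in a year. Integer division rounds down.

```shell
# Hypothetical helpers for the load-cycle arithmetic; integer math only.

# Average head load cycles per power-on hour.
# $1 = Load_Cycle_Count raw value, $2 = Power_On_Hours raw value
load_cycles_per_hour() {
    echo $(( $1 / $2 ))
}

# Hours of operation until the rated cycle limit is reached.
# $1 = cycles per hour, $2 = rated limit (defaults to the 300000 quoted above)
hours_until_limit() {
    echo $(( ${2:-300000} / $1 ))
}

# Largest whole-number hourly rate that still reaches a target lifetime.
# $1 = target lifetime in years, $2 = rated limit (defaults to 300000)
max_rate_for_years() {
    echo $(( ${2:-300000} / ($1 * 8760) ))
}
```

With the numbers from the comment, `hours_until_limit 50` prints 6000, which is 250 days or roughly eight months, and `max_rate_for_years 4` prints 8, in line with the advice to stay below 9. The raw counter values can usually be read from the last column of `smartctl -A`, though the exact layout varies between drives.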

  37. Aksel Gresvig says:

    Great guide, thanks for sharing and making it so detailed!

    Is this setup still what you would go for? I am considering doing a very similar thing now.
    But does it work well to run everything in one OS with XBMC running on top as the frontend? I mean, there is a lot of stuff going on in the background, does it never interfere with XBMC?

    I was thinking about running the server as a virtual Linux guest in a Linux host, where the host really only has the (VirtualBox) image and XBMC running.. Perhaps I am only making life more difficult for myself?
    Hypervisor is another approach, but again likely more complex.

    But regarding distros, is Debian still the preferred OS?

    All advice for a first-time Linux NAS builder is appreciated!

  38. Hi Aksel

    I am glad that you could use it. As you can see, my server died in July 2011 and I rebuilt it with newer specs: http://www.tjansson.dk/?p=1660

    The new build is quite powerful and has no problem running the MATE desktop (a GNOME fork) behind XBMC – BD images run fine in full HD over HDMI, so I am quite happy.

    Virtualization is definitely possible, but maybe not the best way to go unless you have a specific reason for it. Performance-wise I see no reason to do it, but if you have a specific security, networking or testing reason, I can’t see why not.

    I would say that Debian testing is preferred, especially for the new i5-2500K based build, as the built-in GPU benefits greatly from the advances in very recent kernels.

    Kind regards
    Thomas Jansson

  39. Thanks for the reply Thomas!
    I will go the non-virtualization route. It gives some cool opportunities but I don’t really need it.

    Quick question regarding hardware:
    Did you consider getting a Xeon CPU and ECC RAM? Or is it not really needed for a home server box, even though it is running 24/7?

  40. I didn’t really consider Xeons as I didn’t need the power and they have no built-in GPU. The i5-2500K is maybe even too powerful for the build, but I like the room for growth in what my server can do. I haven’t chosen ECC RAM either, since the server only serves myself and a few others and some downtime is acceptable. That said, I have never had a problem that I would relate to memory issues, and my uptime is usually months, interrupted only by kernel upgrades.

  41. dave lister says:

    50w idle, ouch! when you see the load consumption, you can see how overpowered that PSU is for this build – it’s not hitting its “sweet spot” in efficiency.

    Pico PSU would save quite a bit of power.

  42. Hi Dave

    I don’t consider it all that bad – this is while having 5 disks spinning and running XBMC on a TV over HDMI. The Pico PSUs I found in a quick Google search couldn’t power 5 disks. But overall you are right – it could probably have been made more power efficient, but my emphasis while building the box was on it being powerful and silent.

  43. Sylvain says:

    Thanks for such a great guide. I have tried FreeNAS, did not like it. I am testing OpenMediaVault now (Debian based appliance) and it is fine but I need more flexibility and I want to include an HTPC part like you did. Your project is what I need ! I will do it on a rescued Lenovo Desktop with a P4 HT, 2 GB of RAM, a Nvidia GT210 video card with HDMI, one IDE 40 GB as the system drive and 2 WD Green 2TB disks for the data (music, movies, photos). If I do my project with Ubuntu Desktop 12.04 instead of Debian, will I be able to replicate your setup ? And if I need to minimize the disks spinning and power consumption, would it be better to use rsnapshot (http://www.rsnapshot.org) instead of RAID-1 for the data disks ? Thanks in advance !

  44. Hi Sylvain

    I am glad you liked it. It isn’t too demanding to run the NAS, or the XBMC part for that matter, especially since you have a dedicated GPU. The only reason I chose Debian (besides being annoyed with the new GUIs of Ubuntu) is that it had the most recent kernel, which supported the on-die GPU of my i5-2500K. Since this is not your problem, Ubuntu should be just fine.

    Whether the disks spin down depends only on your usage, so if they idle at night they will spin down – I have actually written about this in this very article :) See /etc/hdparm.conf. Finally, even though I am a great fan and user of rsnapshot, I would choose RAID 1 over rsnapshot, both because you avoid downtime when one of the disks eventually dies and because the reading speed would be higher (not that it likely matters).

    Kind regards
    Thomas

  45. Bogdan Hlevca says:

    Hi Thomas,

    Great tutorial. exactly what I was looking for.
    I thought that I could contribute a little information. I did the installation as documented here, but it failed a few times because the OS froze (Debian wheezy with kernel 3.2). Annoyed, I did some research, and apparently kernels 3.2 and older cannot properly handle the new Ivy Bridge third-generation CPUs from Intel.
    I did the upgrade to a 3.6 kernel and to my surprise the rebuild speed rose from 75KB/s to 128KB/s with no other changes than the new kernel.

    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sdd1[4] sde1[3](S) sdc1[1] sdb1[0]
    4294702080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
    [>....................] recovery = 1.9% (41148736/2147351040) finish=272.7min speed=128692K/sec

    I have a question regarding RAID5. Is it really necessary to create a spare, or can you build a RAID 5 with no spare, keep the spare disk on the shelf and install it only when necessary? To me the hot spare looks more like a luxury item, especially when you don’t have hot plugs.

    Regards,
    Bogdan

  46. Hi Bogdan

    Thanks – I am glad you liked it. It sounds great with the 3.6 kernel, even though I think you mean 125MB/s instead of 128KB/s, which would be a little slow for rebuilding :) I haven’t experienced any freezing of my i5-2500K Sandy Bridge, so it seems, as you say, that this problem only exists on Ivy Bridge. When I did my build I had to use 3.0 (or 3.1) as Sandy Bridge was not supported on 2.6.

    There is no problem in creating a RAID 5 without hot spares, but the idea of the hot spare is to have the raid rebuild from a critical state as fast as possible. If you have it on the shelf anyway I would definitely set it up as a hot spare, or alternatively create a RAID 6 instead as I did. With a RAID 6 you have two disks’ worth of parity and can sustain 2 disk failures without data loss, which is safer because your data is not left unprotected after a single disk failure. You can see how I did it in my continuation of this article: http://www.tjansson.dk/?p=1660

    Kind regards
    Thomas
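
The recovery line in the mdstat output pasted in comment 45 can be checked with a little arithmetic, which also shows why the speed has to be in the MB/s range. This is a rough sketch with made-up function names; it assumes the block counts are in KiB and the speed is in KiB per second, and it rounds down.

```shell
# Hypothetical helpers for reading a /proc/mdstat recovery line.

# Whole minutes left in a rebuild.
# $1 = blocks done, $2 = total blocks, $3 = speed in K/sec
rebuild_minutes_left() {
    echo $(( ($2 - $1) / $3 / 60 ))
}

# Speed converted from K/sec to whole MB/s, counting 1024 K per MB.
# $1 = speed in K/sec
speed_in_mb() {
    echo $(( $1 / 1024 ))
}
```

For the pasted values, `rebuild_minutes_left 41148736 2147351040 128692` prints 272, which agrees with the reported finish=272.7min, and `speed_in_mb 128692` prints 125.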

  47. Bogdan Hlevca says:

    Hi Thomas,

    Sorry for my mistake, it is MB not KB.
    On another note, I have a question about the effective size of a RAID array. I did a search on the Internet and there are various calculators that give different answers, but all estimate more than I have.

    I have five WD 3TB Red drives set up as RAID 5 plus one hot spare. I used your settings from above and df -h reports an effective size of 6TB. Most estimates from these calculators give much more.
    I know that the effective size of a disk is actually about 2.7TB and the total for four disks would be around 10.8TB. If I subtract the space taken by parity (the size of one disk), it should still be somewhere around 8.1TB, but I get 6TB instead.
    Do you have any good calculation methods to estimate the size?

    Below is the information from mdadm and from fdisk for one of the HDD.

    =============================================================
    mdadm --detail /dev/md0 
    /dev/md0:
            Version : 1.2
      Creation Time : Fri Jan  4 22:14:31 2013
         Raid Level : raid5
         Array Size : 6442053120 (6143.62 GiB 6596.66 GB)
      Used Dev Size : 2147351040 (2047.87 GiB 2198.89 GB)
       Raid Devices : 4
      Total Devices : 5
        Persistence : Superblock is persistent
    
        Update Time : Mon Jan  7 10:03:42 2013
              State : clean 
     Active Devices : 4
    Working Devices : 5
     Failed Devices : 0
      Spare Devices : 1
    
             Layout : left-symmetric
         Chunk Size : 512K
    
               Name : debnas:0  (local to host debnas)
               UUID : 8f550ea5:f2153f0f:eadbc9c4:d2a5f4f2
             Events : 44
    
        Number   Major   Minor   RaidDevice State
           0       8       33        0      active sync   /dev/sdc1
           1       8       49        1      active sync   /dev/sdd1
           2       8       65        2      active sync   /dev/sde1
           4       8       81        3      active sync   /dev/sdf1
    
           5       8       17        -      spare   /dev/sdb1
    ============================================================
    
    Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes
    90 heads, 3 sectors/track, 21705678 cylinders, total 5860533168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk identifier: 0x911e60c0
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1            2048  4294967294  2147482623+  fd  Linux raid autodetect
    
    =============================================================
    

    Thank you,
    Bogdan

  48. Hi Bogdan

    I think the smoking gun is the following line:

    Used Dev Size : 2147351040 (2047.87 GiB 2198.89 GB)
    

    It seems as if you have partitioned your disks wrongly, since the used space per disk is around 2TB and not the 2.7TB you wrote. At least one of them was partitioned wrongly. If you have not already started to fill them up I would just redo it all. Otherwise I guess you could take them out of the raid one at a time, repartition them and add them back until all are at 2.7TB, and then grow the raid volume. The easiest and quickest way, however, is just to redo it.

    Kind regards
    Thomas Jansson
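
As for a calculation method, the mdadm output in comment 47 follows a simple rule: the array size is the used size per device multiplied by the number of active devices minus the parity devices, and spares do not count. A minimal sketch with made-up function names, working in KiB as mdadm reports them:

```shell
# Hypothetical helpers for estimating usable RAID capacity.

# Usable array size in KiB.
# $1 = active raid devices (spares excluded)
# $2 = parity devices (1 for RAID 5, 2 for RAID 6)
# $3 = used size per device in KiB
raid_usable_kib() {
    echo $(( ($1 - $2) * $3 ))
}

# KiB converted to whole GiB, rounded down.
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}
```

With 4 active devices, 1 parity device and the reported Used Dev Size of 2147351040, `raid_usable_kib 4 1 2147351040` prints 6442053120, which is exactly the Array Size shown. The shortfall therefore comes from the per-device size, not from the RAID level.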

  49. Bogdan Hlevca says:

    Hi Thomas,

    You are right. Thanks for pointing that out. The problem is actually fdisk. It is not capable of creating a partition beyond 2TB.

    If I use gdisk it works. I think this is useful information for others trying larger disks.

    I will redo the whole thing again. I have now plenty of experience :-)

    Regards,
    Bogdan
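
The 2TB ceiling most likely comes from the MBR (DOS) partition table that older fdisk versions write, rather than from fdisk as such: MBR stores partition sizes as 32-bit sector counts, while gdisk writes a GPT. The sketch below works through that limit; the function names are made up for illustration.

```shell
# Hypothetical helpers for the MBR size limit.

# Largest partition size an MBR entry can describe, in bytes.
# $1 = logical sector size in bytes (512 for the drive in comment 47)
mbr_max_bytes() {
    echo $(( 4294967295 * $1 ))   # 2^32 - 1 sectors
}

# Prints "yes" if a disk is too large to be covered by one MBR partition.
# $1 = disk size in bytes, $2 = logical sector size in bytes
needs_gpt() {
    if [ "$1" -gt "$(mbr_max_bytes "$2")" ]; then
        echo yes
    else
        echo no
    fi
}
```

`mbr_max_bytes 512` prints 2199023255040, about 2.2 TB, which lines up with the 2198.89 GB Used Dev Size in the earlier mdadm output. For the 3000592982016 byte disk from the fdisk listing, `needs_gpt 3000592982016 512` prints yes.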

  50. Bogdan Hlevca says:

    Hi Thomas,

    One more thing. It appears that for this type of drive it is better to let mdadm do the sizing.
    If I use:

    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 /dev/sd{b,c,d,e,f}1
    

    It creates a smaller usable size even though the partition is now set correctly with gdisk.
    However, if I use the whole disk instead

    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 /dev/sd{b,c,d,e,f}
    

    I get the full use of the HDD :

    root@debnas:/home/bogdan# mdadm --detail /dev/md0 
    /dev/md0:
            Version : 1.2
      Creation Time : Mon Jan  7 16:13:31 2013
         Raid Level : raid5
         Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
      Used Dev Size : 2930135040 (2794.39 GiB 3000.46 GB)
       Raid Devices : 4
      Total Devices : 5
        Persistence : Superblock is persistent
    
        Update Time : Mon Jan  7 16:13:31 2013
              State : clean, degraded, recovering 
     Active Devices : 3
    Working Devices : 5
     Failed Devices : 0
      Spare Devices : 2
    
             Layout : left-symmetric
         Chunk Size : 512K
    
     Rebuild Status : 0% complete
    
               Name : debnas:0  (local to host debnas)
               UUID : 6912b116:03ef5ba0:9c276a56:21d4f804
             Events : 1
    
        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           5       8       64        3      spare rebuilding   /dev/sde
    
           4       8       80        -      spare   /dev/sdf
    

    I am getting now the full 8.1 TB I expected.
    Hopefully this will help others who have problems with missing effective RAID size with 3TB drives.

    Kind Regards,
    Bogdan

  51. Hi Bogdan

    I am glad to hear that it is working. I think the reason that it is working is that you are not using any partitioning when you write

    /dev/sd{b,c,d,e,f}
    

    instead of

    /dev/sd{b,c,d,e,f}1
    

    So you are accessing the whole devices. This can be fine – I have done something similar when I had to format 3TB drives on old Linux installations.

    The only caveat could come up if you need to rebuild the raid in another machine. I don’t think Linux automatically recognizes a full disk as a raid member, as there are no partitions with the usual flags. This is not a terribly hard problem – you would just have to assemble the raid explicitly by telling mdadm which disks are in it.

    The reason I mention it is just that I was very happily surprised when Linux automatically assembled my RAID after I had to move my disks to another computer.

    Kind regards
    Thomas
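
Assembling an array by naming its members could look roughly like the sketch below. It only prints the mdadm commands instead of running them, the function name is made up, and the device names are placeholders that would differ on another machine.

```shell
# Hypothetical dry-run helper: prints commands and changes nothing.
# $1 = md device to assemble, remaining arguments = member devices
print_assemble_commands() {
    md=$1
    shift
    # Inspect each member's RAID superblock first
    for member in "$@"; do
        echo "mdadm --examine $member"
    done
    # Then assemble the array from the named members
    echo "mdadm --assemble $md $*"
}
```

For example, `print_assemble_commands /dev/md0 /dev/sdb /dev/sdc` prints one `mdadm --examine` line per disk followed by `mdadm --assemble /dev/md0 /dev/sdb /dev/sdc`.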
