Introduction
Replacing RAID disks is normally fairly routine in my setup, but this time I hit a persistent "Invalid argument" error when adding a member over USB. I wrote this to share the exact steps and the reasoning, so others can get through the same problem faster.
The short version: the array would not accept a new member while the Bad Block Log was present on a USB-attached mdadm array. Reassembling with --update=force-no-bbl was the fix, after which adding the disk worked immediately.
Background
My server curie runs Ubuntu Server, see New silent and powerful Linux NAS and Server. The data array /dev/md0 is RAID5 in an IcyBox IB-3740-C31 enclosure on USB 3.1 Gen 2, managed with mdadm and mounted as /home. The array used four WD Red 3 TB (WD30EFRX) disks that had been spinning for about eleven years, and not surprisingly one of them now failed. I bought WD Red Plus 4 TB (WD40EFPX) disks to replace them all, since eleven years is well beyond their expected lifetime.
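For reference, the overall state of the array and the suspect disk can be inspected with the commands below (a quick sketch; device names are from my setup and will differ elsewhere):
cat /proc/mdstat                      # overview of all md arrays and their current state
sudo mdadm --detail /dev/md0          # RAID level, member devices and array state
sudo smartctl -a /dev/sdb             # SMART health of the suspect disk; may need -d sat behind some USB bridges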
Started out normally
When the disk failed, I marked it as failed and removed it from the array:
sudo mdadm --manage /dev/md0 --fail /dev/sdb1
sudo mdadm --manage /dev/md0 --remove /dev/sdb1
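At this point the array keeps running in degraded mode, which can be confirmed before touching the new disk (an illustrative check, not a capture from my system):
cat /proc/mdstat                               # the removed member is gone, e.g. [4/3] [U_UU] for a degraded 4-disk RAID5
sudo mdadm --detail /dev/md0 | grep -i state   # should report clean, degraded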
After installing the new disk, I partitioned it using the commands below. They use the whole disk for the RAID volume and ensure the partition is optimally aligned with the drive’s internal physical data blocks. If this alignment isn’t correct, the system is forced to perform inefficient read-modify-write operations, causing significantly lower performance and increased drive wear.
sudo parted /dev/sdb --script mklabel gpt mkpart primary 1MiB 100%
sudo parted /dev/sdb --script set 1 raid on
sudo parted /dev/sdb align-check optimal 1
sudo parted /dev/sdb print
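To double-check the alignment yourself, the drive's logical and physical block sizes and the partition's start offset can be read from sysfs (a quick sketch, assuming the new disk is /dev/sdb):
cat /sys/block/sdb/queue/logical_block_size    # typically 512
cat /sys/block/sdb/queue/physical_block_size   # 4096 on these Advanced Format drives
cat /sys/block/sdb/sdb1/start                  # start in 512-byte sectors; 2048 for a 1MiB offset, a multiple of 8 and thus 4K-aligned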
Then I tried to add it to the array with this command, but got the following error:
$ sudo mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: add new device failed for /dev/sdb1 as 6: Invalid argument
And in dmesg I saw the following:
[  748.944806] md: sda1 does not have a valid v1.2 superblock, not importing!
[  748.944813] md: md_import_device returned -22
Troubleshooting UASP – not the problem
First, I thought the IcyBox IB-3740-C31 USB enclosure was the problem, perhaps something to do with UASP, so I checked the connections and confirmed that all disks used UASP (the uas driver):
lsusb -t
    |__ Port 001: Dev 003, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 002: Dev 004, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 003: Dev 005, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 004: Dev 006, If 0, Class=Mass Storage, Driver=uas, 10000M
I tried to disable UASP and connect through the plain usb-storage driver instead, but that didn't help:
echo "options usb-storage quirks=2109:0715:u" | sudo tee /etc/modprobe.d/usb-storage-quirks.conf sudo update-initramfs -u
Since disabling UASP made no difference, I reverted it:
sudo rm /etc/modprobe.d/usb-storage-quirks.conf
sudo update-initramfs -u
sudo reboot
New USB disk enclosure?
I also confirmed that moving the disk to a different USB bridge did not solve the problem. I first tried an older single-bay adapter, to see whether some low-level commands only worked through a single-disk USB enclosure, but it only supported 32-bit LBA and truncated large drives. So I bought a cheap IcyBox USB docking station (IB-1121-C31) and tested that instead, but the problem, md_import_device returned -22 (Invalid argument), remained.
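The LBA truncation on the old adapter is easy to spot by comparing the capacity the bridge reports with the drive's real size (a quick check, device name assumed):
sudo blockdev --getsize64 /dev/sdb          # size in bytes as seen through the bridge
lsblk -b -d -o NAME,SIZE,MODEL /dev/sdb     # a 32-bit LBA bridge with 512-byte sectors caps out at 2 TiB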
Breakthrough, assemble without the Bad Block Log
Hours of googling and asking LLMs indicated it could also be related to the bad blocks log from the old failed disk:
$ sudo mdadm --examine-badblocks /dev/sd[acd]1
Bad-blocks on /dev/sda1:
              974389216 for 8 sectors
       6004799503160661 for 341 sectors
Bad-blocks on /dev/sdc1:
              974389216 for 8 sectors
Bad-blocks on /dev/sdd1:
              974389216 for 8 sectors
$ sudo mdadm --examine-badblocks /dev/sd[b]1
Bad-blocks on /dev/sdb1:
                      0 for 0 sectors
                      0 for 0 sectors
                      0 for 0 sectors
                      0 for 0 sectors
                      0 for 0 sectors
                      0 for 0 sectors
                      0 for 0 sectors
....
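The BBL reservation is also visible in each member's v1.2 superblock; a quick way to check (the exact wording varies a bit between mdadm versions):
sudo mdadm --examine /dev/sda1 | grep -i "bad block"
# e.g.:  Bad Block Log : 512 entries available at offset 16 sectors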
To reassemble the array without the BBL I first had to stop md0, which meant unmounting /home:
sudo lsof +f -- /home   # To see what uses the disk
sudo umount /home
That was all very fine, but I still could not stop md0, so I had to make sure that /home was not mounted after a reboot with the following:
sudo systemctl mask home.mount
sudo sync
sudo systemctl reboot
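After the reboot it is worth confirming that nothing is mounted on top of the array before stopping it (a quick sanity check):
findmnt /home            # should print nothing
cat /proc/mdstat         # md0 still assembled, but idle and unmounted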
This worked, and md0 could now be stopped. I then assembled md0 again without importing the Bad Block Log and added the member:
sudo mdadm --stop /dev/md0
sudo mdadm --assemble /dev/md0 --update=force-no-bbl /dev/sda1 /dev/sdc1 /dev/sdd1
sudo mdadm --manage /dev/md0 --add /dev/sdb1
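While the rebuild runs, progress can be followed like this (the interval is just an example):
watch -n 60 cat /proc/mdstat      # shows recovery progress, speed and estimated finish time
sudo mdadm --detail /dev/md0      # State should end up as clean once the resync completes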
Result: the array started with three of four devices and immediately accepted the new member, the rebuild ran and finished, and the array returned to clean!
Root Cause
The array had an active Bad Block Log (BBL) on each member device. This feature records sectors that have returned I/O errors so the array can avoid them later. It is generally useful on direct SATA or SAS connections, but in my case, the disks sit behind a USB bridge (VIA VL817 / ASM235CM). Apparently due to a kernel issue introduced in recent md versions (around Linux 6.7–6.8), arrays with an active BBL can refuse new members when any of these are true:
- The new device has no BBL yet (fresh disk).
- The BBL metadata reservation cannot be written consistently through the USB transport layer.
- The array metadata version is 1.2 (superblock at 4 MiB offset).
When the kernel tries to import the new device, it sees inconsistent or missing BBL information and aborts the add operation with EINVAL (-22) — which is exactly what I saw.
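After the fix, the members should no longer advertise a BBL, which can be verified per device (exact output wording may differ between mdadm versions):
sudo mdadm --examine /dev/sda1 | grep -i "bad block"   # the Bad Block Log line should be gone
sudo mdadm --examine-badblocks /dev/sda1               # should report that no bad-blocks list is configured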
Clean up and aftermath
To make sure that /home was mounted again at the next boot, I ran:
sudo systemctl unmask home.mount
sudo systemctl enable home.mount
sudo systemctl reboot
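A final check after the reboot confirms that /home is mounted from the array again and that the array is clean (device names from my setup):
findmnt /home                                   # mounted from /dev/md0
cat /proc/mdstat                                # all four members present, no recovery running
sudo mdadm --detail /dev/md0 | grep -E 'State|Devices'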
Links
- https://askubuntu.com/questions/1405117/problem-dealing-with-corrupt-file-and-mdadm
- https://superuser.com/questions/1766665/re-adding-disk-to-md-raid-fails
