I’ve been on the receiving end of failing hard drives in the past and lost many of my original podcast source audio files and more importantly a years' worth of home videos, gone forever.
Not wishing for a repeat of this I purchased an 8TB external USB HardDrive and installed BackBlaze. The problem for me though was that BackBlaze was an ongoing expense, could only be used for a single machine and couldn’t really do anything other than be an offsite backup. I’d been considering a Network Attached Storage for years now and the thinking was, if I had a NAS then I could have backup redundancy1 plus a bunch of other really useful features and functionality.
The trigger was actually a series of crashes and disconnects of the 8TB USB HDD, and with the OS’s limited ability to troubleshoot HDD hardware-specific issues via USB I had some experience from my previous set of HDD failures many years ago, that this is how it all starts. So I gathered together a bunch of smaller HDDs and copied across all the data to them, while I still could, and resolved to get a better solution: hence the NAS.
Looking at both QNAP and Synology and my desire to have as broad a compatibility as possible, I settled on an Intel-based Synology, which in Synology-speak, means a “Plus” model. Specifically the DS918+ presented the best value for money with 4 Bays and the ability to extend with a 5 Bay external enclosure if I really felt the need in future. I waited until the DS920+ was released and noted that the benchmarks on the 920 weren’t particularly impressive and hence I stuck with the DS918+ and got a great deal as it had just become a clearance product to make way for the new model.
My series of external drives I had been using to hold an interim copy of my data were: a 4TB 3.5", a 4TB 2.5" (at that time I thought it was a drive in an enclosure you could extract), and a 2TB 3.5" drive as well as, of course, my 8TB drive which I wasn’t sure was toast yet. The goal was to reuse as many of my existing drives as possible and not spend even more money on more, new HDDs. I’d also given a disused but otherwise healthy 3.5" 4TB drive to my son for his PC earlier in the year and he hadn’t actually used it, so I reclaimed it temporarily for this exercise.
Here’s how it went down:
STEP 1: Insert 8TB Drive and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…hundreds of bad sectors. To be honest, that wasn’t too surprising since the 8TB drive was periodically disconnecting and reconnecting and rebuilding its file tables - but now I had the proof. The Synology refused to let me create a Storage Pool or a Volume or anything so I resigned myself to buying 1 new drive: I saw that SeaGate Barracudas were on sale so I grabbed one from UMart and tried it.
STEP 2: Insert new 4TB Barracuda and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…it worked perfectly! (As you’d expect) Though the test took a VERY long time, I was happy so I created a Storage Pool, Synology Hybrid RAID. Created a Volume, BTRFS because it came highly recommended, and then began copying over the first 4TB’s worth of data to the new Volume. So far, so good.
STEP 3: Insert my son’s 4TB drive and extend the SHR Storage Pool to include it. The Synology allowed me to do this and I did so for some reason without running a SMART Extended test on it first, and it let me so that should be fine right? Turns out, this was a terrible idea.
STEP 4: Once all data was copied off the 4TB data drive and to the Synology Volume, wipe that drive, extract the 3.5" HDD and insert the reclaimed 4TB 3.5" into the Synology and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…hundreds of bad sectors. Um, okay. That’s annoying. So I might be up for yet another HDD since I have 9TB to store.
OH DEAR MOMENT: As I was re-running the drive check the Synology began reporting that the Volume was Bad, and the Storage Pool was unhealthy. I looked into the HDD manager and saw that my sons reclaimed 3.5" drive was also full of bad sectors, as the Synology had run a periodic test while data was still copying. I also attempted to extract the 2.5" drive from the external enclosure only to discover that it was a fully integrated controller/drive/enclosure and couldn’t be extracted without breaking it. (So much for that) Whilst I still had a copy of my 4TB of data in BackBlaze at this point I wasn’t worried about losing data but the penny dropped: Stop trying to save money and just buy the right drives. So I went to Computer Alliance and bought three shiny new 4TB SeaGate IronWolf drives.
STEP 5: Insert all three new 4TB IronWolfs and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…the first drive perfect! The second and third drives however…had bad sectors. Bad sectors. On new drives? And not only that NAS-specific, high reliability drives? John = not impressed. I extended the Storage Pool (Barracuda + 1 IronWolf) and after running a Data Scrub it still threw up errors despite the fact both drives appeared to be fine and were brand new.
This is not what you want to see on a brand new drive…
So I did what all good geeks do and got out of the DSM GUI and hit SSH and the Terminal. I ran “btrfs check –repair” and recover, super-recover and chunk-recover and ultimately the chunk tree recovery failed. I read that I had to stop everything running and accessing the Pool so I painstakingly killed every process and re-ran the recovery but ultimately it still failed after a 24 hour long attempt. There was nothing for it - it was time to start copying the data that was on there (what I could read) back on to a 4TB external drive and blow it all away and start over.
STEP 6: In the midst of a delusion that I could still recover the data without having to recopy the lot of it off the NAS (a two day exercise), I submitted a return request for first failed IronWolf, while I re-ran the SMART on the other potentially broken drive. The return policy stated that they needed to test the HDD and that could take a day or two and Computer Alliance is a two hour round trip from my house. Fortunately I met a wonderfully helpful and accomodating support person at CA on that day and he simply handed me a replacement, taking the Synology screenshot of the bad sector count and serial number confirming I wasn’t pulling a switch on him and handed me a replacement IronWolf on the spot! (Such a great guy - give him a raise) I returned home, this time treating the HDD like a delicate egg the whole trip, inserted it and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…perfect!
STEP 7: By this time I’d given up all hope of recovering the data and with three shiny new drives in the NAS, my 4TB of original data restored to my external drive (I had to pluck 5 files that failed to copy back from my BackBlaze backup) I wiped all the NAS drives…and started over. Not taking ANY chances I re-ran the SMART tests on all three and when they were clean (again) recreated the Pool, new Volume, and started copying my precious data back on to the NAS all over again.
STEP 8: I went back to Computer Alliance to return the second drive and this time I met a different support person, someone who was far more “by the book” and accepted the drive and asked me to come back another day once they’d tested it. I’d returned home and hours later they called and said “yeah it’s got bad sectors…” (you don’t say?) and unfortunately due to personal commitments I couldn’t return until the following day. I grabbed the replacement drive, drove on eggshells, added it to the last free bay and in Storage Manager, Drive Info, run an Extended SMART test…and hours later…perfect! (FINALLY)
STEP 9: I copied all of the data across from all of my external drives on to the Synology. The Volume was an SHR with 10.9TB of usable space spread across x4 4TB drives, (x3 IronWolf, and x1 Barracuda). The Data Scrub passed, the SMART Tests passed, and the IronWolf-specific Health Management tests all passed with flying colours (all green, oh yes!) It was time to repurpose the 4TB 2.5" external drive as my offline backup for the fireproof safe. I reformatted it to ExFAT and set up HyperBackup for my critical files (Home Videos, Videos of the Family, my entire photo library), backed them up and put that in the safe.
Looking back the mistake was that I never should have extended the storage pool before the Synology had run a SMART test and flagged the bad sectors. In so doing it wrote data to those bad sectors and there were just too many for BTRFS to recover in the end. In addition I never should have tried to do this on the cheap. I should have just bought new drives from the get-go. Not only that, I should have just bought NAS-specific drives from the get-go as well. Despite the bad sectors and bad luck of getting two out of three bad IronWolf drives, in the end they have performed very well and completed their SMARTs faster with online forums suggesting a desktop-class HDD (the Barracuda) is a bad choice for a NAS. I now have my own test example to see if the Barracuda is actually suitable as a long-term NAS drive, since I ended up with both types in the same NAS, same age, same everything else, so I’ll report back in a few years to see which starts failing first.
Ultimately I also stopped using BackBlaze. It was slowing down my MacBook Pro, I found video compression on data recovery that was frustrating, and even with my 512GB SSD on the MBP with everything on it, I would often get errors about a lack of space for backups to BackBlaze. Whilst financially the total lifecycle cost of the NAS and the drives is far more than BackBlaze (or an equivalent backup service) would cost me, the NAS can also do so many more things, than just to backup my data via TimeMachine.
But that’s another story for another article. In the end the NAS plus drives cost me $1.5k AUD, 6 trips to two different computer stores and 6 weeks from start to finish, but it’s been running now since August 2020 and hasn’t skipped a beat. Oh…my…NAS.
Redundancy against the failure of an individual HDD ↩︎