
Recovering lost data from a QNAP NAS when the NAS does not start properly

QNAP is a company that designs great network attached storage (NAS) devices. Unfortunately, even their NAS devices can crash. Mine did. Before we get to how to recover the lost data, here's my NAS and RAID spec (so that you can understand what I did and why):

  • QNAP TS-410U
  • RAID5
  • 4 HDD (/dev/sda, /dev/sdb, /dev/sdc, /dev/sdd)
  • Approximately 1.4 TB of data
  • Fortunately I had the most important data already backed up somewhere else (less pressure and stress during the fix)

And this is what happened to it:

  1. NAS software update (for 1 week everything was working fine)
  2. NAS rejected one of my HDDs (/dev/sda) due to its SMART status.
  3. RAID5 switched into degraded mode.
  4. The broken HDD was removed (not replaced!).
  5. The NAS was shut down (I didn't plan to use it, so I turned it off for 2 weeks - just in case).
  6. The NAS would not boot with the HDDs inside (well, it would boot, but it didn't get an IP address, so I couldn't reach it).
  7. The NAS was not reachable at all (despite the fact that it seemed to be running just fine).
  8. A basic system reset (3 seconds) didn't help at all (still no network connection).

Booting without any hard drives

You won't be able to do anything unless you manage to get your QNAP online. If it's just a software issue (as it was in my case), follow these instructions:

  1. Force shut down your NAS (press the power button for 10 seconds)
  2. Remove all the hard drives
  3. Turn your NAS on by pressing the power button
  4. Once it is ready (it beeps), perform a basic system reset
  5. Restart your NAS (either by performing a shutdown or by disconnecting the power)
  6. Boot it again
  7. You should be able to reach the admin page at: http://your-nas-ip-address:8080/
  8. Unfortunately you don't have any hard drives connected, so no data recovery yet ;)

No hard drives and no setup equals no way to recover data


Before you attach your hard drives and restore the RAID, you need to know one thing: a QNAP that has not been set up with at least one HDD won't provide you with any tools like scp or rsync. You will be able to examine your HDDs (luckily mdadm is there), but you won't be able to transfer your data over the LAN. All the network tools become available only once you perform a full setup. Also keep in mind that you should perform the whole fresh installation with your RAID hard drives unplugged (just in case).

Spare HDD to the rescue

Make your NAS available via SSH with all the tools you need.
To do this, you will need one spare hard drive (any SATA HDD will do). Now:

  1. Turn off your NAS.
  2. Plug in your HDD.
  3. Make sure your RAID HDDs are unplugged.
  4. Power on your NAS.
  5. Once it boots, go to the admin page and perform a quick setup.
  6. Now you should be able to connect to it via SSH (ssh admin@your-nas-ip) - user: admin, password: admin.
  7. Once you connect, check that the following commands are available: rsync, scp, mdadm (see the quick check below).
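
A quick way to confirm those tools are present is to ask the shell for each binary; a minimal sketch (in the QNAP BusyBox shell, plain which rsync scp mdadm also does the job):

# Check that the required recovery tools exist on the freshly set up NAS
for tool in rsync scp mdadm; do
    command -v "$tool" || echo "$tool is missing"
done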

Reassembling RAID5 and mounting it to recover data

I used the first HDD slot for the temporary "rescue" HDD (/dev/sda), so it won't be included when I reassemble the rest of the HDDs.

Before you assemble anything, you need to check if there's valid RAID data on each of the remaining HDDs:

# /dev/sda is the temporary rescue drive, so examine only the remaining RAID members
mdadm --examine /dev/sdb3
mdadm --examine /dev/sdc3
mdadm --examine /dev/sdd3

For each of them, you should see output similar to this:

/dev/sdc3:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0fcde09f:5258ded4:4c22c8ef:89a53221
  Creation Time : Sat Mar  9 21:13:27 2013
     Raid Level : raid5
  Used Dev Size : 1951945600 (1861.52 GiB 1998.79 GB)
     Array Size : 5855836800 (5584.56 GiB 5996.38 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1

    Update Time : Sun Feb  1 13:32:54 2015
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : fb959cff - correct
         Events : 0.1608150

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       35        2      active sync   /dev/sdc3

   0     0       0        0        0      removed
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3
   3     3       8       51        3      active sync   /dev/sdd3

Now reassembling:

# Use a /dev/mdX number that is not already taken
mdadm --assemble /dev/md1 /dev/sdb3 /dev/sdc3 /dev/sdd3
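
Before mounting, it doesn't hurt to confirm that the array actually came up (degraded, with 3 of the 4 original members); a quick check with the standard Linux/mdadm tools:

# Verify that the degraded array is active and which members it picked up
cat /proc/mdstat
mdadm --detail /dev/md1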

Once the array is assembled, you can mount it:

mkdir /share/QSAVE
mount -t ext4 /dev/md1 /share/QSAVE

If everything went OK, you should see your data when you run:

cd /share/QSAVE
ls
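
If you want to double-check how much data is actually there before copying it anywhere, df works as usual (assuming the mount point from above):

# Show size and usage of the recovered volume
df -h /share/QSAVE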

Local backup

If you used a decent "rescue" HDD, you can now use it as a backup hard drive for all of your NAS data (as long as it is big enough):

mkdir /share/HDA_DATA/backup
rsync -rv /share/QSAVE/ /share/HDA_DATA/backup/
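
Note that -rv copies the files themselves but not their ownership, permissions or timestamps; if you want to preserve those as well, the archive flag is the usual choice:

# Same backup, but preserving permissions, ownership, timestamps and symlinks
rsync -av /share/QSAVE/ /share/HDA_DATA/backup/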

Remote backup

You can also back up your NAS remotely:

mkdir ./qnap_backup
rsync -rv --exclude=".*" admin@your-nas-ip:/share/QSAVE/ ./qnap_backup

Also keep in mind that even when you have RAID1, RAID5, RAID10 and so on, it is still worth having an external backup of all of your data.

MongoDB monitoring with Munin – Setting up Munin to work with MongoDB

To monitor MongoDB you can use many tools; some, like MongoDB Management Service (MMS), are cloud based, while others, like Munin, can be installed locally.

Today we will focus on setting up Munin to monitor MongoDB.

Getting started with Munin

If you don't know what Munin is, or how to install it on your platform, please refer to the official Munin documentation first.

This is not a tutorial on how to install Munin itself. I assume that from this point you have Munin and munin-node running on your system and that you can see the basic Munin stats charts.

MongoDB configuration

MongoDB provides a simple HTTP interface listing information of interest to administrators. This interface may be accessed at the port with a numeric value 1000 more than the configured mongod port, and the default port for the HTTP interface is 28017 (description taken from the MongoDB documentation). By default it is not enabled, but it is required by the Munin MongoDB plugins, so we need to turn it on.

Warning! Keep in mind that if you don't block it, it will listen on your public interface and be accessible from the internet by default. Please use iptables to make it reachable only from localhost.
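
For example, a minimal iptables sketch (to be adapted to your existing ruleset - rule order matters) that accepts the HTTP interface only on the loopback interface and drops everything else:

# Allow the MongoDB HTTP interface only from localhost
iptables -A INPUT -p tcp --dport 28017 -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 28017 -j DROP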

To enable it, edit your /etc/mongod.conf file, find the httpinterface line and uncomment it (or set it to true if it is set to false):

# vim /etc/mongod.conf
# Enable the HTTP interface (Defaults to port 28017).

httpinterface = true

After that you need to restart MongoDB:

# As root or using sudo
/etc/init.d/mongod restart

[ ok ] Restarting database: mongod.

To test it, open http://localhost:28017/ (remember to replace localhost with your server host). You should see a page similar to this one:

[Screenshot: MongoDB HTTP interface status page]
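
If you prefer the command line, you can also poke the interface with curl (run on the MongoDB host itself, so it works even when the port is firewalled from the outside):

# Fetch the beginning of the HTTP interface status page
curl -s http://localhost:28017/ | head -n 20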

Installing Munin MongoDB plugins

Some of those plugins won't work out of the box, but we will take care of that later. For now let's focus on the install process:

# as root or using sudo
git clone git://github.com/erh/mongo-munin.git ~/mongo-munin
cd ~/mongo-munin

# We copy all the plugins into munin plugins
cp mongo_* /usr/share/munin/plugins/

# We need to activate them now
ln -s /usr/share/munin/plugins/mongo_btree /etc/munin/plugins/
ln -s /usr/share/munin/plugins/mongo_conn /etc/munin/plugins/
ln -s /usr/share/munin/plugins/mongo_lock /etc/munin/plugins/
ln -s /usr/share/munin/plugins/mongo_mem /etc/munin/plugins/
ln -s /usr/share/munin/plugins/mongo_ops /etc/munin/plugins/

cd ~
# We don't need this anymore
rm -rf mongo-munin

# Restarting munin-node...
/etc/init.d/munin-node restart

After restarting munin-node and waiting a few minutes, you should have a new section (mongodb) in your Munin web UI. Some of the graphs won't display any data yet, but you should at least see the mongodb section.

[Graph: mongo_mem (daily)]

Debugging and fixing Munin MongoDB broken plugins

Some of the plugins (like mongo_lock) won't work without a little tune-up. If you see graphs similar to this one (without any data and with -nan everywhere), then most likely those plugins aren't working.

[Graph: mongo_lock (daily)]

To check each of the plugins, run munin-run with the appropriate plugin name (as root):

mongo_btree

# munin-run mongo_btree
Traceback (most recent call last):
  File "/etc/munin/plugins/mongo_btree", line 61, in <module>
    doData()
  File "/etc/munin/plugins/mongo_btree", line 35, in doData
    for k,v in get().iteritems():
  File "/etc/munin/plugins/mongo_btree", line 32, in get
    return getServerStatus()["indexCounters"]["btree"]
KeyError: 'btree'

mongo_conn

# munin-run mongo_conn
connections.value 0

mongo_lock

# munin-run mongo_lock
Traceback (most recent call last):
  File "/etc/munin/plugins/mongo_lock", line 54, in <module>
    doData()
  File "/etc/munin/plugins/mongo_lock", line 34, in doData
    print name + ".value " + str( 100 * getServerStatus()["globalLock"]["ratio"] )
KeyError: 'ratio'

mongo_mem

# munin-run mongo_mem
resident.value 37748736
virtual.value 376438784
mapped.value 83886080

mongo_ops

# munin-run mongo_ops
getmore.value 0
insert.value 1
update.value 0
command.value 1
query.value 53
delete.value 0

Errors summary

Based on the plugins' output, we can see that 2 out of 5 plugins aren't working:

  • mongo_btree
  • mongo_lock

They aren't working because the MongoDB HTTP interface response differs slightly from what it looked like when the plugins were developed.
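
You can check what the current serverStatus document actually contains (and therefore which keys the plugins should be reading) directly from the mongo shell; a minimal sketch, assuming a default local mongod:

# Inspect the sections that the broken plugins rely on
mongo --quiet --eval 'printjson(db.serverStatus().indexCounters)'
mongo --quiet --eval 'printjson(db.serverStatus().globalLock)'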

Patching mongo_btree plugin

Apply the following patch to /usr/share/munin/plugins/mongo_btree:

@@ -29,7 +29,7 @@ def getServerStatus():
     return json.loads( raw )["serverStatus"]
 
 def get():
-    return getServerStatus()["indexCounters"]["btree"]
+    return getServerStatus()["indexCounters"]
 
 def doData():
     for k,v in get().iteritems():
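
You can either edit line 32 of the plugin by hand or save the diff to a file and apply it with patch(1); a sketch, assuming you saved it as mongo_btree.patch (an example filename - the same approach works for the mongo_lock patch below):

# Apply the one-line change to the installed plugin (mongo_btree.patch is just an example name)
patch /usr/share/munin/plugins/mongo_btree < mongo_btree.patch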

After patching, execute munin-run mongo_btree:

# munin-run mongo_btree
missRatio.value 0
resets.value 0
hits.value 2
misses.value 0
accesses.value 2

Patching mongo_lock plugin

Apply the following patch to /usr/share/munin/plugins/mongo_lock:

@@ -31,7 +31,7 @@ def getServerStatus():
 name = "locked"
 
 def doData():
-    print name + ".value " + str( 100 * getServerStatus()["globalLock"]["ratio"] )
+    print name + ".value " + str( 100 * round(float(getServerStatus()["globalLock"]["lockTime"]["$numberLong"])/float(getServerStatus()["globalLock"]["totalTime"]["$numberLong"]), 8) )
 
 def doConfig():

After patching, execute munin-run mongo_lock:

# munin-run mongo_lock
locked.value 0.000785

After that (and a few minutes), all of the stats should be working.
