ZFS Corruption: Postmortem

Are you performing backups of your filesystems? A recent blog post, "Are you backing up your filesystems yet?", described the process of using remote snapshots with ZFS to ensure data is backed up. This post describes an incident in which data was almost lost on a ZFS filesystem due to a corrupted pool.

A VM running within VirtualBox had only a single virtual disk, formatted with ZFS. One day, while the VM was in use, the host machine lost power. ZFS, with its copy-on-write design, should be resistant to this type of sudden power loss, but in this case something happened to the virtual disk provided by VirtualBox. Hardware faults can cause serious issues for any filesystem, and although this fault was technically in software, the effect on the guest was the same. When trying to boot, FreeBSD was unable to mount root on ZFS. If this happens, you will be dropped to a boot prompt:


This points to an issue with the ZFS pool, which prevented FreeBSD from mounting it on startup. The first step was to boot from a FreeBSD 10 disc, select "Live CD", and log in as root with no password. Once logged into the Live CD, run the following command to list the available storage pools:

# zpool import

For a default ZFS install, this should show the "bootpool" in good status. Because the system was installed with Root on ZFS and GELI encryption, the bootpool, which holds the encryption key, must be imported before the corrupted pool can be unlocked. Run the following commands to create a mount point and import the bootpool:

# mkdir -p /tmp/bootpool
# zpool import -fR /tmp/bootpool bootpool

Run the following to decrypt the GELI partition and allow access to the pool (in this example, the /dev/ada0p4 device):

# geli attach -k /tmp/bootpool/bootpool/boot/encryption.key /dev/ada0p4
Enter passphrase:
# zpool import

Some GEOM_ELI console messages may appear, but the damaged pool should now be listed. However, the status for the pool may be DEGRADED or worse:

cannot import 'tank': I/O error 
Destroy and re-create the pool from 
a backup source

At this point, try to import the corrupted pool (assume going forward that the pool is named tank):

# mkdir -p /tmp/tank 
# zpool import -f -R /tmp/tank tank

If this operation fails, you may see the following error:

Solaris: WARNING: can't open objset for tank/tmp
cannot import 'tank': I/O error
Destroy and re-create the pool from
a backup source

In searching for a solution, there is another interesting set of flags for importing that will try to repair the pool by discarding recent transactions, rolling back until a consistent state is found (Note: this will attempt to get back only non-corrupted data. It is worthwhile only if you do not have a clean backup of your datasets):

# zpool import -fFX -R /tmp/tank tank

With roughly 150GB of storage, the import with the "-X" flag took several hours to complete. However, once it completes, you should be able to scrub the pool, mount it, and get access to the non-corrupted data:

# zpool scrub tank
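A scrub on a pool of this size can also take a while. The sequence above can be sketched as a small script that starts the scrub, polls until it finishes, and then reports any errors found (the pool name tank matches the example; the polling interval is an arbitrary choice):

```shell
# Start a scrub of the recovered pool
zpool scrub tank

# Poll until the scrub completes ("scrub in progress" appears
# in the "scan:" line of zpool status while it is running)
while zpool status tank | grep -q "scrub in progress"; do
    sleep 60
done

# Show the final status, including any files with permanent errors
zpool status -v tank
```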

The pool may come back ONLINE, but zpool status -v will display output similar to the following when files are corrupted:

errors: Permanent errors have been detected in the following files: 

The best approach is to use one of the many examples for keeping copies of snapshots on a separate device, removable media, or offsite. This ensures that no matter what happens, you have a recent copy of your data that can be imported and restored.
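As a concrete illustration, here is a minimal sketch of that kind of backup: a recursive snapshot of the pool streamed to a file on removable media. The snapshot name, mount point, and destination pool name are hypothetical:

```shell
# Take a recursive snapshot of every dataset in the pool
zfs snapshot -r tank@offsite-backup

# Stream the snapshot, with all child datasets, to removable media
zfs send -R tank@offsite-backup > /mnt/usb/tank-offsite-backup.zfs

# Later, restore the stream into a replacement pool
zfs receive -F newtank < /mnt/usb/tank-offsite-backup.zfs
```

Rotating a few of these streams across separate media, or sending them to a remote host over ssh, gives you a copy that survives even a pool that cannot be repaired with "zpool import -fFX".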

Copyright © 2024 Daemon Security Inc.