Message ID | 1455111048-25398-1-git-send-email-fdmanana@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Wed, Feb 10, 2016 at 01:30:48PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > If we delete a snapshot, fsync its parent directory and crash/power fail > before the next transaction commit, on the next mount when we attempt to > replay the log tree of the root containing the parent directory we will > fail and prevent the filesystem from mounting, which is solvable by wiping > out the log trees with the btrfs-zero-log tool but very inconvenient as > we will lose any data and metadata fsynced before the parent directory > was fsynced. > > For example: > > $ mkfs.btrfs -f /dev/sdc > $ mount /dev/sdc /mnt > $ mkdir /mnt/testdir > $ btrfs subvolume snapshot /mnt /mnt/testdir/snap > $ btrfs subvolume delete /mnt/testdir/snap > $ xfs_io -c "fsync" /mnt/testdir > < crash / power failure and reboot > > $ mount /dev/sdc /mnt > mount: mount(2) failed: No such file or directory > > And in dmesg/syslog we get the following message and trace: > > [192066.361162] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257 > [192066.363010] ------------[ cut here ]------------ > [192066.365268] WARNING: CPU: 4 PID: 5130 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x17a/0x354 [btrfs]() > [192066.367250] BTRFS: Transaction aborted (error -2) > [192066.368401] Modules linked in: btrfs dm_flakey dm_mod ppdev sha256_generic xor raid6_pq hmac drbg ansi_cprng aesni_intel acpi_cpufreq tpm_tis aes_x86_64 tpm ablk_helper evdev cryptd sg parport_pc i2c_piix4 psmouse lrw parport i2c_core pcspkr gf128mul processor serio_raw glue_helper button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] > [192066.377154] CPU: 4 PID: 5130 Comm: mount Tainted: G W 4.4.0-rc6-btrfs-next-20+ #1 > [192066.378875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 > [192066.380889] 0000000000000000 ffff880143923670 ffffffff81257570 ffff8801439236b8 > [192066.382561] ffff8801439236a8 ffffffff8104ec07 ffffffffa039dc2c 00000000fffffffe > [192066.384191] ffff8801ed31d000 ffff8801b9fc9c88 ffff8801086875e0 ffff880143923710 > [192066.385827] Call Trace: > [192066.386373] [<ffffffff81257570>] dump_stack+0x4e/0x79 > [192066.387387] [<ffffffff8104ec07>] warn_slowpath_common+0x99/0xb2 > [192066.388429] [<ffffffffa039dc2c>] ? __btrfs_unlink_inode+0x17a/0x354 [btrfs] > [192066.389236] [<ffffffff8104ec68>] warn_slowpath_fmt+0x48/0x50 > [192066.389884] [<ffffffffa039dc2c>] __btrfs_unlink_inode+0x17a/0x354 [btrfs] > [192066.390621] [<ffffffff81184b55>] ? iput+0xb0/0x266 > [192066.391200] [<ffffffffa039ea25>] btrfs_unlink_inode+0x1c/0x3d [btrfs] > [192066.391930] [<ffffffffa03ca623>] check_item_in_log+0x1fe/0x29b [btrfs] > [192066.392715] [<ffffffffa03ca827>] replay_dir_deletes+0x167/0x1cf [btrfs] > [192066.393510] [<ffffffffa03cccc7>] replay_one_buffer+0x417/0x570 [btrfs] > [192066.394241] [<ffffffffa03ca164>] walk_up_log_tree+0x10e/0x1dc [btrfs] > [192066.394958] [<ffffffffa03cac72>] walk_log_tree+0xa5/0x190 [btrfs] > [192066.395628] [<ffffffffa03ce8b8>] btrfs_recover_log_trees+0x239/0x32c [btrfs] > [192066.396790] [<ffffffffa03cc8b0>] ? replay_one_extent+0x50a/0x50a [btrfs] > [192066.397891] [<ffffffffa0394041>] open_ctree+0x1d8b/0x2167 [btrfs] > [192066.398897] [<ffffffffa03706e1>] btrfs_mount+0x5ef/0x729 [btrfs] > [192066.399823] [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf > [192066.400739] [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3 > [192066.401700] [<ffffffff811714b9>] mount_fs+0x67/0x131 > [192066.402482] [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde > [192066.403930] [<ffffffffa03702bd>] btrfs_mount+0x1cb/0x729 [btrfs] > [192066.404831] [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf > [192066.405726] [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3 > [192066.406621] [<ffffffff811714b9>] mount_fs+0x67/0x131 > [192066.407401] [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde > [192066.408247] [<ffffffff8118ae36>] do_mount+0x893/0x9d2 > [192066.409047] [<ffffffff8113009b>] ? strndup_user+0x3f/0x8c > [192066.409842] [<ffffffff8118b187>] SyS_mount+0x75/0xa1 > [192066.410621] [<ffffffff8147e517>] entry_SYSCALL_64_fastpath+0x12/0x6b > [192066.411572] ---[ end trace 2de42126c1e0a0f0 ]--- > [192066.412344] BTRFS: error (device dm-0) in __btrfs_unlink_inode:3986: errno=-2 No such entry > [192066.413748] BTRFS: error (device dm-0) in btrfs_replay_log:2464: errno=-2 No such entry (Failed to recover log tree) > [192066.415458] BTRFS error (device dm-0): cleaner transaction attach returned -30 > [192066.444613] BTRFS: open_ctree failed > > This happens because when we are replaying the log and processing the > directory entry pointing to the snapshot in the subvolume tree, we treat > its btrfs_dir_item item as having a location with a key type matching > BTRFS_INODE_ITEM_KEY, which is wrong because the type matches > BTRFS_ROOT_ITEM_KEY and therefore must be processed differently, as the > object id refers to a root number and not to an inode in the root > containing the parent directory. > > So fix this by triggering a transaction commit if an fsync against the > parent directory is requested after deleting a snapshot. This is the > simplest approach for a rare use case. Some alternative that avoids the > transaction commit would require more code to explicitly delete the > snapshot at log replay time (factoring out common code from ioctl.c: > btrfs_ioctl_snap_destroy()), special care at fsync time to remove the > log tree of the snapshot's root from the log root of the root of tree > roots, amongst other steps. > > A test case for xfstests that triggers the issue follows. > > seq=`basename $0` > seqres=$RESULT_DIR/$seq > echo "QA output created by $seq" > tmp=/tmp/$$ > status=1 # failure is the default! > trap "_cleanup; exit \$status" 0 1 2 3 15 > > _cleanup() > { > _cleanup_flakey > cd / > rm -f $tmp.* > } > > # get standard environment, filters and checks > . ./common/rc > . ./common/filter > . ./common/dmflakey > > # real QA test starts here > _need_to_be_root > _supported_fs btrfs > _supported_os Linux > _require_scratch > _require_dm_target flakey > _require_metadata_journaling $SCRATCH_DEV > > rm -f $seqres.full > > _scratch_mkfs >>$seqres.full 2>&1 > _init_flakey > _mount_flakey > > # Create a snapshot at the root of our filesystem (mount point path), delete it, > # fsync the mount point path, crash and mount to replay the log. This should > # succeed and after the filesystem is mounted the snapshot should not be visible > # anymore. > _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1 > _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1 > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT > _flakey_drop_and_remount > [ -e $SCRATCH_MNT/snap1 ] && \ > echo "Snapshot snap1 still exists after log replay" > > # Similar scenario as above, but this time the snapshot is created inside a > # directory and not directly under the root (mount point path). > mkdir $SCRATCH_MNT/testdir > _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/testdir/snap2 > _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap2 > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir > _flakey_drop_and_remount > [ -e $SCRATCH_MNT/testdir/snap2 ] && \ > echo "Snapshot snap2 still exists after log replay" > > _unmount_flakey > > echo "Silence is golden" > status=0 > exit Tested-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Thanks, -liubo > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/ioctl.c | 3 +++ > fs/btrfs/tree-log.c | 15 +++++++++++++++ > fs/btrfs/tree-log.h | 2 ++ > 3 files changed, 20 insertions(+) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 561aa62..90f38d7 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -59,6 +59,7 @@ > #include "props.h" > #include "sysfs.h" > #include "qgroup.h" > +#include "tree-log.h" > > #ifdef CONFIG_64BIT > /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI > @@ -2527,6 +2528,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, > out_end_trans: > trans->block_rsv = NULL; > trans->bytes_reserved = 0; > + if (!err) > + btrfs_record_snapshot_destroy(trans, dir); > ret = btrfs_end_transaction(trans, root); > if (ret && !err) > err = ret; > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 978c3a8..43c6781 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -5498,6 +5498,21 @@ record: > } > > /* > + * Make sure that if someone attempts to fsync the parent directory of a deleted > + * snapshot, it ends up triggering a transaction commit. This is to guarantee > + * that after replaying the log tree of the parent directory's root we will not > + * see the snapshot anymore and at log replay time we will not see any log tree > + * corresponding to the deleted snapshot's root, which could lead to replaying > + * it after replaying the log tree of the parent directory (which would replay > + * the snapshot delete operation). > + */ > +void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans, > + struct inode *dir) > +{ > + BTRFS_I(dir)->last_unlink_trans = trans->transid; > +} > + > +/* > * Call this after adding a new name for a file and it will properly > * update the log to reflect the new name. > * > diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h > index 6916a78..a9f1b75 100644 > --- a/fs/btrfs/tree-log.h > +++ b/fs/btrfs/tree-log.h > @@ -79,6 +79,8 @@ int btrfs_pin_log_trans(struct btrfs_root *root); > void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans, > struct inode *dir, struct inode *inode, > int for_rename); > +void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans, > + struct inode *dir); > int btrfs_log_new_name(struct btrfs_trans_handle *trans, > struct inode *inode, struct inode *old_dir, > struct dentry *parent); > -- > 2.7.0.rc3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 561aa62..90f38d7 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -59,6 +59,7 @@ #include "props.h" #include "sysfs.h" #include "qgroup.h" +#include "tree-log.h" #ifdef CONFIG_64BIT /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI @@ -2527,6 +2528,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, out_end_trans: trans->block_rsv = NULL; trans->bytes_reserved = 0; + if (!err) + btrfs_record_snapshot_destroy(trans, dir); ret = btrfs_end_transaction(trans, root); if (ret && !err) err = ret; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 978c3a8..43c6781 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5498,6 +5498,21 @@ record: } /* + * Make sure that if someone attempts to fsync the parent directory of a deleted + * snapshot, it ends up triggering a transaction commit. This is to guarantee + * that after replaying the log tree of the parent directory's root we will not + * see the snapshot anymore and at log replay time we will not see any log tree + * corresponding to the deleted snapshot's root, which could lead to replaying + * it after replaying the log tree of the parent directory (which would replay + * the snapshot delete operation). + */ +void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans, + struct inode *dir) +{ + BTRFS_I(dir)->last_unlink_trans = trans->transid; +} + +/* * Call this after adding a new name for a file and it will properly * update the log to reflect the new name. * diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h index 6916a78..a9f1b75 100644 --- a/fs/btrfs/tree-log.h +++ b/fs/btrfs/tree-log.h @@ -79,6 +79,8 @@ int btrfs_pin_log_trans(struct btrfs_root *root); void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans, struct inode *dir, struct inode *inode, int for_rename); +void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans, + struct inode *dir); int btrfs_log_new_name(struct btrfs_trans_handle *trans, struct inode *inode, struct inode *old_dir, struct dentry *parent);