Btrfs: send, flush dellaloc in order to avoid data loss
diff mbox series

Message ID 20190415082936.2173-1-fdmanana@kernel.org
State New
Headers show
Series
  • Btrfs: send, flush dellaloc in order to avoid data loss
Related show

Commit Message

Filipe Manana April 15, 2019, 8:29 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When we set a subvolume to read-only mode we do not flush dellaloc for any
of its inodes (except if the filesystem is mounted with -o flushoncommit),
since it does not affect correctness for any subsequent operations - except
for a future send operation. The send operation will not be able to see the
delalloc data since the respective file extent items, inode item updates,
backreferences, etc, have not hit yet the subvolume and extent trees.

Effectively this means data loss, since the send stream will not contain
any data from existing delalloc. Another problem from this is that if the
writeback starts and finishes while the send operation is in progress, we
have the subvolume tree being being modified concurrently which can result
in send failing unexpectedly with EIO or hitting runtime errors, assertion
failures or hitting BUG_ONs, etc.

Simple reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv
  $ xfs_io -f -c "pwrite -S 0xea 0 108K" /mnt/sv/foo

  $ btrfs property set /mnt/sv ro true
  $ btrfs send -f /tmp/send.stream /mnt/sv

  $ od -t x1 -A d /mnt/sv/foo
  0000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea
  *
  0110592

  $ umount /mnt
  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ btrfs receive -f /tmp/send.stream /mnt
  $ echo $?
  0
  $ od -t x1 -A d /mnt/sv/foo
  0000000
  # ---> empty file

Since this a problem that affects send only, fix it in send by flushing
dellaloc for all the roots used by the send operation before send starts
to process the commit roots.

This is a problem that affects send since it was introduced (commit
31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive"))
but backporting it to older kernels has some dependencies:

- For kernels between 3.19 and 4.20, it depends on commit 3cd24c698004d2
  ("btrfs: use tagged writepage to mitigate livelock of snapshot") because
  the function btrfs_start_delalloc_snapshot() does not exist before that
  commit. So one has to either pick that commit or replace the calls to
  btrfs_start_delalloc_snapshot() in this patch with calls to
  btrfs_start_delalloc_inodes().

- For kernels older than 3.19 it also requires commit e5fa8f865b3324
  ("Btrfs: ensure send always works on roots without orphans") because
  it depends on the function ensure_commit_roots_uptodate() which that
  commits introduced.

- No dependencies for 5.0+ kernels.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 3.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Comments

David Sterba April 24, 2019, 4:16 p.m. UTC | #1
On Mon, Apr 15, 2019 at 09:29:36AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we set a subvolume to read-only mode we do not flush dellaloc for any
> of its inodes (except if the filesystem is mounted with -o flushoncommit),
> since it does not affect correctness for any subsequent operations - except
> for a future send operation. The send operation will not be able to see the
> delalloc data since the respective file extent items, inode item updates,
> backreferences, etc, have not hit yet the subvolume and extent trees.
> 
> Effectively this means data loss, since the send stream will not contain
> any data from existing delalloc. Another problem from this is that if the
> writeback starts and finishes while the send operation is in progress, we
> have the subvolume tree being being modified concurrently which can result
> in send failing unexpectedly with EIO or hitting runtime errors, assertion
> failures or hitting BUG_ONs, etc.
> 
> Simple reproducer:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ btrfs subvolume create /mnt/sv
>   $ xfs_io -f -c "pwrite -S 0xea 0 108K" /mnt/sv/foo
> 
>   $ btrfs property set /mnt/sv ro true
>   $ btrfs send -f /tmp/send.stream /mnt/sv
> 
>   $ od -t x1 -A d /mnt/sv/foo
>   0000000 ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea ea
>   *
>   0110592
> 
>   $ umount /mnt
>   $ mkfs.btrfs -f /dev/sdc
>   $ mount /dev/sdc /mnt
> 
>   $ btrfs receive -f /tmp/send.stream /mnt
>   $ echo $?
>   0
>   $ od -t x1 -A d /mnt/sv/foo
>   0000000
>   # ---> empty file
> 
> Since this a problem that affects send only, fix it in send by flushing
> dellaloc for all the roots used by the send operation before send starts
> to process the commit roots.
> 
> This is a problem that affects send since it was introduced (commit
> 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive"))
> but backporting it to older kernels has some dependencies:
> 
> - For kernels between 3.19 and 4.20, it depends on commit 3cd24c698004d2
>   ("btrfs: use tagged writepage to mitigate livelock of snapshot") because
>   the function btrfs_start_delalloc_snapshot() does not exist before that
>   commit. So one has to either pick that commit or replace the calls to
>   btrfs_start_delalloc_snapshot() in this patch with calls to
>   btrfs_start_delalloc_inodes().
> 
> - For kernels older than 3.19 it also requires commit e5fa8f865b3324
>   ("Btrfs: ensure send always works on roots without orphans") because
>   it depends on the function ensure_commit_roots_uptodate() which that
>   commits introduced.
> 
> - No dependencies for 5.0+ kernels.

^^^ thanks for writing the backporting instructions.

> 
> A test case for fstests follows soon.
> 
> CC: stable@vger.kernel.org # 3.19+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.

Patch
diff mbox series

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 7ea2d6b1f170..fe700b228b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6579,6 +6579,38 @@  static int ensure_commit_roots_uptodate(struct send_ctx *sctx)
 	return btrfs_commit_transaction(trans);
 }
 
+/*
+ * Make sure any existing dellaloc is flushed for any root used by a send
+ * operation so that we do not miss any data and we do not race with writeback
+ * finishing and changing a tree while send is using the tree. This could
+ * happen if a subvolume is in RW mode, has delalloc, is turned to RO mode and
+ * a send operation then uses the subvolume.
+ * After flushing delalloc ensure_commit_roots_uptodate() must be called.
+ */
+static int flush_delalloc_roots(struct send_ctx *sctx)
+{
+	struct btrfs_root *root = sctx->parent_root;
+	int ret;
+	int i;
+
+	if (root) {
+		ret = btrfs_start_delalloc_snapshot(root);
+		if (ret)
+			return ret;
+		btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
+	}
+
+	for (i = 0; i < sctx->clone_roots_cnt; i++) {
+		root = sctx->clone_roots[i].root;
+		ret = btrfs_start_delalloc_snapshot(root);
+		if (ret)
+			return ret;
+		btrfs_wait_ordered_extents(root, U64_MAX, 0, U64_MAX);
+	}
+
+	return 0;
+}
+
 static void btrfs_root_dec_send_in_progress(struct btrfs_root* root)
 {
 	spin_lock(&root->root_item_lock);
@@ -6803,6 +6835,10 @@  long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 			NULL);
 	sort_clone_roots = 1;
 
+	ret = flush_delalloc_roots(sctx);
+	if (ret)
+		goto out;
+
 	ret = ensure_commit_roots_uptodate(sctx);
 	if (ret)
 		goto out;