diff mbox series

[1/2] btrfs: Simplify snapshot exclusion code

Message ID 20190423114207.7899-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series [1/2] btrfs: Simplify snapshot exclusion code | expand

Commit Message

Nikolay Borisov April 23, 2019, 11:42 a.m. UTC
BTRFS sports a mechanism to provide exclusion when a snapshot is about
to be created. This is implemented via btrfs_start_write_no_snapshotting
et al. Currently the implementation of that mechanism is some perverse
amalgamation of a percpu variable, an explicit waitqueue, an atomic_t
variable and an implicit wait bit on said atomic_t via wait_var_event
family of API. And for good measure there is a memory barrier thrown in
the mix...

Astute reader should have concluded by now that it's bordering on
impossible to prove whether this scheme works. What's worse - all of
this is required to achieve something really simple - ensure certain
operations cannot run during snapshot creation. Let's simplify this by
relying on a single atomic_t used as a boolean flag. This commit changes
only the implementation and not the semantics of the existing mechanism.

Now, if the atomic is 1 (snapshot is in progress) callers of
btrfs_start_write_no_snapshotting will get a ret val of 0 that should be
handled accordingly.

btrfs_wait_for_snapshot_creation OTOH will block until snapshotting is
in progress and return when current snapshot in progress is finished and
will acquire the right to create a snapshot.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 20 +++++---------------
 fs/btrfs/ioctl.c       |  9 ++-------
 2 files changed, 7 insertions(+), 22 deletions(-)

Comments

Filipe Manana April 24, 2019, 9:49 a.m. UTC | #1
On Tue, Apr 23, 2019 at 12:43 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
> BTRFS sports a mechanism to provide exclusion when a snapshot is about
> to be created. This is implemented via btrfs_start_write_no_snapshotting
> et al. Currently the implementation of that mechanism is some perverse
> amalgamation of a percpu variable, an explicit waitqueue, an atomic_t
> variable and an implicit wait bit on said atomic_t via wait_var_event
> family of API. And for good measure there is a memory barrier thrown in
> the mix...
>
> Astute reader should have concluded by now that it's bordering on
> impossible to prove whether this scheme works. What's worse - all of
> this is required to achieve something really simple - ensure certain
> operations cannot run during snapshot creation. Let's simplify this by
> relying on a single atomic_t used as a boolean flag.

Nop, can't work as a boolean, see below.

> This commit changes
> only the implementation and not the semantics of the existing mechanism.
>
> Now, if the atomic is 1 (snapshot is in progress) callers of
> btrfs_start_write_no_snapshotting will get a ret val of 0 that should be
> handled accordingly.
>
> btrfs_wait_for_snapshot_creation OTOH will block until snapshotting is
> in progress and return when current snapshot in progress is finished and
> will acquire the right to create a snapshot.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 20 +++++---------------
>  fs/btrfs/ioctl.c       |  9 ++-------
>  2 files changed, 7 insertions(+), 22 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8f2b7b29c3fd..d9e2e35700fd 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11333,25 +11333,15 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>   */
>  void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
>  {
> -       percpu_counter_dec(&root->subv_writers->counter);
> -       cond_wake_up(&root->subv_writers->wait);
> +       ASSERT(atomic_read(&root->will_be_snapshotted) == 1);
> +       if (atomic_dec_and_test(&root->will_be_snapshotted))
> +               wake_up_var(&root->will_be_snapshotted);
>  }
>
>  int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
>  {
> -       if (atomic_read(&root->will_be_snapshotted))
> -               return 0;
> -
> -       percpu_counter_inc(&root->subv_writers->counter);
> -       /*
> -        * Make sure counter is updated before we check for snapshot creation.
> -        */
> -       smp_mb();
> -       if (atomic_read(&root->will_be_snapshotted)) {
> -               btrfs_end_write_no_snapshotting(root);
> -               return 0;
> -       }
> -       return 1;
> +       ASSERT(atomic_read(&root->will_be_snapshotted) >= 0);
> +       return atomic_add_unless(&root->will_be_snapshotted, 1, 1);
>  }

So if two writes call btrfs_start_write_no_snapshotting(), we end up
with root->will_be_snapshotted == 1.

One task calls btrfs_end_write_no_snapshotting(), it decrements it to
1 - we wake up the snapshot creation task while there's still one
nodatacow writer - this is incorrect.
Now the second task calls btrfs_end_write_no_snapshotting(), sees
root->will_be_snapshotted == 0, assertion failure.

>
>  void btrfs_wait_for_snapshot_creation(struct btrfs_root *root)
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 8774d4be7c97..f9f66c8a5dad 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -794,11 +794,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>          * possible. This is to avoid later writeback (running dealloc) to
>          * fallback to COW mode and unexpectedly fail with ENOSPC.
>          */
> -       atomic_inc(&root->will_be_snapshotted);
> -       smp_mb__after_atomic();
> -       /* wait for no snapshot writes */
> -       wait_event(root->subv_writers->wait,
> -                  percpu_counter_sum(&root->subv_writers->counter) == 0);
> +       btrfs_wait_for_snapshot_creation(root);

This naming is also confusing now. The task that creates a snapshot is
calling btrfs_wait_for_snapshot_creation(), waiting for itself?

>
>         ret = btrfs_start_delalloc_snapshot(root);
>         if (ret)
> @@ -878,8 +874,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>  dec_and_free:
>         if (snapshot_force_cow)
>                 atomic_dec(&root->snapshot_force_cow);
> -       if (atomic_dec_and_test(&root->will_be_snapshotted))
> -               wake_up_var(&root->will_be_snapshotted);
> +       btrfs_end_write_no_snapshotting(root);

Also confusing. We are not ending a write operation, we are ending
snapshot creation.

Thanks.

>  free_pending:
>         kfree(pending_snapshot->root_item);
>         btrfs_free_path(pending_snapshot->path);
> --
> 2.17.1
>
Filipe Manana April 24, 2019, 10:19 a.m. UTC | #2
On Wed, Apr 24, 2019 at 10:49 AM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Tue, Apr 23, 2019 at 12:43 PM Nikolay Borisov <nborisov@suse.com> wrote:
> >
> > BTRFS sports a mechanism to provide exclusion when a snapshot is about
> > to be created. This is implemented via btrfs_start_write_no_snapshotting
> > et al. Currently the implementation of that mechanism is some perverse
> > amalgamation of a percpu variable, an explicit waitqueue, an atomic_t
> > variable and an implicit wait bit on said atomic_t via wait_var_event
> > family of API. And for good measure there is a memory barrier thrown in
> > the mix...
> >
> > Astute reader should have concluded by now that it's bordering on
> > impossible to prove whether this scheme works. What's worse - all of
> > this is required to achieve something really simple - ensure certain
> > operations cannot run during snapshot creation. Let's simplify this by
> > relying on a single atomic_t used as a boolean flag.
>
> Nop, can't work as a boolean, see below.
>
> > This commit changes
> > only the implementation and not the semantics of the existing mechanism.
> >
> > Now, if the atomic is 1 (snapshot is in progress) callers of
> > btrfs_start_write_no_snapshotting will get a ret val of 0 that should be
> > handled accordingly.
> >
> > btrfs_wait_for_snapshot_creation OTOH will block until snapshotting is
> > in progress and return when current snapshot in progress is finished and
> > will acquire the right to create a snapshot.
> >
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> > ---
> >  fs/btrfs/extent-tree.c | 20 +++++---------------
> >  fs/btrfs/ioctl.c       |  9 ++-------
> >  2 files changed, 7 insertions(+), 22 deletions(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 8f2b7b29c3fd..d9e2e35700fd 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -11333,25 +11333,15 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
> >   */
> >  void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
> >  {
> > -       percpu_counter_dec(&root->subv_writers->counter);
> > -       cond_wake_up(&root->subv_writers->wait);
> > +       ASSERT(atomic_read(&root->will_be_snapshotted) == 1);
> > +       if (atomic_dec_and_test(&root->will_be_snapshotted))
> > +               wake_up_var(&root->will_be_snapshotted);
> >  }
> >
> >  int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
> >  {
> > -       if (atomic_read(&root->will_be_snapshotted))
> > -               return 0;
> > -
> > -       percpu_counter_inc(&root->subv_writers->counter);
> > -       /*
> > -        * Make sure counter is updated before we check for snapshot creation.
> > -        */
> > -       smp_mb();
> > -       if (atomic_read(&root->will_be_snapshotted)) {
> > -               btrfs_end_write_no_snapshotting(root);
> > -               return 0;
> > -       }
> > -       return 1;
> > +       ASSERT(atomic_read(&root->will_be_snapshotted) >= 0);
> > +       return atomic_add_unless(&root->will_be_snapshotted, 1, 1);
> >  }
>
> So if two writes call btrfs_start_write_no_snapshotting(), we end up
> with root->will_be_snapshotted == 1.
>
> One task calls btrfs_end_write_no_snapshotting(), it decrements it to
> 1 - we wake up the snapshot creation task while there's still one
> nodatacow writer - this is incorrect.
> Now the second task calls btrfs_end_write_no_snapshotting(), sees
> root->will_be_snapshotted == 0, assertion failure.

Actually take that out, I ignored the return value of atomic_add_unless().

So this change does not allow for concurrent no snapshot writers anymore,
multiple tasks calling btrfs_start_write_no_snapshotting(), only one
succeeds and all the others fail,
so that's a regression from what we currently have.

The rest of the confusing names still applies.

>
> >
> >  void btrfs_wait_for_snapshot_creation(struct btrfs_root *root)
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 8774d4be7c97..f9f66c8a5dad 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -794,11 +794,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> >          * possible. This is to avoid later writeback (running dealloc) to
> >          * fallback to COW mode and unexpectedly fail with ENOSPC.
> >          */
> > -       atomic_inc(&root->will_be_snapshotted);
> > -       smp_mb__after_atomic();
> > -       /* wait for no snapshot writes */
> > -       wait_event(root->subv_writers->wait,
> > -                  percpu_counter_sum(&root->subv_writers->counter) == 0);
> > +       btrfs_wait_for_snapshot_creation(root);
>
> This naming is also confusing now. The task that creates a snapshot is
> calling btrfs_wait_for_snapshot_creation(), waiting for itself?
>
> >
> >         ret = btrfs_start_delalloc_snapshot(root);
> >         if (ret)
> > @@ -878,8 +874,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> >  dec_and_free:
> >         if (snapshot_force_cow)
> >                 atomic_dec(&root->snapshot_force_cow);
> > -       if (atomic_dec_and_test(&root->will_be_snapshotted))
> > -               wake_up_var(&root->will_be_snapshotted);
> > +       btrfs_end_write_no_snapshotting(root);
>
> Also confusing. We are not ending a write operation, we are ending
> snapshot creation.
>
> Thanks.
>
> >  free_pending:
> >         kfree(pending_snapshot->root_item);
> >         btrfs_free_path(pending_snapshot->path);
> > --
> > 2.17.1
> >
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8f2b7b29c3fd..d9e2e35700fd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11333,25 +11333,15 @@  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
  */
 void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
 {
-	percpu_counter_dec(&root->subv_writers->counter);
-	cond_wake_up(&root->subv_writers->wait);
+	ASSERT(atomic_read(&root->will_be_snapshotted) == 1);
+	if (atomic_dec_and_test(&root->will_be_snapshotted))
+		wake_up_var(&root->will_be_snapshotted);
 }
 
 int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
 {
-	if (atomic_read(&root->will_be_snapshotted))
-		return 0;
-
-	percpu_counter_inc(&root->subv_writers->counter);
-	/*
-	 * Make sure counter is updated before we check for snapshot creation.
-	 */
-	smp_mb();
-	if (atomic_read(&root->will_be_snapshotted)) {
-		btrfs_end_write_no_snapshotting(root);
-		return 0;
-	}
-	return 1;
+	ASSERT(atomic_read(&root->will_be_snapshotted) >= 0);
+	return atomic_add_unless(&root->will_be_snapshotted, 1, 1);
 }
 
 void btrfs_wait_for_snapshot_creation(struct btrfs_root *root)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8774d4be7c97..f9f66c8a5dad 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -794,11 +794,7 @@  static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	 * possible. This is to avoid later writeback (running dealloc) to
 	 * fallback to COW mode and unexpectedly fail with ENOSPC.
 	 */
-	atomic_inc(&root->will_be_snapshotted);
-	smp_mb__after_atomic();
-	/* wait for no snapshot writes */
-	wait_event(root->subv_writers->wait,
-		   percpu_counter_sum(&root->subv_writers->counter) == 0);
+	btrfs_wait_for_snapshot_creation(root);
 
 	ret = btrfs_start_delalloc_snapshot(root);
 	if (ret)
@@ -878,8 +874,7 @@  static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 dec_and_free:
 	if (snapshot_force_cow)
 		atomic_dec(&root->snapshot_force_cow);
-	if (atomic_dec_and_test(&root->will_be_snapshotted))
-		wake_up_var(&root->will_be_snapshotted);
+	btrfs_end_write_no_snapshotting(root);
 free_pending:
 	kfree(pending_snapshot->root_item);
 	btrfs_free_path(pending_snapshot->path);