diff mbox series

btrfs: only run delayed refs if we're committing

Message ID 20181121191016.25224-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: only run delayed refs if we're committing | expand

Commit Message

Josef Bacik Nov. 21, 2018, 7:10 p.m. UTC
I noticed in a giant dbench run that we spent a lot of time on lock
contention while running transaction commit.  This is because dbench
results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
they all run the delayed refs first thing, so they all contend with
each other.  This leads to seconds of 0 throughput.  Change this to only
run the delayed refs if we're the ones committing the transaction.  This
makes the latency go away and we get no more lock contention.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

Comments

Nikolay Borisov Nov. 22, 2018, 7:24 a.m. UTC | #1
On 21.11.18 г. 21:10 ч., Josef Bacik wrote:
> I noticed in a giant dbench run that we spent a lot of time on lock
> contention while running transaction commit.  This is because dbench
> results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> they all run the delayed refs first thing, so they all contend with
> each other.  This leads to seconds of 0 throughput.  Change this to only
> run the delayed refs if we're the ones committing the transaction.  This
> makes the latency go away and we get no more lock contention.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Wohooo, we no longer run delayed refs at arbitrary points during
transaction commit.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/transaction.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3c1be9db897c..41cc96cc59a3 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1918,15 +1918,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  	btrfs_trans_release_metadata(trans);
>  	trans->block_rsv = NULL;
>  
> -	/* make a pass through all the delayed refs we have so far
> -	 * any runnings procs may add more while we are here
> -	 */
> -	ret = btrfs_run_delayed_refs(trans, 0);
> -	if (ret) {
> -		btrfs_end_transaction(trans);
> -		return ret;
> -	}
> -
>  	cur_trans = trans->transaction;
>  
>  	/*
> @@ -1938,12 +1929,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  
>  	btrfs_create_pending_block_groups(trans);
>  
> -	ret = btrfs_run_delayed_refs(trans, 0);
> -	if (ret) {
> -		btrfs_end_transaction(trans);
> -		return ret;
> -	}
> -
>  	if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
>  		int run_it = 0;
>  
> @@ -2014,6 +1999,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  		spin_unlock(&fs_info->trans_lock);
>  	}
>  
> +	/*
> +	 * We are now the only one in the commit area, we can run delayed refs
> +	 * without hitting a bunch of lock contention from a lot of people
> +	 * trying to commit the transaction at once.
> +	 */
> +	ret = btrfs_run_delayed_refs(trans, 0);
> +	if (ret)
> +		goto cleanup_transaction;
> +
>  	extwriter_counter_dec(cur_trans, trans->type);
>  
>  	ret = btrfs_start_delalloc_flush(fs_info);
>
Filipe Manana Nov. 23, 2018, 4:59 p.m. UTC | #2
On Thu, Nov 22, 2018 at 12:35 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I noticed in a giant dbench run that we spent a lot of time on lock
> contention while running transaction commit.  This is because dbench
> results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> they all run the delayed refs first thing, so they all contend with
> each other.  This leads to seconds of 0 throughput.  Change this to only
> run the delayed refs if we're the ones committing the transaction.  This
> makes the latency go away and we get no more lock contention.

Can you share the following in the changelog please?

1) How did you ran dbench (parameters, config).

2) What results did you get before and after this change. So that we all get
    an idea of how good the impact is.

While the reduced contention makes all sense and seems valid, I'm not
sure this is always a win.
It certainly is when multiple tasks are calling
btrfs_commit_transaction() simultaneously, but,
what about when only one does it?

By running all delayed references inside the critical section of the
transaction commit
(state == TRANS_STATE_COMMIT_START), instead of running most of them
outside/before,
we will be blocking for a longer a time other tasks calling
btrfs_start_transaction() (which is used
a lot - creating files, unlinking files, adding links, etc, and even fsync).

Won't there by any other types of workload and tests other then dbench
that can get increased
latency and/or smaller throughput?

I find that sort of information useful to have in the changelog. If
you verified that or you think
it's irrelevant to measure/consider, it would be great to have it
mentioned in the changelog
(and explained).

Thanks.

>
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3c1be9db897c..41cc96cc59a3 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1918,15 +1918,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>         btrfs_trans_release_metadata(trans);
>         trans->block_rsv = NULL;
>
> -       /* make a pass through all the delayed refs we have so far
> -        * any runnings procs may add more while we are here
> -        */
> -       ret = btrfs_run_delayed_refs(trans, 0);
> -       if (ret) {
> -               btrfs_end_transaction(trans);
> -               return ret;
> -       }
> -
>         cur_trans = trans->transaction;
>
>         /*
> @@ -1938,12 +1929,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>
>         btrfs_create_pending_block_groups(trans);
>
> -       ret = btrfs_run_delayed_refs(trans, 0);
> -       if (ret) {
> -               btrfs_end_transaction(trans);
> -               return ret;
> -       }
> -
>         if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
>                 int run_it = 0;
>
> @@ -2014,6 +1999,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>                 spin_unlock(&fs_info->trans_lock);
>         }
>
> +       /*
> +        * We are now the only one in the commit area, we can run delayed refs
> +        * without hitting a bunch of lock contention from a lot of people
> +        * trying to commit the transaction at once.
> +        */
> +       ret = btrfs_run_delayed_refs(trans, 0);
> +       if (ret)
> +               goto cleanup_transaction;
> +
>         extwriter_counter_dec(cur_trans, trans->type);
>
>         ret = btrfs_start_delalloc_flush(fs_info);
> --
> 2.14.3
>
Josef Bacik Nov. 27, 2018, 7:22 p.m. UTC | #3
On Fri, Nov 23, 2018 at 04:59:32PM +0000, Filipe Manana wrote:
> On Thu, Nov 22, 2018 at 12:35 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > I noticed in a giant dbench run that we spent a lot of time on lock
> > contention while running transaction commit.  This is because dbench
> > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> > they all run the delayed refs first thing, so they all contend with
> > each other.  This leads to seconds of 0 throughput.  Change this to only
> > run the delayed refs if we're the ones committing the transaction.  This
> > makes the latency go away and we get no more lock contention.
> 
> Can you share the following in the changelog please?
> 
> 1) How did you ran dbench (parameters, config).
> 
> 2) What results did you get before and after this change. So that we all get
>     an idea of how good the impact is.
> 
> While the reduced contention makes all sense and seems valid, I'm not
> sure this is always a win.
> It certainly is when multiple tasks are calling
> btrfs_commit_transaction() simultaneously, but,
> what about when only one does it?
> 
> By running all delayed references inside the critical section of the
> transaction commit
> (state == TRANS_STATE_COMMIT_START), instead of running most of them
> outside/before,
> we will be blocking for a longer a time other tasks calling
> btrfs_start_transaction() (which is used
> a lot - creating files, unlinking files, adding links, etc, and even fsync).
> 
> Won't there by any other types of workload and tests other then dbench
> that can get increased
> latency and/or smaller throughput?
> 
> I find that sort of information useful to have in the changelog. If
> you verified that or you think
> it's irrelevant to measure/consider, it would be great to have it
> mentioned in the changelog
> (and explained).
> 

Yeah I thought about the delayed refs being run in the critical section now,
that's not awesome.  I'll drop this for now, I think just having a mutex around
running delayed refs will be good enough, since we want people who care about
flushing delayed refs to wait around for that to finish happening.  Thanks,

Josef
Filipe Manana Nov. 27, 2018, 7:43 p.m. UTC | #4
On Tue, Nov 27, 2018 at 7:22 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Fri, Nov 23, 2018 at 04:59:32PM +0000, Filipe Manana wrote:
> > On Thu, Nov 22, 2018 at 12:35 AM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > I noticed in a giant dbench run that we spent a lot of time on lock
> > > contention while running transaction commit.  This is because dbench
> > > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> > > they all run the delayed refs first thing, so they all contend with
> > > each other.  This leads to seconds of 0 throughput.  Change this to only
> > > run the delayed refs if we're the ones committing the transaction.  This
> > > makes the latency go away and we get no more lock contention.
> >
> > Can you share the following in the changelog please?
> >
> > 1) How did you ran dbench (parameters, config).
> >
> > 2) What results did you get before and after this change. So that we all get
> >     an idea of how good the impact is.
> >
> > While the reduced contention makes all sense and seems valid, I'm not
> > sure this is always a win.
> > It certainly is when multiple tasks are calling
> > btrfs_commit_transaction() simultaneously, but,
> > what about when only one does it?
> >
> > By running all delayed references inside the critical section of the
> > transaction commit
> > (state == TRANS_STATE_COMMIT_START), instead of running most of them
> > outside/before,
> > we will be blocking for a longer a time other tasks calling
> > btrfs_start_transaction() (which is used
> > a lot - creating files, unlinking files, adding links, etc, and even fsync).
> >
> > Won't there by any other types of workload and tests other then dbench
> > that can get increased
> > latency and/or smaller throughput?
> >
> > I find that sort of information useful to have in the changelog. If
> > you verified that or you think
> > it's irrelevant to measure/consider, it would be great to have it
> > mentioned in the changelog
> > (and explained).
> >
>
> Yeah I thought about the delayed refs being run in the critical section now,
> that's not awesome.  I'll drop this for now, I think just having a mutex around
> running delayed refs will be good enough, since we want people who care about
> flushing delayed refs to wait around for that to finish happening.  Thanks,

Well, I think we can have a solution that doesn't bring such trade-off
nor introducing a mutex.
We could do like what is currently done for writing space caches, to
make sure only the first task
calling commit transaction does the work and all others do nothing
except waiting for the commit to finish:

btrfs_commit_transaction()
   if (!test_and_set_bit(BTRFS_TRANS_COMMIT_START, &cur_trans->flags)) {
       run delayed refs before entering critical section
   }

thanks

>
> Josef
Josef Bacik Nov. 27, 2018, 7:56 p.m. UTC | #5
On Tue, Nov 27, 2018 at 07:43:39PM +0000, Filipe Manana wrote:
> On Tue, Nov 27, 2018 at 7:22 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Fri, Nov 23, 2018 at 04:59:32PM +0000, Filipe Manana wrote:
> > > On Thu, Nov 22, 2018 at 12:35 AM Josef Bacik <josef@toxicpanda.com> wrote:
> > > >
> > > > I noticed in a giant dbench run that we spent a lot of time on lock
> > > > contention while running transaction commit.  This is because dbench
> > > > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> > > > they all run the delayed refs first thing, so they all contend with
> > > > each other.  This leads to seconds of 0 throughput.  Change this to only
> > > > run the delayed refs if we're the ones committing the transaction.  This
> > > > makes the latency go away and we get no more lock contention.
> > >
> > > Can you share the following in the changelog please?
> > >
> > > 1) How did you ran dbench (parameters, config).
> > >
> > > 2) What results did you get before and after this change. So that we all get
> > >     an idea of how good the impact is.
> > >
> > > While the reduced contention makes all sense and seems valid, I'm not
> > > sure this is always a win.
> > > It certainly is when multiple tasks are calling
> > > btrfs_commit_transaction() simultaneously, but,
> > > what about when only one does it?
> > >
> > > By running all delayed references inside the critical section of the
> > > transaction commit
> > > (state == TRANS_STATE_COMMIT_START), instead of running most of them
> > > outside/before,
> > > we will be blocking for a longer a time other tasks calling
> > > btrfs_start_transaction() (which is used
> > > a lot - creating files, unlinking files, adding links, etc, and even fsync).
> > >
> > > Won't there by any other types of workload and tests other then dbench
> > > that can get increased
> > > latency and/or smaller throughput?
> > >
> > > I find that sort of information useful to have in the changelog. If
> > > you verified that or you think
> > > it's irrelevant to measure/consider, it would be great to have it
> > > mentioned in the changelog
> > > (and explained).
> > >
> >
> > Yeah I thought about the delayed refs being run in the critical section now,
> > that's not awesome.  I'll drop this for now, I think just having a mutex around
> > running delayed refs will be good enough, since we want people who care about
> > flushing delayed refs to wait around for that to finish happening.  Thanks,
> 
> Well, I think we can have a solution that doesn't bring such trade-off
> nor introducing a mutex.
> We could do like what is currently done for writing space caches, to
> make sure only the first task
> calling commit transaction does the work and all others do nothing
> except waiting for the commit to finish:
> 
> btrfs_commit_transaction()
>    if (!test_and_set_bit(BTRFS_TRANS_COMMIT_START, &cur_trans->flags)) {
>        run delayed refs before entering critical section
>    }
> 

That was my first inclination but then one of the other committers goes past
this and gets into the start TRANS_STATE_COMMIT_START place and has to wait for
the guy running the delayed refs to finish before moving forward, essentially
extending the time in the critical section for the same reason.  The mutex means
that if 9000 people try to commit the transaction, 1 guy gets to run the delayed
refs and everybody else waits for him to finish, and in the meantime the
critical section is small.  Thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3c1be9db897c..41cc96cc59a3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1918,15 +1918,6 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
-	/* make a pass through all the delayed refs we have so far
-	 * any runnings procs may add more while we are here
-	 */
-	ret = btrfs_run_delayed_refs(trans, 0);
-	if (ret) {
-		btrfs_end_transaction(trans);
-		return ret;
-	}
-
 	cur_trans = trans->transaction;
 
 	/*
@@ -1938,12 +1929,6 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	btrfs_create_pending_block_groups(trans);
 
-	ret = btrfs_run_delayed_refs(trans, 0);
-	if (ret) {
-		btrfs_end_transaction(trans);
-		return ret;
-	}
-
 	if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
 		int run_it = 0;
 
@@ -2014,6 +1999,15 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		spin_unlock(&fs_info->trans_lock);
 	}
 
+	/*
+	 * We are now the only one in the commit area, we can run delayed refs
+	 * without hitting a bunch of lock contention from a lot of people
+	 * trying to commit the transaction at once.
+	 */
+	ret = btrfs_run_delayed_refs(trans, 0);
+	if (ret)
+		goto cleanup_transaction;
+
 	extwriter_counter_dec(cur_trans, trans->type);
 
 	ret = btrfs_start_delalloc_flush(fs_info);