Message ID | 20180830174225.2200-22-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | My current patch queue | expand |
On Thu, Aug 30, 2018 at 01:42:11PM -0400, Josef Bacik wrote: > I noticed in a giant dbench run that we spent a lot of time on lock > contention while running transaction commit. This is because dbench > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and > they all run the delayed refs first thing, so they all contend with > each other. This leads to seconds of 0 throughput. Change this to only > run the delayed refs if we're the ones committing the transaction. This > makes the latency go away and we get no more lock contention. This means that we're going to spend more time running delayed refs while in TRANS_STATE_COMMIT_START, so couldn't we end up blocking new transactions more than before? > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/transaction.c | 24 +++++++++--------------- > 1 file changed, 9 insertions(+), 15 deletions(-) > > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index ebb0c0405598..2bb19e2ded5e 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -1918,15 +1918,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) > btrfs_trans_release_metadata(trans); > trans->block_rsv = NULL; > > - /* make a pass through all the delayed refs we have so far > - * any runnings procs may add more while we are here > - */ > - ret = btrfs_run_delayed_refs(trans, 0); > - if (ret) { > - btrfs_end_transaction(trans); > - return ret; > - } > - > cur_trans = trans->transaction; > > /* > @@ -1939,12 +1930,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) > if (!list_empty(&trans->new_bgs)) > btrfs_create_pending_block_groups(trans); > > - ret = btrfs_run_delayed_refs(trans, 0); > - if (ret) { > - btrfs_end_transaction(trans); > - return ret; > - } > - > if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) { > int run_it = 0; > > @@ -2015,6 +2000,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) > spin_unlock(&fs_info->trans_lock); > } > > + /* > + * We are now the only one in the commit area, we can run delayed refs > + * without hitting a bunch of lock contention from a lot of people > + * trying to commit the transaction at once. > + */ > + ret = btrfs_run_delayed_refs(trans, 0); > + if (ret) > + goto cleanup_transaction; > + > extwriter_counter_dec(cur_trans, trans->type); > > ret = btrfs_start_delalloc_flush(fs_info); > -- > 2.14.3 >
On Fri, Aug 31, 2018 at 05:28:09PM -0700, Omar Sandoval wrote: > On Thu, Aug 30, 2018 at 01:42:11PM -0400, Josef Bacik wrote: > > I noticed in a giant dbench run that we spent a lot of time on lock > > contention while running transaction commit. This is because dbench > > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and > > they all run the delayed refs first thing, so they all contend with > > each other. This leads to seconds of 0 throughput. Change this to only > > run the delayed refs if we're the ones committing the transaction. This > > makes the latency go away and we get no more lock contention. > > This means that we're going to spend more time running delayed refs > while in TRANS_STATE_COMMIT_START, so couldn't we end up blocking new > transactions more than before? > You'd think that, but the lock contention is enough that it makes it unfuckingpossible for anything to run for several seconds while everybody competes for either the delayed refs lock or the extent root lock. With the delayed refs rsv we actually end up running the delayed refs often enough because of the extra ENOSPC pressure that we don't really end up with long chunks of time running delayed refs while blocking out START transactions. If at some point down the line this turns out to be an actual issue we can revisit the best way to do this. Off the top of my head we do something like wrap it in a "run all the delayed refs" mutex so that all the committers just wait on whoever wins, and we move it back outside of the start logic in order to make it better all the way around. But I don't think that's something we need to do at this point. Thanks, Josef
On Tue, Sep 04, 2018 at 01:54:13PM -0400, Josef Bacik wrote: > On Fri, Aug 31, 2018 at 05:28:09PM -0700, Omar Sandoval wrote: > > On Thu, Aug 30, 2018 at 01:42:11PM -0400, Josef Bacik wrote: > > > I noticed in a giant dbench run that we spent a lot of time on lock > > > contention while running transaction commit. This is because dbench > > > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and > > > they all run the delayed refs first thing, so they all contend with > > > each other. This leads to seconds of 0 throughput. Change this to only > > > run the delayed refs if we're the ones committing the transaction. This > > > makes the latency go away and we get no more lock contention. > > > > This means that we're going to spend more time running delayed refs > > while in TRANS_STATE_COMMIT_START, so couldn't we end up blocking new > > transactions more than before? > > > > You'd think that, but the lock contention is enough that it makes it > unfuckingpossible for anything to run for several seconds while everybody > competes for either the delayed refs lock or the extent root lock. > > With the delayed refs rsv we actually end up running the delayed refs often > enough because of the extra ENOSPC pressure that we don't really end up with > long chunks of time running delayed refs while blocking out START transactions. > > If at some point down the line this turns out to be an actual issue we can > revisit the best way to do this. Off the top of my head we do something like > wrap it in a "run all the delayed refs" mutex so that all the committers just > wait on whoever wins, and we move it back outside of the start logic in order to > make it better all the way around. But I don't think that's something we need > to do at this point. Thanks, Ok, that's good enough for me. Reviewed-by: Omar Sandoval <osandov@fb.com>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index ebb0c0405598..2bb19e2ded5e 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1918,15 +1918,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) btrfs_trans_release_metadata(trans); trans->block_rsv = NULL; - /* make a pass through all the delayed refs we have so far - * any runnings procs may add more while we are here - */ - ret = btrfs_run_delayed_refs(trans, 0); - if (ret) { - btrfs_end_transaction(trans); - return ret; - } - cur_trans = trans->transaction; /* @@ -1939,12 +1930,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) if (!list_empty(&trans->new_bgs)) btrfs_create_pending_block_groups(trans); - ret = btrfs_run_delayed_refs(trans, 0); - if (ret) { - btrfs_end_transaction(trans); - return ret; - } - if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) { int run_it = 0; @@ -2015,6 +2000,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) spin_unlock(&fs_info->trans_lock); } + /* + * We are now the only one in the commit area, we can run delayed refs + * without hitting a bunch of lock contention from a lot of people + * trying to commit the transaction at once. + */ + ret = btrfs_run_delayed_refs(trans, 0); + if (ret) + goto cleanup_transaction; + extwriter_counter_dec(cur_trans, trans->type); ret = btrfs_start_delalloc_flush(fs_info);
I noticed in a giant dbench run that we spent a lot of time on lock contention while running transaction commit. This is because dbench results in a lot of fsync()'s that do a btrfs_transaction_commit(), and they all run the delayed refs first thing, so they all contend with each other. This leads to seconds of 0 throughput. Change this to only run the delayed refs if we're the ones committing the transaction. This makes the latency go away and we get no more lock contention. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/transaction.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-)