[6/8] btrfs: rework wake_all_tickets

Message ID

20190816141952.19369-7-josef@toxicpanda.com (mailing list archive)

State

New, archived

Headers

Return-Path: <linux-btrfs-owner@kernel.org>
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/8] btrfs: rework wake_all_tickets
Date: Fri, 16 Aug 2019 10:19:50 -0400
Message-Id: <20190816141952.19369-7-josef@toxicpanda.com>
In-Reply-To: <20190816141952.19369-1-josef@toxicpanda.com>
References: <20190816141952.19369-1-josef@toxicpanda.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Series

Rework reserve ticket handling
[0/8,v2] Rework reserve ticket handling
[1/8] btrfs: do not allow reservations if we have pending tickets
[2/8] btrfs: roll tracepoint into btrfs_space_info_update helper
[3/8] btrfs: add space reservation tracepoint for reserved bytes
[4/8] btrfs: rework btrfs_space_info_add_old_bytes
[5/8] btrfs: refactor the ticket wakeup code
[6/8] btrfs: rework wake_all_tickets
[7/8] btrfs: fix may_commit_transaction to deal with no partial filling
[8/8] btrfs: remove orig_bytes from reserve_ticket

Commit Message

Josef Bacik Aug. 16, 2019, 2:19 p.m. UTC

Now that we no longer partially fill tickets we need to rework
wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
if any subsequent tickets are able to be satisfied.  If our tickets_id
changes we know something happened and we can keep flushing.

Also if we find a ticket that is smaller than the first ticket in our
queue then we want to retry the flushing loop again in case
may_commit_transaction() decides we could satisfy the ticket by
committing the transaction.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)
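
The control flow described in the commit message can be modelled in userspace. The sketch below is only an illustration: `toy_space_info`, `toy_try_to_wakeup_tickets()` and `toy_wake_all_tickets()` are hypothetical stand-ins, the queue is a plain array rather than a kernel list, and the wakeup helper is assumed to grant tickets strictly from the front of the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TICKETS 8

/* Toy stand-in for btrfs_space_info: a FIFO of ticket sizes. */
struct toy_space_info {
	uint64_t tickets[MAX_TICKETS];	/* bytes wanted by each queued ticket */
	size_t head;			/* index of the first queued ticket */
	size_t count;			/* number of queued tickets */
	uint64_t free_bytes;		/* space that can still be handed out */
	uint64_t tickets_id;		/* bumped whenever a ticket is granted */
};

/* ASSUMPTION: grant tickets from the front until one does not fit. */
static void toy_try_to_wakeup_tickets(struct toy_space_info *si)
{
	while (si->count > 0 && si->tickets[si->head] <= si->free_bytes) {
		si->free_bytes -= si->tickets[si->head];
		si->head++;
		si->count--;
		si->tickets_id++;
	}
}

/* Returns true when the caller should run the flushing loop again. */
static bool toy_wake_all_tickets(struct toy_space_info *si)
{
	uint64_t tickets_id = si->tickets_id;
	uint64_t first_ticket_bytes = 0;

	assert(si->head + si->count <= MAX_TICKETS);

	while (si->count > 0 && tickets_id == si->tickets_id) {
		uint64_t bytes = si->tickets[si->head];

		if (first_ticket_bytes == 0)
			first_ticket_bytes = bytes;
		else if (first_ticket_bytes > bytes)
			return true;	/* a smaller ticket may fit after a commit */

		/* Fail the head ticket (the patch sets -ENOSPC and wakes it). */
		si->head++;
		si->count--;
		toy_try_to_wakeup_tickets(si);
	}
	return tickets_id != si->tickets_id;
}
```

In this model, a queue of {100, 30} with 40 free bytes fails the 100-byte ticket, grants the 30-byte one, and returns true because tickets_id changed. A queue of {100, 50} with 10 free bytes returns true through the smaller-ticket check and leaves the 50-byte ticket queued.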

Comments

Nikolay Borisov Aug. 19, 2019, 2:49 p.m. UTC | #1

On 16.08.19 at 17:19, Josef Bacik wrote:
> Now that we no longer partially fill tickets we need to rework
> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
> if any subsequent tickets are able to be satisfied.  If our tickets_id
> changes we know something happened and we can keep flushing.
> 
> Also if we find a ticket that is smaller than the first ticket in our
> queue then we want to retry the flushing loop again in case
> may_commit_transaction() decides we could satisfy the ticket by
> committing the transaction.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
>  1 file changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 8a1c7ada67cb..bd485be783b8 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>  }
>  
> -static bool wake_all_tickets(struct list_head *head)
> +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
> +			     struct btrfs_space_info *space_info)
>  {
>  	struct reserve_ticket *ticket;
> +	u64 tickets_id = space_info->tickets_id;
> +	u64 first_ticket_bytes = 0;
> +
> +	while (!list_empty(&space_info->tickets) &&
> +	       tickets_id == space_info->tickets_id) {
> +		ticket = list_first_entry(&space_info->tickets,
> +					  struct reserve_ticket, list);
> +
> +		/*
> +		 * may_commit_transaction will avoid committing the transaction
> +		 * if it doesn't feel like the space reclaimed by the commit
> +		 * would result in the ticket succeeding.  However if we have a
> +		 * smaller ticket in the queue it may be small enough to be
> +		 * satisfied by committing the transaction, so if any
> +		 * subsequent ticket is smaller than the first ticket go ahead
> +		 * and send us back for another loop through the enospc flushing
> +		 * code.
> +		 */
> +		if (first_ticket_bytes == 0)
> +			first_ticket_bytes = ticket->bytes;
> +		else if (first_ticket_bytes > ticket->bytes)
> +			return true;
>  
> -	while (!list_empty(head)) {
> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>  		list_del_init(&ticket->list);
>  		ticket->error = -ENOSPC;
>  		wake_up(&ticket->wait);
> -		if (ticket->bytes != ticket->orig_bytes)
> -			return true;
> +		btrfs_try_to_wakeup_tickets(fs_info, space_info);

So the change in this logic is directly tied to the implementation of
btrfs_try_to_wakeup_tickets(). When we fail and remove a ticket in this
function, we give the next ticket a chance to be satisfied. But how well
does that work in practice, given that you fail normal-priority tickets
here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
first? Even if you are failing a normal ticket, a single unsatisfiable
priority ticket means that won't really change anything.

>  	}
> -	return false;
> +	return (tickets_id != space_info->tickets_id);
>  }
>  
>  /*
> @@ -756,7 +776,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
>  		if (flush_state > COMMIT_TRANS) {
>  			commit_cycles++;
>  			if (commit_cycles > 2) {
> -				if (wake_all_tickets(&space_info->tickets)) {
> +				if (wake_all_tickets(fs_info, space_info)) {
>  					flush_state = FLUSH_DELAYED_ITEMS_NR;
>  					commit_cycles--;
>  				} else {
>
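
The concern in the inline comment above can be illustrated with a small userspace model. This is a sketch of the worry as stated, not of the real helper: `toy_queue` and `toy_grant_normal()` are hypothetical names, and the model assumes that an unsatisfiable ticket at the head of the priority list ends the scan before the normal list is examined.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy queue of ticket sizes; index 0 is the head. */
struct toy_queue {
	const uint64_t *bytes;
	size_t count;
};

/*
 * ASSUMPTION: priority tickets are scanned first, and a priority ticket
 * that does not fit ends the scan. Returns how many normal tickets were
 * granted.
 */
static size_t toy_grant_normal(const struct toy_queue *prio,
			       const struct toy_queue *normal,
			       uint64_t free_bytes)
{
	size_t i, granted = 0;

	for (i = 0; i < prio->count; i++) {
		if (prio->bytes[i] > free_bytes)
			return 0;	/* blocked: normal list is never reached */
		free_bytes -= prio->bytes[i];
	}
	for (i = 0; i < normal->count; i++) {
		if (normal->bytes[i] > free_bytes)
			break;
		free_bytes -= normal->bytes[i];
		granted++;
	}
	return granted;
}
```

Under that assumption, one 64-byte priority ticket with 40 bytes free prevents two 16-byte normal tickets from being granted, while the same normal tickets are both granted when the priority list is empty.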

Josef Bacik Aug. 19, 2019, 3:06 p.m. UTC | #2

On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.08.19 at 17:19, Josef Bacik wrote:
> > Now that we no longer partially fill tickets we need to rework
> > wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
> > if any subsequent tickets are able to be satisfied.  If our tickets_id
> > changes we know something happened and we can keep flushing.
> > 
> > Also if we find a ticket that is smaller than the first ticket in our
> > queue then we want to retry the flushing loop again in case
> > may_commit_transaction() decides we could satisfy the ticket by
> > committing the transaction.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
> >  1 file changed, 27 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index 8a1c7ada67cb..bd485be783b8 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
> >  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
> >  }
> >  
> > -static bool wake_all_tickets(struct list_head *head)
> > +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
> > +			     struct btrfs_space_info *space_info)
> >  {
> >  	struct reserve_ticket *ticket;
> > +	u64 tickets_id = space_info->tickets_id;
> > +	u64 first_ticket_bytes = 0;
> > +
> > +	while (!list_empty(&space_info->tickets) &&
> > +	       tickets_id == space_info->tickets_id) {
> > +		ticket = list_first_entry(&space_info->tickets,
> > +					  struct reserve_ticket, list);
> > +
> > +		/*
> > +		 * may_commit_transaction will avoid committing the transaction
> > +		 * if it doesn't feel like the space reclaimed by the commit
> > +		 * would result in the ticket succeeding.  However if we have a
> > +		 * smaller ticket in the queue it may be small enough to be
> > +		 * satisfied by committing the transaction, so if any
> > +		 * subsequent ticket is smaller than the first ticket go ahead
> > +		 * and send us back for another loop through the enospc flushing
> > +		 * code.
> > +		 */
> > +		if (first_ticket_bytes == 0)
> > +			first_ticket_bytes = ticket->bytes;
> > +		else if (first_ticket_bytes > ticket->bytes)
> > +			return true;
> >  
> > -	while (!list_empty(head)) {
> > -		ticket = list_first_entry(head, struct reserve_ticket, list);
> >  		list_del_init(&ticket->list);
> >  		ticket->error = -ENOSPC;
> >  		wake_up(&ticket->wait);
> > -		if (ticket->bytes != ticket->orig_bytes)
> > -			return true;
> > +		btrfs_try_to_wakeup_tickets(fs_info, space_info);
> 
> So the change in this logic is directly tied to the implementation of
> btrfs_try_to_wakeup_tickets(). When we fail and remove a ticket in this
> function, we give the next ticket a chance to be satisfied. But how well
> does that work in practice, given that you fail normal-priority tickets
> here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
> first? Even if you are failing a normal ticket, a single unsatisfiable
> priority ticket means that won't really change anything.

In practice we don't get to this state with high priority tickets on the list.
Anything that would be long-ish term on the priority list is evict, and we wait
for iput()'s in the normal flushing code.  At the point we hit wake_all_tickets
we generally should only have tickets on the normal list.

I suppose we could possibly get into this situation, but again the high priority
tickets are going to be evict, truncate block, and relocate, which all have
significantly lower reservation amounts than things like create or unlink.  If
those things are unable to get reservations then we are truly out of space.
Thanks,

Josef

Nikolay Borisov Aug. 20, 2019, 7:51 a.m. UTC | #3

On 19.08.19 at 18:06, Josef Bacik wrote:
> On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 16.08.19 at 17:19, Josef Bacik wrote:
>>> Now that we no longer partially fill tickets we need to rework
>>> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
>>> if any subsequent tickets are able to be satisfied.  If our tickets_id
>>> changes we know something happened and we can keep flushing.
>>>
>>> Also if we find a ticket that is smaller than the first ticket in our
>>> queue then we want to retry the flushing loop again in case
>>> may_commit_transaction() decides we could satisfy the ticket by
>>> committing the transaction.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
>>>  1 file changed, 27 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>> index 8a1c7ada67cb..bd485be783b8 100644
>>> --- a/fs/btrfs/space-info.c
>>> +++ b/fs/btrfs/space-info.c
>>> @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>>>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>>>  }
>>>  
>>> -static bool wake_all_tickets(struct list_head *head)
>>> +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
>>> +			     struct btrfs_space_info *space_info)
>>>  {
>>>  	struct reserve_ticket *ticket;
>>> +	u64 tickets_id = space_info->tickets_id;
>>> +	u64 first_ticket_bytes = 0;
>>> +
>>> +	while (!list_empty(&space_info->tickets) &&
>>> +	       tickets_id == space_info->tickets_id) {
>>> +		ticket = list_first_entry(&space_info->tickets,
>>> +					  struct reserve_ticket, list);
>>> +
>>> +		/*
>>> +		 * may_commit_transaction will avoid committing the transaction
>>> +		 * if it doesn't feel like the space reclaimed by the commit
>>> +		 * would result in the ticket succeeding.  However if we have a
>>> +		 * smaller ticket in the queue it may be small enough to be
>>> +		 * satisfied by committing the transaction, so if any
>>> +		 * subsequent ticket is smaller than the first ticket go ahead
>>> +		 * and send us back for another loop through the enospc flushing
>>> +		 * code.
>>> +		 */
>>> +		if (first_ticket_bytes == 0)
>>> +			first_ticket_bytes = ticket->bytes;
>>> +		else if (first_ticket_bytes > ticket->bytes)
>>> +			return true;
>>>  
>>> -	while (!list_empty(head)) {
>>> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>>>  		list_del_init(&ticket->list);
>>>  		ticket->error = -ENOSPC;
>>>  		wake_up(&ticket->wait);
>>> -		if (ticket->bytes != ticket->orig_bytes)
>>> -			return true;
>>> +		btrfs_try_to_wakeup_tickets(fs_info, space_info);
>>
>> So the change in this logic is directly tied to the implementation of
>> btrfs_try_to_wakeup_tickets(). When we fail and remove a ticket in this
>> function, we give the next ticket a chance to be satisfied. But how well
>> does that work in practice, given that you fail normal-priority tickets
>> here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
>> first? Even if you are failing a normal ticket, a single unsatisfiable
>> priority ticket means that won't really change anything.
> 
> In practice we don't get to this state with high priority tickets on the list.
> Anything that would be long-ish term on the priority list is evict, and we wait
> for iput()'s in the normal flushing code.  At the point we hit wake_all_tickets
> we generally should only have tickets on the normal list.

Be that as it may, I think this assumption needs to be codified via an
assert or WARN_ON.

> 
> I suppose we could possibly get into this situation, but again the high priority
> tickets are going to be evict, truncate block, and relocate, which all have
> significantly lower reservation amounts than things like create or unlink.  If
> those things are unable to get reservations then we are truly out of space.
> Thanks,
> 
> Josef
>
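
The assert or WARN_ON requested above could take roughly the following shape. This is a userspace sketch only: `TOY_WARN_ON`, `toy_ticket_counts` and `toy_check_no_priority_tickets()` are hypothetical stand-ins. In the kernel the check might read something like `WARN_ON(!list_empty(&space_info->priority_tickets))` at the top of wake_all_tickets(), which is a guess at the form and not part of the posted patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the condition and return it. */
#define TOY_WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARN: %s\n", #cond), true) : false)

/* Hypothetical, trimmed-down view of the two ticket lists. */
struct toy_ticket_counts {
	unsigned int nr_priority_tickets;
	unsigned int nr_tickets;
};

/* Returns true if the "no priority tickets here" assumption was violated. */
static bool toy_check_no_priority_tickets(const struct toy_ticket_counts *tc)
{
	return TOY_WARN_ON(tc->nr_priority_tickets != 0);
}
```

A warning rather than a hard assertion fits the discussion, since Josef notes that the situation is unexpected but possibly reachable.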

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8a1c7ada67cb..bd485be783b8 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -676,19 +676,39 @@  static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
 }
 
-static bool wake_all_tickets(struct list_head *head)
+static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
+			     struct btrfs_space_info *space_info)
 {
 	struct reserve_ticket *ticket;
+	u64 tickets_id = space_info->tickets_id;
+	u64 first_ticket_bytes = 0;
+
+	while (!list_empty(&space_info->tickets) &&
+	       tickets_id == space_info->tickets_id) {
+		ticket = list_first_entry(&space_info->tickets,
+					  struct reserve_ticket, list);
+
+		/*
+		 * may_commit_transaction will avoid committing the transaction
+		 * if it doesn't feel like the space reclaimed by the commit
+		 * would result in the ticket succeeding.  However if we have a
+		 * smaller ticket in the queue it may be small enough to be
+		 * satisfied by committing the transaction, so if any
+		 * subsequent ticket is smaller than the first ticket go ahead
+		 * and send us back for another loop through the enospc flushing
+		 * code.
+		 */
+		if (first_ticket_bytes == 0)
+			first_ticket_bytes = ticket->bytes;
+		else if (first_ticket_bytes > ticket->bytes)
+			return true;
 
-	while (!list_empty(head)) {
-		ticket = list_first_entry(head, struct reserve_ticket, list);
 		list_del_init(&ticket->list);
 		ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
-		if (ticket->bytes != ticket->orig_bytes)
-			return true;
+		btrfs_try_to_wakeup_tickets(fs_info, space_info);
 	}
-	return false;
+	return (tickets_id != space_info->tickets_id);
 }
 
 /*
@@ -756,7 +776,7 @@  static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		if (flush_state > COMMIT_TRANS) {
 			commit_cycles++;
 			if (commit_cycles > 2) {
-				if (wake_all_tickets(&space_info->tickets)) {
+				if (wake_all_tickets(fs_info, space_info)) {
 					flush_state = FLUSH_DELAYED_ITEMS_NR;
 					commit_cycles--;
 				} else {
