Message ID | 20190816141952.19369-7-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Series | Rework reserve ticket handling |
On 16.08.19 at 17:19, Josef Bacik wrote:
> Now that we no longer partially fill tickets we need to rework
> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
> if any subsequent tickets are able to be satisfied. If our tickets_id
> changes we know something happened and we can keep flushing.
>
> Also if we find a ticket that is smaller than the first ticket in our
> queue then we want to retry the flushing loop again in case
> may_commit_transaction() decides we could satisfy the ticket by
> committing the transaction.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
>  1 file changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 8a1c7ada67cb..bd485be783b8 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>  }
>
> -static bool wake_all_tickets(struct list_head *head)
> +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
> +			     struct btrfs_space_info *space_info)
>  {
>  	struct reserve_ticket *ticket;
> +	u64 tickets_id = space_info->tickets_id;
> +	u64 first_ticket_bytes = 0;
> +
> +	while (!list_empty(&space_info->tickets) &&
> +	       tickets_id == space_info->tickets_id) {
> +		ticket = list_first_entry(&space_info->tickets,
> +					  struct reserve_ticket, list);
> +
> +		/*
> +		 * may_commit_transaction will avoid committing the transaction
> +		 * if it doesn't feel like the space reclaimed by the commit
> +		 * would result in the ticket succeeding. However if we have a
> +		 * smaller ticket in the queue it may be small enough to be
> +		 * satisfied by committing the transaction, so if any
> +		 * subsequent ticket is smaller than the first ticket go ahead
> +		 * and send us back for another loop through the enospc flushing
> +		 * code.
> +		 */
> +		if (first_ticket_bytes == 0)
> +			first_ticket_bytes = ticket->bytes;
> +		else if (first_ticket_bytes > ticket->bytes)
> +			return true;
>
> -	while (!list_empty(head)) {
> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>  		list_del_init(&ticket->list);
>  		ticket->error = -ENOSPC;
>  		wake_up(&ticket->wait);
> -		if (ticket->bytes != ticket->orig_bytes)
> -			return true;
> +		btrfs_try_to_wakeup_tickets(fs_info, space_info);

So the change in this logic is directly tied to the implementation of
btrfs_try_to_wakeup_tickets(): when we fail and remove a ticket in this
function, we give the next ticket a chance to be satisfied. But how well
does that work in practice, given that you fail normal-priority tickets
here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
first? Even if you fail a normal ticket, a single unsatisfiable priority
ticket means nothing really changes.

>  	}
> -	return false;
> +	return (tickets_id != space_info->tickets_id);
>  }
>
>  /*
> @@ -756,7 +776,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
>  		if (flush_state > COMMIT_TRANS) {
>  			commit_cycles++;
>  			if (commit_cycles > 2) {
> -				if (wake_all_tickets(&space_info->tickets)) {
> +				if (wake_all_tickets(fs_info, space_info)) {
>  					flush_state = FLUSH_DELAYED_ITEMS_NR;
>  					commit_cycles--;
>  				} else {
>
On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote:
>
>
> On 16.08.19 at 17:19, Josef Bacik wrote:
> > Now that we no longer partially fill tickets we need to rework
> > wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
> > if any subsequent tickets are able to be satisfied. If our tickets_id
> > changes we know something happened and we can keep flushing.
> >
> > Also if we find a ticket that is smaller than the first ticket in our
> > queue then we want to retry the flushing loop again in case
> > may_commit_transaction() decides we could satisfy the ticket by
> > committing the transaction.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
> >  1 file changed, 27 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index 8a1c7ada67cb..bd485be783b8 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
> >  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
> >  }
> >
> > -static bool wake_all_tickets(struct list_head *head)
> > +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
> > +			     struct btrfs_space_info *space_info)
> >  {
> >  	struct reserve_ticket *ticket;
> > +	u64 tickets_id = space_info->tickets_id;
> > +	u64 first_ticket_bytes = 0;
> > +
> > +	while (!list_empty(&space_info->tickets) &&
> > +	       tickets_id == space_info->tickets_id) {
> > +		ticket = list_first_entry(&space_info->tickets,
> > +					  struct reserve_ticket, list);
> > +
> > +		/*
> > +		 * may_commit_transaction will avoid committing the transaction
> > +		 * if it doesn't feel like the space reclaimed by the commit
> > +		 * would result in the ticket succeeding. However if we have a
> > +		 * smaller ticket in the queue it may be small enough to be
> > +		 * satisfied by committing the transaction, so if any
> > +		 * subsequent ticket is smaller than the first ticket go ahead
> > +		 * and send us back for another loop through the enospc flushing
> > +		 * code.
> > +		 */
> > +		if (first_ticket_bytes == 0)
> > +			first_ticket_bytes = ticket->bytes;
> > +		else if (first_ticket_bytes > ticket->bytes)
> > +			return true;
> >
> > -	while (!list_empty(head)) {
> > -		ticket = list_first_entry(head, struct reserve_ticket, list);
> >  		list_del_init(&ticket->list);
> >  		ticket->error = -ENOSPC;
> >  		wake_up(&ticket->wait);
> > -		if (ticket->bytes != ticket->orig_bytes)
> > -			return true;
> > +		btrfs_try_to_wakeup_tickets(fs_info, space_info);
>
> So the change in this logic is directly tied to the implementation of
> btrfs_try_to_wakeup_tickets(): when we fail and remove a ticket in this
> function, we give the next ticket a chance to be satisfied. But how well
> does that work in practice, given that you fail normal-priority tickets
> here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
> first? Even if you fail a normal ticket, a single unsatisfiable priority
> ticket means nothing really changes.

In practice we don't get to this state with high-priority tickets on the list.
Anything that would be long-ish term on the priority list is evict, and we wait
for iput()s in the normal flushing code. At the point we hit wake_all_tickets()
we generally should only have tickets on the normal list.

I suppose we could possibly get into this situation, but again the high-priority
tickets are going to be evict, truncate block, and relocate, which all have
significantly lower reservation amounts than things like create or unlink. If
those things are unable to get reservations then we are truly out of space.

Thanks,

Josef
On 19.08.19 at 18:06, Josef Bacik wrote:
> On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 16.08.19 at 17:19, Josef Bacik wrote:
>>> Now that we no longer partially fill tickets we need to rework
>>> wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
>>> if any subsequent tickets are able to be satisfied. If our tickets_id
>>> changes we know something happened and we can keep flushing.
>>>
>>> Also if we find a ticket that is smaller than the first ticket in our
>>> queue then we want to retry the flushing loop again in case
>>> may_commit_transaction() decides we could satisfy the ticket by
>>> committing the transaction.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>  fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
>>>  1 file changed, 27 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>> index 8a1c7ada67cb..bd485be783b8 100644
>>> --- a/fs/btrfs/space-info.c
>>> +++ b/fs/btrfs/space-info.c
>>> @@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
>>>  		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
>>>  }
>>>
>>> -static bool wake_all_tickets(struct list_head *head)
>>> +static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
>>> +			     struct btrfs_space_info *space_info)
>>>  {
>>>  	struct reserve_ticket *ticket;
>>> +	u64 tickets_id = space_info->tickets_id;
>>> +	u64 first_ticket_bytes = 0;
>>> +
>>> +	while (!list_empty(&space_info->tickets) &&
>>> +	       tickets_id == space_info->tickets_id) {
>>> +		ticket = list_first_entry(&space_info->tickets,
>>> +					  struct reserve_ticket, list);
>>> +
>>> +		/*
>>> +		 * may_commit_transaction will avoid committing the transaction
>>> +		 * if it doesn't feel like the space reclaimed by the commit
>>> +		 * would result in the ticket succeeding. However if we have a
>>> +		 * smaller ticket in the queue it may be small enough to be
>>> +		 * satisfied by committing the transaction, so if any
>>> +		 * subsequent ticket is smaller than the first ticket go ahead
>>> +		 * and send us back for another loop through the enospc flushing
>>> +		 * code.
>>> +		 */
>>> +		if (first_ticket_bytes == 0)
>>> +			first_ticket_bytes = ticket->bytes;
>>> +		else if (first_ticket_bytes > ticket->bytes)
>>> +			return true;
>>>
>>> -	while (!list_empty(head)) {
>>> -		ticket = list_first_entry(head, struct reserve_ticket, list);
>>>  		list_del_init(&ticket->list);
>>>  		ticket->error = -ENOSPC;
>>>  		wake_up(&ticket->wait);
>>> -		if (ticket->bytes != ticket->orig_bytes)
>>> -			return true;
>>> +		btrfs_try_to_wakeup_tickets(fs_info, space_info);
>>
>> So the change in this logic is directly tied to the implementation of
>> btrfs_try_to_wakeup_tickets(): when we fail and remove a ticket in this
>> function, we give the next ticket a chance to be satisfied. But how well
>> does that work in practice, given that you fail normal-priority tickets
>> here, whereas btrfs_try_to_wakeup_tickets() checks the priority tickets
>> first? Even if you fail a normal ticket, a single unsatisfiable priority
>> ticket means nothing really changes.
>
> In practice we don't get to this state with high-priority tickets on the list.
> Anything that would be long-ish term on the priority list is evict, and we wait
> for iput()s in the normal flushing code. At the point we hit wake_all_tickets()
> we generally should only have tickets on the normal list.

Be that as it may, I think this assumption needs to be codified via an
assert or WARN_ON().

>
> I suppose we could possibly get into this situation, but again the high-priority
> tickets are going to be evict, truncate block, and relocate, which all have
> significantly lower reservation amounts than things like create or unlink. If
> those things are unable to get reservations then we are truly out of space.
>
> Thanks,
>
> Josef
>
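To make the request above concrete, here is a minimal userspace sketch of what "codifying the assumption" might look like. It is not kernel code and not part of the patch: `MODEL_WARN_ON`, `model_warn_on()`, `struct model_lists`, and `model_fail_normal_tickets()` are hypothetical names invented for illustration, and the lists are reduced to plain counters. The only property borrowed from the kernel is that WARN_ON() reports the condition and lets execution continue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Number of warnings raised so far; lets a caller observe the check. */
static unsigned int model_warn_count;

/* Hypothetical stand-in for WARN_ON(): report and continue, never abort. */
static bool model_warn_on(bool cond, const char *expr)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define MODEL_WARN_ON(cond) model_warn_on((cond), #cond)

/* Hypothetical model: the two ticket lists reduced to element counts. */
struct model_lists {
	size_t nr_priority_tickets;
	size_t nr_tickets;
};

/*
 * Fail every ticket on the normal list and return how many were failed.
 * The assumption from the thread (no priority tickets are pending by the
 * time we get here) is checked up front instead of being left implicit.
 */
static size_t model_fail_normal_tickets(struct model_lists *l)
{
	size_t failed = l->nr_tickets;

	MODEL_WARN_ON(l->nr_priority_tickets != 0);
	l->nr_tickets = 0;
	return failed;
}
```

With this shape, a violated assumption shows up as a warning in the log while the normal list is still drained, which is the trade-off a warning makes compared with a hard assert.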
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8a1c7ada67cb..bd485be783b8 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -676,19 +676,39 @@ static inline int need_do_async_reclaim(struct btrfs_fs_info *fs_info,
 		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
 }
 
-static bool wake_all_tickets(struct list_head *head)
+static bool wake_all_tickets(struct btrfs_fs_info *fs_info,
+			     struct btrfs_space_info *space_info)
 {
 	struct reserve_ticket *ticket;
+	u64 tickets_id = space_info->tickets_id;
+	u64 first_ticket_bytes = 0;
+
+	while (!list_empty(&space_info->tickets) &&
+	       tickets_id == space_info->tickets_id) {
+		ticket = list_first_entry(&space_info->tickets,
+					  struct reserve_ticket, list);
+
+		/*
+		 * may_commit_transaction will avoid committing the transaction
+		 * if it doesn't feel like the space reclaimed by the commit
+		 * would result in the ticket succeeding. However if we have a
+		 * smaller ticket in the queue it may be small enough to be
+		 * satisfied by committing the transaction, so if any
+		 * subsequent ticket is smaller than the first ticket go ahead
+		 * and send us back for another loop through the enospc flushing
+		 * code.
+		 */
+		if (first_ticket_bytes == 0)
+			first_ticket_bytes = ticket->bytes;
+		else if (first_ticket_bytes > ticket->bytes)
+			return true;
 
-	while (!list_empty(head)) {
-		ticket = list_first_entry(head, struct reserve_ticket, list);
 		list_del_init(&ticket->list);
 		ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
-		if (ticket->bytes != ticket->orig_bytes)
-			return true;
+		btrfs_try_to_wakeup_tickets(fs_info, space_info);
 	}
-	return false;
+	return (tickets_id != space_info->tickets_id);
 }
 
 /*
@@ -756,7 +776,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 		if (flush_state > COMMIT_TRANS) {
 			commit_cycles++;
 			if (commit_cycles > 2) {
-				if (wake_all_tickets(&space_info->tickets)) {
+				if (wake_all_tickets(fs_info, space_info)) {
 					flush_state = FLUSH_DELAYED_ITEMS_NR;
 					commit_cycles--;
 				} else {
Now that we no longer partially fill tickets we need to rework
wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see
if any subsequent tickets are able to be satisfied. If our tickets_id
changes we know something happened and we can keep flushing.

Also if we find a ticket that is smaller than the first ticket in our
queue then we want to retry the flushing loop again in case
may_commit_transaction() decides we could satisfy the ticket by
committing the transaction.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)
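The control flow described in the commit message can be explored outside the kernel with a small userspace model. The sketch below is an illustration under stated assumptions, not the btrfs implementation: every `model_*` name is hypothetical, the ticket list is a plain singly linked FIFO, locking and wait queues are omitted, and `model_try_to_wakeup_tickets()` is assumed to grant tickets strictly in FIFO order until one does not fit and to bump `tickets_id` on each grant.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct reserve_ticket. */
struct model_ticket {
	uint64_t bytes;            /* size of the reservation request */
	int error;                 /* 0 while pending, -ENOSPC once failed */
	bool granted;              /* set when the reservation succeeds */
	struct model_ticket *next; /* FIFO link */
};

/* Hypothetical stand-in for the space_info fields the patch touches. */
struct model_space {
	struct model_ticket *head, *tail;
	uint64_t free_bytes;
	uint64_t tickets_id;       /* bumped on every successful grant */
};

static void model_enqueue(struct model_space *s, struct model_ticket *t)
{
	t->next = NULL;
	if (s->tail)
		s->tail->next = t;
	else
		s->head = t;
	s->tail = t;
}

/* Remove and return the first ticket; the queue must not be empty. */
static struct model_ticket *model_pop(struct model_space *s)
{
	struct model_ticket *t = s->head;

	s->head = t->next;
	if (!s->head)
		s->tail = NULL;
	t->next = NULL;
	return t;
}

/* Assumed behaviour: grant in FIFO order until a ticket does not fit. */
static void model_try_to_wakeup_tickets(struct model_space *s)
{
	while (s->head && s->head->bytes <= s->free_bytes) {
		struct model_ticket *t = model_pop(s);

		s->free_bytes -= t->bytes;
		t->granted = true;
		s->tickets_id++;
	}
}

/* Returns true when the caller should run the flushing loop again. */
static bool model_wake_all_tickets(struct model_space *s)
{
	uint64_t tickets_id = s->tickets_id;
	uint64_t first_ticket_bytes = 0;

	while (s->head && tickets_id == s->tickets_id) {
		struct model_ticket *t = s->head;

		/* A smaller ticket behind the first one is worth a retry. */
		if (first_ticket_bytes == 0)
			first_ticket_bytes = t->bytes;
		else if (first_ticket_bytes > t->bytes)
			return true;

		model_pop(s);
		t->error = -ENOSPC;
		/* The failed ticket no longer blocks the ones behind it. */
		model_try_to_wakeup_tickets(s);
	}
	return tickets_id != s->tickets_id;
}
```

In this model, a queue of 100 then 5 bytes with 10 bytes free fails the first ticket, grants the second, and returns true because `tickets_id` changed. A queue of 100 then 50 returns true without failing the 50-byte ticket, since it is smaller than the first. A queue of 50 then 100 fails both and returns false.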