Message ID | 20160902025846.4133-1-wangxg.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On 09/01/2016 10:58 PM, Wang Xiaoguang wrote: > In btrfs_async_reclaim_metadata_space(), we use ticket's address to > determine whether asynchronous metadata reclaim work is making progress. > > ticket = list_first_entry(&space_info->tickets, > struct reserve_ticket, list); > if (last_ticket == ticket) { > flush_state++; > } else { > last_ticket = ticket; > flush_state = FLUSH_DELAYED_ITEMS_NR; > if (commit_cycles) > commit_cycles--; > } > > But indeed it's wrong, we should not rely on local variable's address to > do this check, because addresses may be same. In my test environment, I > dd one 168MB file in a 256MB fs, found that for this file, every time > wait_reserve_ticket() called, local variable ticket's address is same, > > For above codes, assume a previous ticket's address is addrA, last_ticket > is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and > wake up it, then another ticket is added, but with the same address addrA, > now last_ticket will be same to current ticket, then current ticket's flush > work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR, > which may result in some enospc issues(I have seen this in my test machine). > > Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> We want to track the progress of the individual tickets, not whether or not we make progress on the space_info, so instead store the global ticket id in space_info and store the individual ticket_id in the ticket itself, and use that as the last_tickets_id. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
hello, On 09/02/2016 09:28 PM, Josef Bacik wrote: > On 09/01/2016 10:58 PM, Wang Xiaoguang wrote: >> In btrfs_async_reclaim_metadata_space(), we use ticket's address to >> determine whether asynchronous metadata reclaim work is making progress. >> >> ticket = list_first_entry(&space_info->tickets, >> struct reserve_ticket, list); >> if (last_ticket == ticket) { >> flush_state++; >> } else { >> last_ticket = ticket; >> flush_state = FLUSH_DELAYED_ITEMS_NR; >> if (commit_cycles) >> commit_cycles--; >> } >> >> But indeed it's wrong, we should not rely on local variable's address to >> do this check, because addresses may be same. In my test environment, I >> dd one 168MB file in a 256MB fs, found that for this file, every time >> wait_reserve_ticket() called, local variable ticket's address is same, >> >> For above codes, assume a previous ticket's address is addrA, >> last_ticket >> is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and >> wake up it, then another ticket is added, but with the same address >> addrA, >> now last_ticket will be same to current ticket, then current ticket's >> flush >> work will start from current flush_state, not initial >> FLUSH_DELAYED_ITEMS_NR, >> which may result in some enospc issues(I have seen this in my test >> machine). >> >> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> > > We want to track the progress of the individual tickets, not whether > or not we make progress on the space_info, so instead store the global > ticket id in space_info and store the individual ticket_id in the > ticket itself, and use that as the last_tickets_id. Thanks, Sorry for being late. In btrfs_async_reclaim_metadata_space(), there is codes: if (flush_state > COMMIT_TRANS) { commit_cycles++; if (commit_cycles > 2) { wake_all_tickets(&space_info->tickets); space_info->flush = 0; } else { flush_state = FLUSH_DELAYED_ITEMS_NR; } } From above codes, if it can not satisfy one current ticket, it'll discard all current queued tickets. So I think your original codes is to track whether or not we make progress on the space_info, and this patch can fix the issue which is described in commit message. Also I'm not sure whether I have understood your design completely. If I'm wrong, sorry. Regards, Xiaoguang Wang > > Josef > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/06/2016 06:18 AM, Wang Xiaoguang wrote: > hello, > > On 09/02/2016 09:28 PM, Josef Bacik wrote: >> On 09/01/2016 10:58 PM, Wang Xiaoguang wrote: >>> In btrfs_async_reclaim_metadata_space(), we use ticket's address to >>> determine whether asynchronous metadata reclaim work is making progress. >>> >>> ticket = list_first_entry(&space_info->tickets, >>> struct reserve_ticket, list); >>> if (last_ticket == ticket) { >>> flush_state++; >>> } else { >>> last_ticket = ticket; >>> flush_state = FLUSH_DELAYED_ITEMS_NR; >>> if (commit_cycles) >>> commit_cycles--; >>> } >>> >>> But indeed it's wrong, we should not rely on local variable's address to >>> do this check, because addresses may be same. In my test environment, I >>> dd one 168MB file in a 256MB fs, found that for this file, every time >>> wait_reserve_ticket() called, local variable ticket's address is same, >>> >>> For above codes, assume a previous ticket's address is addrA, last_ticket >>> is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and >>> wake up it, then another ticket is added, but with the same address addrA, >>> now last_ticket will be same to current ticket, then current ticket's flush >>> work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR, >>> which may result in some enospc issues(I have seen this in my test machine). >>> >>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> >> >> We want to track the progress of the individual tickets, not whether or not we >> make progress on the space_info, so instead store the global ticket id in >> space_info and store the individual ticket_id in the ticket itself, and use >> that as the last_tickets_id. Thanks, > Sorry for being late. > > In btrfs_async_reclaim_metadata_space(), there is codes: > if (flush_state > COMMIT_TRANS) { > commit_cycles++; > if (commit_cycles > 2) { > wake_all_tickets(&space_info->tickets); > space_info->flush = 0; > } else { > flush_state = FLUSH_DELAYED_ITEMS_NR; > } > } > From above codes, if it can not satisfy one current ticket, it'll discard all > current > queued tickets. So I think your original codes is to track whether or not we > make progress on the space_info, and this patch can fix the issue which is > described > in commit message. > Also I'm not sure whether I have understood your design completely. If I'm > wrong, sorry. Sorry I misread your patch, I thought you were doing space_info->ticket_id++ when you added a ticket to the list, not when you removed it. You can add Reviewed-by: Josef Bacik <jbacik@fb.com> Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index eff3993..33fe035 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -427,6 +427,7 @@ struct btrfs_space_info { struct list_head ro_bgs; struct list_head priority_tickets; struct list_head tickets; + u64 tickets_id; struct rw_semaphore groups_sem; /* for block groups in our same type */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8c8a4d1..2f9a18f 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4966,12 +4966,12 @@ static void wake_all_tickets(struct list_head *head) */ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) { - struct reserve_ticket *last_ticket = NULL; struct btrfs_fs_info *fs_info; struct btrfs_space_info *space_info; u64 to_reclaim; int flush_state; int commit_cycles = 0; + u64 last_tickets_id; fs_info = container_of(work, struct btrfs_fs_info, async_reclaim_work); space_info = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA); @@ -4984,8 +4984,7 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) spin_unlock(&space_info->lock); return; } - last_ticket = list_first_entry(&space_info->tickets, - struct reserve_ticket, list); + last_tickets_id = space_info->tickets_id; spin_unlock(&space_info->lock); flush_state = FLUSH_DELAYED_ITEMS_NR; @@ -5005,10 +5004,10 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work) space_info); ticket = list_first_entry(&space_info->tickets, struct reserve_ticket, list); - if (last_ticket == ticket) { + if (last_tickets_id == space_info->tickets_id) { flush_state++; } else { - last_ticket = ticket; + last_tickets_id = space_info->tickets_id; flush_state = FLUSH_DELAYED_ITEMS_NR; if (commit_cycles) commit_cycles--; @@ -5384,6 +5383,7 @@ again: list_del_init(&ticket->list); num_bytes -= ticket->bytes; ticket->bytes = 0; + space_info->tickets_id++; wake_up(&ticket->wait); } else { ticket->bytes -= num_bytes; @@ -5426,6 +5426,7 @@ again: num_bytes -= ticket->bytes; space_info->bytes_may_use += ticket->bytes; ticket->bytes = 0; + space_info->tickets_id++; wake_up(&ticket->wait); } else { trace_btrfs_space_reservation(fs_info, "space_info",
In btrfs_async_reclaim_metadata_space(), we use ticket's address to determine whether asynchronous metadata reclaim work is making progress. ticket = list_first_entry(&space_info->tickets, struct reserve_ticket, list); if (last_ticket == ticket) { flush_state++; } else { last_ticket = ticket; flush_state = FLUSH_DELAYED_ITEMS_NR; if (commit_cycles) commit_cycles--; } But indeed it's wrong, we should not rely on local variable's address to do this check, because addresses may be same. In my test environment, I dd one 168MB file in a 256MB fs, found that for this file, every time wait_reserve_ticket() called, local variable ticket's address is same, For above codes, assume a previous ticket's address is addrA, last_ticket is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and wake up it, then another ticket is added, but with the same address addrA, now last_ticket will be same to current ticket, then current ticket's flush work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR, which may result in some enospc issues(I have seen this in my test machine). Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> --- fs/btrfs/ctree.h | 1 + fs/btrfs/extent-tree.c | 11 ++++++----- 2 files changed, 7 insertions(+), 5 deletions(-)