Message ID | 06288ff9c090a5bfe76e0a2080eb1fbd640cdd62.1686028197.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fixes for reclaim | expand |
On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote: > > The reclaiming process only starts after the FS volumes are allocated to a > certain level (75% by default). Thus, the list of reclaiming target block > groups can build up so huge at the time the reclaim process kicks in. On a > test run, there were over 1000 BGs in the reclaim list. > > As the reclaim involves rewriting the data, it takes really long time to > reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() > won't proceed because the reclaim side is holding > fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused > BGs kept in the unused list. On my test run, I got 1057 unused BGs. > > Since deleting a block group is relatively easy and fast work, we can call > btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up > unused BGs. > > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") > CC: stable@vger.kernel.org # 5.15+ > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good, great catch. > --- > fs/btrfs/block-group.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index 618ba7670e66..c5547da0f6eb 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > > next: > btrfs_put_block_group(bg); > + > + mutex_unlock(&fs_info->reclaim_bgs_lock); > + /* > + * Reclaiming all the BGs in the list can take really long. > + * Prioritize cleanning up unused BGs. > + */ > + btrfs_delete_unused_bgs(fs_info); > + /* > + * If we are interrupted by a balance, we can just bail out. The > + * cleaner thread call me again if necessary. > + */ > + if (!mutex_trylock(&fs_info->reclaim_bgs_lock)) > + goto end; > spin_lock(&fs_info->unused_bgs_lock); > } > spin_unlock(&fs_info->unused_bgs_lock); > mutex_unlock(&fs_info->reclaim_bgs_lock); > +end: > btrfs_exclop_finish(fs_info); > sb_end_write(fs_info->sb); > } > -- > 2.40.1 >
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote: > The reclaiming process only starts after the FS volumes are allocated to a > certain level (75% by default). Thus, the list of reclaiming target block > groups can build up so huge at the time the reclaim process kicks in. On a > test run, there were over 1000 BGs in the reclaim list. > > As the reclaim involves rewriting the data, it takes really long time to > reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() > won't proceed because the reclaim side is holding > fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused > BGs kept in the unused list. On my test run, I got 1057 unused BGs. > > Since deleting a block group is relatively easy and fast work, we can call > btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up > unused BGs. > > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") > CC: stable@vger.kernel.org # 5.15+ > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > --- > fs/btrfs/block-group.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index 618ba7670e66..c5547da0f6eb 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > > next: > btrfs_put_block_group(bg); > + > + mutex_unlock(&fs_info->reclaim_bgs_lock); > + /* > + * Reclaiming all the BGs in the list can take really long. > + * Prioritize cleanning up unused BGs. > + */ > + btrfs_delete_unused_bgs(fs_info); > + /* > + * If we are interrupted by a balance, we can just bail out. The > + * cleaner thread call me again if necessary. > + */ > + if (!mutex_trylock(&fs_info->reclaim_bgs_lock)) > + goto end; I agree that this fix makes sense and a lot of reclaim should not block deleting unused bgs. However, it feels quite hacky to me, because by the current design, we are explicitly calling btrfs_delete_unused_bgs as a first class cleanup action from the cleaner thread, before calling reclaim. I think a little more cleanup to integrate the two together would reduce the "throw things at the wall" feel here. I would propose either: 1. Run them in parallel and make sure they release locks appropriately so that everyone makes good forward progress. I think it's a good model, but kind of a departure from the single cleaner thread so maybe risky/a pain to code. 2. Just get rid of the explicit delete unused from cleaner and integrate it as a first class component of this reclaim loop. This loop becomes *the* delete unused work except shutdown type cases. FWIW, not a NAK, just my 2c. Thanks, Boris > spin_lock(&fs_info->unused_bgs_lock); > } > spin_unlock(&fs_info->unused_bgs_lock); > mutex_unlock(&fs_info->reclaim_bgs_lock); > +end: > btrfs_exclop_finish(fs_info); > sb_end_write(fs_info->sb); > } > -- > 2.40.1 >
On Wed, Jun 07, 2023 at 11:48:37AM -0700, Boris Burkov wrote: > On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote: > > The reclaiming process only starts after the FS volumes are allocated to a > > certain level (75% by default). Thus, the list of reclaiming target block > > groups can build up so huge at the time the reclaim process kicks in. On a > > test run, there were over 1000 BGs in the reclaim list. > > > > As the reclaim involves rewriting the data, it takes really long time to > > reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() > > won't proceed because the reclaim side is holding > > fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused > > BGs kept in the unused list. On my test run, I got 1057 unused BGs. > > > > Since deleting a block group is relatively easy and fast work, we can call > > btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up > > unused BGs. > > > > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") > > CC: stable@vger.kernel.org # 5.15+ > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > --- > > fs/btrfs/block-group.c | 14 ++++++++++++++ > > 1 file changed, 14 insertions(+) > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > index 618ba7670e66..c5547da0f6eb 100644 > > --- a/fs/btrfs/block-group.c > > +++ b/fs/btrfs/block-group.c > > @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > > > > next: > > btrfs_put_block_group(bg); > > + > > + mutex_unlock(&fs_info->reclaim_bgs_lock); > > + /* > > + * Reclaiming all the BGs in the list can take really long. > > + * Prioritize cleanning up unused BGs. > > + */ > > + btrfs_delete_unused_bgs(fs_info); > > + /* > > + * If we are interrupted by a balance, we can just bail out. The > > + * cleaner thread call me again if necessary. > > + */ > > + if (!mutex_trylock(&fs_info->reclaim_bgs_lock)) > > + goto end; > > I agree that this fix makes sense and a lot of reclaim should not block > deleting unused bgs. > > However, it feels quite hacky to me, because by the current design, we > are explicitly calling btrfs_delete_unused_bgs as a first class cleanup > action from the cleaner thread, before calling reclaim. I think a little > more cleanup to integrate the two together would reduce the "throw > things at the wall" feel here. > > I would propose either: > 1. Run them in parallel and make sure they release locks appropriately > so that everyone makes good forward progress. I think it's a good > model, but kind of a departure from the single cleaner thread so > maybe risky/a pain to code. > 2. Just get rid of the explicit delete unused from cleaner and > integrate it as a first class component of this reclaim loop. This > loop becomes *the* delete unused work except shutdown type cases. > > FWIW, not a NAK, just my 2c. Thank you for your suggestion. Yeah, I also considered something like your option 2, but I hesitated to do so as a fix patch because it will change the current cleaner and reclaim thread model. I'll rework them an a integrated loop if btrfs devs agree with it. > Thanks, > Boris > > > spin_lock(&fs_info->unused_bgs_lock); > > } > > spin_unlock(&fs_info->unused_bgs_lock); > > mutex_unlock(&fs_info->reclaim_bgs_lock); > > +end: > > btrfs_exclop_finish(fs_info); > > sb_end_write(fs_info->sb); > > } > > -- > > 2.40.1 > >
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 618ba7670e66..c5547da0f6eb 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) next: btrfs_put_block_group(bg); + + mutex_unlock(&fs_info->reclaim_bgs_lock); + /* + * Reclaiming all the BGs in the list can take really long. + * Prioritize cleanning up unused BGs. + */ + btrfs_delete_unused_bgs(fs_info); + /* + * If we are interrupted by a balance, we can just bail out. The + * cleaner thread call me again if necessary. + */ + if (!mutex_trylock(&fs_info->reclaim_bgs_lock)) + goto end; spin_lock(&fs_info->unused_bgs_lock); } spin_unlock(&fs_info->unused_bgs_lock); mutex_unlock(&fs_info->reclaim_bgs_lock); +end: btrfs_exclop_finish(fs_info); sb_end_write(fs_info->sb); }
The reclaiming process only starts after the FS volumes are allocated to a certain level (75% by default). Thus, the list of reclaiming target block groups can build up so huge at the time the reclaim process kicks in. On a test run, there were over 1000 BGs in the reclaim list. As the reclaim involves rewriting the data, it takes really long time to reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() won't proceed because the reclaim side is holding fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused BGs kept in the unused list. On my test run, I got 1057 unused BGs. Since deleting a block group is relatively easy and fast work, we can call btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up unused BGs. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- fs/btrfs/block-group.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)