Message ID | e8acfcfefeb3156e11e60ea97dcd2c6ecf984101.1686028197.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fixes for reclaim | expand |
On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote: > > The reclaim process can temporally fail. For example, if the space is temporally -> temporarily > getting tight, it fails to make the block group read-only. If there are no > further writes on that block group, the block group will never get back to > the reclaim list, and the BG never get reclaimed. In a certain workload, we get -> gets > can leave many such block groups never reclaimed. > > So, let's get it back to the list and give it a chance to be reclaimed. > > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") > CC: stable@vger.kernel.org # 5.15+ > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > --- > fs/btrfs/block-group.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index db9bee071434..36e0d9b5d824 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1833,7 +1833,18 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > } > > next: > - btrfs_put_block_group(bg); > + if (!ret) { > + btrfs_put_block_group(bg); > + } else { > + spin_lock(&bg->lock); Why do we need to take bg->lock? The ->bg_list is protected by fs_info->unused_bgs_lock alone. > + spin_lock(&fs_info->unused_bgs_lock); > + if (list_empty(&bg->bg_list)) > + list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > + else > + btrfs_put_block_group(bg); > + spin_unlock(&fs_info->unused_bgs_lock); > + spin_unlock(&bg->lock); > + } Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should use that, and simply have: btrfs_mark_bg_to_reclaim(); btrfs_put_block_group(bg); Thanks. > > mutex_unlock(&fs_info->reclaim_bgs_lock); > /* > -- > 2.40.1 >
On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote: > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote: > > + spin_lock(&fs_info->unused_bgs_lock); > > + if (list_empty(&bg->bg_list)) > > + list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > > + else > > + btrfs_put_block_group(bg); > > + spin_unlock(&fs_info->unused_bgs_lock); > > + spin_unlock(&bg->lock); > > + } > > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should > use that, and simply have: > > btrfs_mark_bg_to_reclaim(); > btrfs_put_block_group(bg); I can fold the diff below if you agree --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) } next: - if (!ret) { - btrfs_put_block_group(bg); - } else { - spin_lock(&bg->lock); - spin_lock(&fs_info->unused_bgs_lock); - if (list_empty(&bg->bg_list)) - list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); - else - btrfs_put_block_group(bg); - spin_unlock(&fs_info->unused_bgs_lock); - spin_unlock(&bg->lock); - } + if (!ret) + btrfs_mark_bg_to_reclaim(bg); + btrfs_put_block_group(bg); mutex_unlock(&fs_info->reclaim_bgs_lock); /* ---
On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote: > On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote: > > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote: > > > + spin_lock(&fs_info->unused_bgs_lock); > > > + if (list_empty(&bg->bg_list)) > > > + list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > > > + else > > > + btrfs_put_block_group(bg); > > > + spin_unlock(&fs_info->unused_bgs_lock); > > > + spin_unlock(&bg->lock); > > > + } > > > > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should > > use that, and simply have: > > > > btrfs_mark_bg_to_reclaim(); > > btrfs_put_block_group(bg); Yeah, this looks nice. Thank you. > > I can fold the diff below if you agree > > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > } > > next: > - if (!ret) { > - btrfs_put_block_group(bg); > - } else { > - spin_lock(&bg->lock); > - spin_lock(&fs_info->unused_bgs_lock); > - if (list_empty(&bg->bg_list)) > - list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > - else > - btrfs_put_block_group(bg); > - spin_unlock(&fs_info->unused_bgs_lock); > - spin_unlock(&bg->lock); > - } > + if (!ret) > + btrfs_mark_bg_to_reclaim(bg); Thank you for folding this, but the condition is flipped. We should add it back to the list in a failure case. > + btrfs_put_block_group(bg); > > mutex_unlock(&fs_info->reclaim_bgs_lock); > /* > ---
On Tue, Jun 06, 2023 at 11:55:48PM +0000, Naohiro Aota wrote: > On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote: > > On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote: > > > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote: > > > > + spin_lock(&fs_info->unused_bgs_lock); > > > > + if (list_empty(&bg->bg_list)) > > > > + list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > > > > + else > > > > + btrfs_put_block_group(bg); > > > > + spin_unlock(&fs_info->unused_bgs_lock); > > > > + spin_unlock(&bg->lock); > > > > + } > > > > > > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should > > > use that, and simply have: > > > > > > btrfs_mark_bg_to_reclaim(); > > > btrfs_put_block_group(bg); > > Yeah, this looks nice. Thank you. > > > > > I can fold the diff below if you agree > > > > --- a/fs/btrfs/block-group.c > > +++ b/fs/btrfs/block-group.c > > @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > > } > > > > next: > > - if (!ret) { > > - btrfs_put_block_group(bg); > > - } else { > > - spin_lock(&bg->lock); > > - spin_lock(&fs_info->unused_bgs_lock); > > - if (list_empty(&bg->bg_list)) > > - list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); > > - else > > - btrfs_put_block_group(bg); > > - spin_unlock(&fs_info->unused_bgs_lock); > > - spin_unlock(&bg->lock); > > - } > > + if (!ret) > > + btrfs_mark_bg_to_reclaim(bg); > > Thank you for folding this, but the condition is flipped. We should add it > back to the list in a failure case. Ah right, fixed and pushed. The whole diff becomes: if (ret) btrfs_mark_bg_to_reclaim(bg);
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index db9bee071434..36e0d9b5d824 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1833,7 +1833,18 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) } next: - btrfs_put_block_group(bg); + if (!ret) { + btrfs_put_block_group(bg); + } else { + spin_lock(&bg->lock); + spin_lock(&fs_info->unused_bgs_lock); + if (list_empty(&bg->bg_list)) + list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs); + else + btrfs_put_block_group(bg); + spin_unlock(&fs_info->unused_bgs_lock); + spin_unlock(&bg->lock); + } mutex_unlock(&fs_info->reclaim_bgs_lock); /*
The reclaim process can temporally fail. For example, if the space is getting tight, it fails to make the block group read-only. If there are no further writes on that block group, the block group will never get back to the reclaim list, and the BG never get reclaimed. In a certain workload, we can leave many such block groups never reclaimed. So, let's get it back to the list and give it a chance to be reclaimed. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- fs/btrfs/block-group.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)