Message ID | 3ed106331a7ff61303c9e1c2930f8a7673b80a86.1717790627.git.boris@bur.io (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: retry bg reclaim without infinite loop | expand |
On 07.06.24 22:04, Boris Burkov wrote: [...] > Cc: stable@vger.kernel.org > Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim") > Signed-off-by: Boris Burkov <boris@bur.io> I think the code used to look like that before, but I can't remember if my mind serves me right. > - if (ret) > - btrfs_mark_bg_to_reclaim(bg); > + if (ret) { > + // refcount held by the reclaim_bgs list after splice Nit: I don't think we use C++ style comments Anyways, as is it looks good to me. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Fri, Jun 7, 2024 at 9:04 PM Boris Burkov <boris@bur.io> wrote: > > If inc_block_group_ro systematically fails (e.g. due to ETXTBUSY from > swap!) or btrfs_relocate_chunk systematically fails (from lack of > space), then this worker becomes an infinite loop. > > At the very least, this strands the cleaner thread, but can also result > in hung tasks/rcu stalls on PREEMPT_NONE kernels and if the > reclaim_bgs_lock mutex is not contended. > > I believe the best long term fix is to manage reclaim via work queue, > where we queue up a relocation on the triggering condition and re-queue > on failure. In the meantime, this is an easy fix to apply to avoid the > immediate pain. > > Cc: stable@vger.kernel.org > Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim") > Signed-off-by: Boris Burkov <boris@bur.io> > --- > fs/btrfs/block-group.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index 9910bae89966..fe61fc2df950 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1792,6 +1792,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > container_of(work, struct btrfs_fs_info, reclaim_bgs_work); > struct btrfs_block_group *bg; > struct btrfs_space_info *space_info; > + LIST_HEAD(retry_list); > > if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) > return; > @@ -1928,8 +1929,11 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > } > > next: > - if (ret) > - btrfs_mark_bg_to_reclaim(bg); > + if (ret) { > + // refcount held by the reclaim_bgs list after splice Apart from the comment style as pointed out by Johannes before: Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > + btrfs_get_block_group(bg); > + list_add_tail(&bg->bg_list, &retry_list); > + } > btrfs_put_block_group(bg); > > mutex_unlock(&fs_info->reclaim_bgs_lock); > @@ -1949,6 +1953,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) > spin_unlock(&fs_info->unused_bgs_lock); > mutex_unlock(&fs_info->reclaim_bgs_lock); > end: > + spin_lock(&fs_info->unused_bgs_lock); > + list_splice_tail(&retry_list, &fs_info->reclaim_bgs); > + spin_unlock(&fs_info->unused_bgs_lock); > btrfs_exclop_finish(fs_info); > sb_end_write(fs_info->sb); > } > -- > 2.45.2 > >
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 9910bae89966..fe61fc2df950 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1792,6 +1792,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) container_of(work, struct btrfs_fs_info, reclaim_bgs_work); struct btrfs_block_group *bg; struct btrfs_space_info *space_info; + LIST_HEAD(retry_list); if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return; @@ -1928,8 +1929,11 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) } next: - if (ret) - btrfs_mark_bg_to_reclaim(bg); + if (ret) { + // refcount held by the reclaim_bgs list after splice + btrfs_get_block_group(bg); + list_add_tail(&bg->bg_list, &retry_list); + } btrfs_put_block_group(bg); mutex_unlock(&fs_info->reclaim_bgs_lock); @@ -1949,6 +1953,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) spin_unlock(&fs_info->unused_bgs_lock); mutex_unlock(&fs_info->reclaim_bgs_lock); end: + spin_lock(&fs_info->unused_bgs_lock); + list_splice_tail(&retry_list, &fs_info->reclaim_bgs); + spin_unlock(&fs_info->unused_bgs_lock); btrfs_exclop_finish(fs_info); sb_end_write(fs_info->sb); }
If inc_block_group_ro systematically fails (e.g. due to ETXTBUSY from swap!) or btrfs_relocate_chunk systematically fails (from lack of space), then this worker becomes an infinite loop. At the very least, this strands the cleaner thread, but can also result in hung tasks/rcu stalls on PREEMPT_NONE kernels and if the reclaim_bgs_lock mutex is not contended. I believe the best long term fix is to manage reclaim via work queue, where we queue up a relocation on the triggering condition and re-queue on failure. In the meantime, this is an easy fix to apply to avoid the immediate pain. Cc: stable@vger.kernel.org Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim") Signed-off-by: Boris Burkov <boris@bur.io> --- fs/btrfs/block-group.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)