Message ID | 65dce6b59ca3aa2bcd2e492271dcc3faa6f0a9ad.1741583506.git.wqu@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | btrfs: add extra warning if delayed iput is added when it's not allowed | expand |
On Mon, Mar 10, 2025 at 5:15 AM Qu Wenruo <wqu@suse.com> wrote: > > Since I have triggered the ASSERT() on the delayed iput too many times, > now is the time to add some extra debug warnings for delayed iput. > > All delayed iputs should be queued after all ordered extents finish > their IO and all involved workqueues are flushed. > > Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there > should be no more delayed puts added. > > So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above > mentioned timing. > And all btrfs_add_delayed_iput() will check that flag and give a > WARN_ON_ONCE() for debug builds. > > And since we're here, remove the btrfs_run_delayed_iputs() call inside > btrfs_error_commit_super(). This is a bad idea to make such unrelated change in the same patch. After all this one is about adding a warning and making things easier to catch. It should either go as a separate patch or it fits much better in the previous patch ("btrfs: run btrfs_error_commit_super() early"). > As that function will only wait for ordered extents to finish their IO, > delayed iputs will only be added when those ordered extents all finished > and removed from io tree, this is only ensured after the workqueue It's not "io tree", that's used for extent locking, setting delalloc ranges, etc. You mean the ordered extents tree in the inode (struct btrfs_inode::ordered_tree). > flush. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > Changelog: > v2: > - Set the new flag early > - Make the new flag unconditional except the WARN_ON_ONCE() check > - Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super() > --- > fs/btrfs/disk-io.c | 6 ++---- > fs/btrfs/fs.h | 3 +++ > fs/btrfs/inode.c | 4 ++++ > 3 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 671f11787f34..466d9c434030 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4384,6 +4384,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > /* Ordered extents for free space inodes. */ > btrfs_flush_workqueue(fs_info->endio_freespace_worker); > btrfs_run_delayed_iputs(fs_info); > + /* There should be no more workload to generate new delayed iputs. */ > + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); > > cancel_work_sync(&fs_info->async_reclaim_work); > cancel_work_sync(&fs_info->async_data_reclaim_work); > @@ -4540,10 +4542,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info) > /* cleanup FS via transaction */ > btrfs_cleanup_transaction(fs_info); > > - mutex_lock(&fs_info->cleaner_mutex); > - btrfs_run_delayed_iputs(fs_info); > - mutex_unlock(&fs_info->cleaner_mutex); > - > down_write(&fs_info->cleanup_work_sem); > up_write(&fs_info->cleanup_work_sem); > } > diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h > index b8c2e59ffc43..3a9e0a4c54e5 100644 > --- a/fs/btrfs/fs.h > +++ b/fs/btrfs/fs.h > @@ -117,6 +117,9 @@ enum { > /* Indicates there was an error cleaning up a log tree. */ > BTRFS_FS_STATE_LOG_CLEANUP_ERROR, > > + /* No more delayed iput can be queued. */ > + BTRFS_FS_STATE_NO_DELAYED_IPUT, > + > BTRFS_FS_STATE_COUNT > }; > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8ac4858b70e7..d2bf81c08f13 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) > if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) > return; > > +#ifdef CONFIG_BTRFS_DEBUG > + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); > +#endif As commented before, I would not add this under CONFIG_BTRFS_DEBUG, just leave it there to always run. Like that it'll be much faster to discover issues, as users typically don't run a kernel with CONFIG_BTRFS_DEBUG. Thanks. > + > atomic_inc(&fs_info->nr_delayed_iputs); > /* > * Need to be irq safe here because we can be called from either an irq > -- > 2.48.1 > >
在 2025/3/10 21:12, Filipe Manana 写道: > On Mon, Mar 10, 2025 at 5:15 AM Qu Wenruo <wqu@suse.com> wrote: >> >> Since I have triggered the ASSERT() on the delayed iput too many times, >> now is the time to add some extra debug warnings for delayed iput. >> >> All delayed iputs should be queued after all ordered extents finish >> their IO and all involved workqueues are flushed. >> >> Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there >> should be no more delayed puts added. >> >> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above >> mentioned timing. >> And all btrfs_add_delayed_iput() will check that flag and give a >> WARN_ON_ONCE() for debug builds. >> >> And since we're here, remove the btrfs_run_delayed_iputs() call inside >> btrfs_error_commit_super(). > > This is a bad idea to make such unrelated change in the same patch. > After all this one is about adding a warning and making things easier > to catch. > > It should either go as a separate patch or it fits much better in the > previous patch ("btrfs: run btrfs_error_commit_super() early"). Sure, I also prefer to push the removal of btrfs_run_delayed_iputs() into that commit. I'll update that one with extra commit message. > > >> As that function will only wait for ordered extents to finish their IO, >> delayed iputs will only be added when those ordered extents all finished >> and removed from io tree, this is only ensured after the workqueue > > It's not "io tree", that's used for extent locking, setting delalloc > ranges, etc. > You mean the ordered extents tree in the inode (struct > btrfs_inode::ordered_tree). > >> flush. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> Changelog: >> v2: >> - Set the new flag early >> - Make the new flag unconditional except the WARN_ON_ONCE() check >> - Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super() >> --- >> fs/btrfs/disk-io.c | 6 ++---- >> fs/btrfs/fs.h | 3 +++ >> fs/btrfs/inode.c | 4 ++++ >> 3 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 671f11787f34..466d9c434030 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -4384,6 +4384,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) >> /* Ordered extents for free space inodes. */ >> btrfs_flush_workqueue(fs_info->endio_freespace_worker); >> btrfs_run_delayed_iputs(fs_info); >> + /* There should be no more workload to generate new delayed iputs. */ >> + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); >> >> cancel_work_sync(&fs_info->async_reclaim_work); >> cancel_work_sync(&fs_info->async_data_reclaim_work); >> @@ -4540,10 +4542,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info) >> /* cleanup FS via transaction */ >> btrfs_cleanup_transaction(fs_info); >> >> - mutex_lock(&fs_info->cleaner_mutex); >> - btrfs_run_delayed_iputs(fs_info); >> - mutex_unlock(&fs_info->cleaner_mutex); >> - >> down_write(&fs_info->cleanup_work_sem); >> up_write(&fs_info->cleanup_work_sem); >> } >> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h >> index b8c2e59ffc43..3a9e0a4c54e5 100644 >> --- a/fs/btrfs/fs.h >> +++ b/fs/btrfs/fs.h >> @@ -117,6 +117,9 @@ enum { >> /* Indicates there was an error cleaning up a log tree. */ >> BTRFS_FS_STATE_LOG_CLEANUP_ERROR, >> >> + /* No more delayed iput can be queued. */ >> + BTRFS_FS_STATE_NO_DELAYED_IPUT, >> + >> BTRFS_FS_STATE_COUNT >> }; >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index 8ac4858b70e7..d2bf81c08f13 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) >> if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) >> return; >> >> +#ifdef CONFIG_BTRFS_DEBUG >> + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); >> +#endif > > As commented before, I would not add this under CONFIG_BTRFS_DEBUG, > just leave it there to always run. > Like that it'll be much faster to discover issues, as users typically > don't run a kernel with CONFIG_BTRFS_DEBUG. OK, since it's already WARN_ON_ONCE(), which won't flood the dmesg again and again, I'm also fine put it into production kernels. Thanks, Qu > > Thanks. > > >> + >> atomic_inc(&fs_info->nr_delayed_iputs); >> /* >> * Need to be irq safe here because we can be called from either an irq >> -- >> 2.48.1 >> >> >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 671f11787f34..466d9c434030 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4384,6 +4384,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) /* Ordered extents for free space inodes. */ btrfs_flush_workqueue(fs_info->endio_freespace_worker); btrfs_run_delayed_iputs(fs_info); + /* There should be no more workload to generate new delayed iputs. */ + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); cancel_work_sync(&fs_info->async_reclaim_work); cancel_work_sync(&fs_info->async_data_reclaim_work); @@ -4540,10 +4542,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info) /* cleanup FS via transaction */ btrfs_cleanup_transaction(fs_info); - mutex_lock(&fs_info->cleaner_mutex); - btrfs_run_delayed_iputs(fs_info); - mutex_unlock(&fs_info->cleaner_mutex); - down_write(&fs_info->cleanup_work_sem); up_write(&fs_info->cleanup_work_sem); } diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index b8c2e59ffc43..3a9e0a4c54e5 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -117,6 +117,9 @@ enum { /* Indicates there was an error cleaning up a log tree. */ BTRFS_FS_STATE_LOG_CLEANUP_ERROR, + /* No more delayed iput can be queued. */ + BTRFS_FS_STATE_NO_DELAYED_IPUT, + BTRFS_FS_STATE_COUNT }; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8ac4858b70e7..d2bf81c08f13 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) return; +#ifdef CONFIG_BTRFS_DEBUG + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); +#endif + atomic_inc(&fs_info->nr_delayed_iputs); /* * Need to be irq safe here because we can be called from either an irq
Since I have triggered the ASSERT() on the delayed iput too many times, now is the time to add some extra debug warnings for delayed iput. All delayed iputs should be queued after all ordered extents finish their IO and all involved workqueues are flushed. Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there should be no more delayed puts added. So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above mentioned timing. And all btrfs_add_delayed_iput() will check that flag and give a WARN_ON_ONCE() for debug builds. And since we're here, remove the btrfs_run_delayed_iputs() call inside btrfs_error_commit_super(). As that function will only wait for ordered extents to finish their IO, delayed iputs will only be added when those ordered extents all finished and removed from io tree, this is only ensured after the workqueue flush. Signed-off-by: Qu Wenruo <wqu@suse.com> --- Changelog: v2: - Set the new flag early - Make the new flag unconditional except the WARN_ON_ONCE() check - Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super() --- fs/btrfs/disk-io.c | 6 ++---- fs/btrfs/fs.h | 3 +++ fs/btrfs/inode.c | 4 ++++ 3 files changed, 9 insertions(+), 4 deletions(-)