diff mbox series

btrfs: add extra warning if delayed iput is added when it's not allowed

Message ID 65dce6b59ca3aa2bcd2e492271dcc3faa6f0a9ad.1741583506.git.wqu@suse.com (mailing list archive)
State New
Headers show
Series btrfs: add extra warning if delayed iput is added when it's not allowed | expand

Commit Message

Qu Wenruo March 10, 2025, 5:14 a.m. UTC
Since I have triggered the ASSERT() on the delayed iput too many times,
now is the time to add some extra debug warnings for delayed iput.

All delayed iputs should be queued after all ordered extents finish
their IO and all involved workqueues are flushed.

Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there
should be no more delayed puts added.

So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above
mentioned timing.
And all btrfs_add_delayed_iput() will check that flag and give a
WARN_ON_ONCE() for debug builds.

And since we're here, remove the btrfs_run_delayed_iputs() call inside
btrfs_error_commit_super().
As that function will only wait for ordered extents to finish their IO,
delayed iputs will only be added when those ordered extents all finished
and removed from io tree, this is only ensured after the workqueue
flush.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Set the new flag early
- Make the new flag unconditional except the WARN_ON_ONCE() check
- Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super()
---
 fs/btrfs/disk-io.c | 6 ++----
 fs/btrfs/fs.h      | 3 +++
 fs/btrfs/inode.c   | 4 ++++
 3 files changed, 9 insertions(+), 4 deletions(-)

Comments

Filipe Manana March 10, 2025, 10:42 a.m. UTC | #1
On Mon, Mar 10, 2025 at 5:15 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Since I have triggered the ASSERT() on the delayed iput too many times,
> now is the time to add some extra debug warnings for delayed iput.
>
> All delayed iputs should be queued after all ordered extents finish
> their IO and all involved workqueues are flushed.
>
> Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there
> should be no more delayed puts added.
>
> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above
> mentioned timing.
> And all btrfs_add_delayed_iput() will check that flag and give a
> WARN_ON_ONCE() for debug builds.
>
> And since we're here, remove the btrfs_run_delayed_iputs() call inside
> btrfs_error_commit_super().

This is a bad idea to make such unrelated change in the same patch.
After all this one is about adding a warning and making things easier
to catch.

It should either go as a separate patch or it fits much better in the
previous patch ("btrfs: run btrfs_error_commit_super() early").


> As that function will only wait for ordered extents to finish their IO,
> delayed iputs will only be added when those ordered extents all finished
> and removed from io tree, this is only ensured after the workqueue

It's not "io tree", that's used for extent locking, setting delalloc
ranges, etc.
You mean the ordered extents tree in the inode (struct
btrfs_inode::ordered_tree).

> flush.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Set the new flag early
> - Make the new flag unconditional except the WARN_ON_ONCE() check
> - Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super()
> ---
>  fs/btrfs/disk-io.c | 6 ++----
>  fs/btrfs/fs.h      | 3 +++
>  fs/btrfs/inode.c   | 4 ++++
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 671f11787f34..466d9c434030 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4384,6 +4384,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>         /* Ordered extents for free space inodes. */
>         btrfs_flush_workqueue(fs_info->endio_freespace_worker);
>         btrfs_run_delayed_iputs(fs_info);
> +       /* There should be no more workload to generate new delayed iputs. */
> +       set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
>
>         cancel_work_sync(&fs_info->async_reclaim_work);
>         cancel_work_sync(&fs_info->async_data_reclaim_work);
> @@ -4540,10 +4542,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
>         /* cleanup FS via transaction */
>         btrfs_cleanup_transaction(fs_info);
>
> -       mutex_lock(&fs_info->cleaner_mutex);
> -       btrfs_run_delayed_iputs(fs_info);
> -       mutex_unlock(&fs_info->cleaner_mutex);
> -
>         down_write(&fs_info->cleanup_work_sem);
>         up_write(&fs_info->cleanup_work_sem);
>  }
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index b8c2e59ffc43..3a9e0a4c54e5 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -117,6 +117,9 @@ enum {
>         /* Indicates there was an error cleaning up a log tree. */
>         BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>
> +       /* No more delayed iput can be queued. */
> +       BTRFS_FS_STATE_NO_DELAYED_IPUT,
> +
>         BTRFS_FS_STATE_COUNT
>  };
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8ac4858b70e7..d2bf81c08f13 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
>         if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
>                 return;
>
> +#ifdef CONFIG_BTRFS_DEBUG
> +       WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
> +#endif

As commented before, I would not add this under CONFIG_BTRFS_DEBUG,
just leave it there to always run.
Like that it'll be much faster to discover issues, as users typically
don't run a kernel with CONFIG_BTRFS_DEBUG.

Thanks.


> +
>         atomic_inc(&fs_info->nr_delayed_iputs);
>         /*
>          * Need to be irq safe here because we can be called from either an irq
> --
> 2.48.1
>
>
Qu Wenruo March 11, 2025, 3:02 a.m. UTC | #2
在 2025/3/10 21:12, Filipe Manana 写道:
> On Mon, Mar 10, 2025 at 5:15 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Since I have triggered the ASSERT() on the delayed iput too many times,
>> now is the time to add some extra debug warnings for delayed iput.
>>
>> All delayed iputs should be queued after all ordered extents finish
>> their IO and all involved workqueues are flushed.
>>
>> Thus after the btrfs_run_delayed_iputs() inside close_ctree(), there
>> should be no more delayed puts added.
>>
>> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT, set after the above
>> mentioned timing.
>> And all btrfs_add_delayed_iput() will check that flag and give a
>> WARN_ON_ONCE() for debug builds.
>>
>> And since we're here, remove the btrfs_run_delayed_iputs() call inside
>> btrfs_error_commit_super().
>
> This is a bad idea to make such unrelated change in the same patch.
> After all this one is about adding a warning and making things easier
> to catch.
>
> It should either go as a separate patch or it fits much better in the
> previous patch ("btrfs: run btrfs_error_commit_super() early").

Sure, I also prefer to push the removal of btrfs_run_delayed_iputs()
into that commit.

I'll update that one with extra commit message.

>
>
>> As that function will only wait for ordered extents to finish their IO,
>> delayed iputs will only be added when those ordered extents all finished
>> and removed from io tree, this is only ensured after the workqueue
>
> It's not "io tree", that's used for extent locking, setting delalloc
> ranges, etc.
> You mean the ordered extents tree in the inode (struct
> btrfs_inode::ordered_tree).
>
>> flush.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Set the new flag early
>> - Make the new flag unconditional except the WARN_ON_ONCE() check
>> - Remove the btrfs_run_delayed_iputs() inside btrfs_error_commit_super()
>> ---
>>   fs/btrfs/disk-io.c | 6 ++----
>>   fs/btrfs/fs.h      | 3 +++
>>   fs/btrfs/inode.c   | 4 ++++
>>   3 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 671f11787f34..466d9c434030 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4384,6 +4384,8 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>>          /* Ordered extents for free space inodes. */
>>          btrfs_flush_workqueue(fs_info->endio_freespace_worker);
>>          btrfs_run_delayed_iputs(fs_info);
>> +       /* There should be no more workload to generate new delayed iputs. */
>> +       set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
>>
>>          cancel_work_sync(&fs_info->async_reclaim_work);
>>          cancel_work_sync(&fs_info->async_data_reclaim_work);
>> @@ -4540,10 +4542,6 @@ static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
>>          /* cleanup FS via transaction */
>>          btrfs_cleanup_transaction(fs_info);
>>
>> -       mutex_lock(&fs_info->cleaner_mutex);
>> -       btrfs_run_delayed_iputs(fs_info);
>> -       mutex_unlock(&fs_info->cleaner_mutex);
>> -
>>          down_write(&fs_info->cleanup_work_sem);
>>          up_write(&fs_info->cleanup_work_sem);
>>   }
>> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
>> index b8c2e59ffc43..3a9e0a4c54e5 100644
>> --- a/fs/btrfs/fs.h
>> +++ b/fs/btrfs/fs.h
>> @@ -117,6 +117,9 @@ enum {
>>          /* Indicates there was an error cleaning up a log tree. */
>>          BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>>
>> +       /* No more delayed iput can be queued. */
>> +       BTRFS_FS_STATE_NO_DELAYED_IPUT,
>> +
>>          BTRFS_FS_STATE_COUNT
>>   };
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 8ac4858b70e7..d2bf81c08f13 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
>>          if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
>>                  return;
>>
>> +#ifdef CONFIG_BTRFS_DEBUG
>> +       WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
>> +#endif
>
> As commented before, I would not add this under CONFIG_BTRFS_DEBUG,
> just leave it there to always run.
> Like that it'll be much faster to discover issues, as users typically
> don't run a kernel with CONFIG_BTRFS_DEBUG.

OK, since it's already WARN_ON_ONCE(), which won't flood the dmesg again
and again, I'm also fine put it into production kernels.

Thanks,
Qu

>
> Thanks.
>
>
>> +
>>          atomic_inc(&fs_info->nr_delayed_iputs);
>>          /*
>>           * Need to be irq safe here because we can be called from either an irq
>> --
>> 2.48.1
>>
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 671f11787f34..466d9c434030 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4384,6 +4384,8 @@  void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	/* Ordered extents for free space inodes. */
 	btrfs_flush_workqueue(fs_info->endio_freespace_worker);
 	btrfs_run_delayed_iputs(fs_info);
+	/* There should be no more workload to generate new delayed iputs. */
+	set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
 
 	cancel_work_sync(&fs_info->async_reclaim_work);
 	cancel_work_sync(&fs_info->async_data_reclaim_work);
@@ -4540,10 +4542,6 @@  static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info)
 	/* cleanup FS via transaction */
 	btrfs_cleanup_transaction(fs_info);
 
-	mutex_lock(&fs_info->cleaner_mutex);
-	btrfs_run_delayed_iputs(fs_info);
-	mutex_unlock(&fs_info->cleaner_mutex);
-
 	down_write(&fs_info->cleanup_work_sem);
 	up_write(&fs_info->cleanup_work_sem);
 }
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b8c2e59ffc43..3a9e0a4c54e5 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -117,6 +117,9 @@  enum {
 	/* Indicates there was an error cleaning up a log tree. */
 	BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
 
+	/* No more delayed iput can be queued. */
+	BTRFS_FS_STATE_NO_DELAYED_IPUT,
+
 	BTRFS_FS_STATE_COUNT
 };
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ac4858b70e7..d2bf81c08f13 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3435,6 +3435,10 @@  void btrfs_add_delayed_iput(struct btrfs_inode *inode)
 	if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
 		return;
 
+#ifdef CONFIG_BTRFS_DEBUG
+	WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
+#endif
+
 	atomic_inc(&fs_info->nr_delayed_iputs);
 	/*
 	 * Need to be irq safe here because we can be called from either an irq