diff mbox series

[v2,1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers

Message ID e1cf2949e4b03fba268f923947543bbf4a7b6752.1741198394.git.fdmanana@suse.com (mailing list archive)
State New
Headers show
Series btrfs: fix unexpected delayed iputs at umount time and cleanups | expand

Commit Message

Filipe Manana March 5, 2025, 6:17 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.

Sometimes this assertion may fail because delayed iputs may have been
added to the list after we last ran delayed iputs, and this happens due
to workers in the endio_workers workqueue still running. These workers can
do a final put on an ordered extent attached to a data bio, which results
in adding a delayed iput. This is done at btrfs_bio_end_io() and its
helper __btrfs_bio_end_io().

Fix this by flushing the endio_workers workqueue before running delayed
iputs at close_ctree().

David reported this when running generic/648.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Qu Wenruo March 5, 2025, 10:46 p.m. UTC | #1
在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
> 
> At close_ctree() after we have ran delayed iputs either through explicitly
> calling btrfs_run_delayed_iputs() or later during the call to
> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
> delayed iputs list is empty.
> 
> Sometimes this assertion may fail because delayed iputs may have been
> added to the list after we last ran delayed iputs, and this happens due
> to workers in the endio_workers workqueue still running. These workers can
> do a final put on an ordered extent attached to a data bio, which results
> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
> helper __btrfs_bio_end_io().

But the endio_workers workqueue is only utilized by data READ 
operations, in function btrfs_end_io_wq(), which is called in 
btrfs_simple_end_io().

Furthermore, for the endio_workers workqueue, for data inodes it only 
handles btrfs_check_read_bio(), which shouldn't generate ordered extent 
either.

Did I miss something here?

Thanks,
Qu

> 
> Fix this by flushing the endio_workers workqueue before running delayed
> iputs at close_ctree().
> 
> David reported this when running generic/648.
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/disk-io.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d96ea974ef73..b6194ae97361 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>   	 */
>   	btrfs_flush_workqueue(fs_info->delalloc_workers);
>   
> +	/*
> +	 * We can also have ordered extents getting their last reference dropped
> +	 * from the endio_workers workqueue because for data bios we keep a
> +	 * reference on an ordered extent which gets dropped when running
> +	 * btrfs_bio_end_io() in that workqueue, and that final drop results in
> +	 * adding a delayed iput for the inode.
> +	 */
> +	flush_workqueue(fs_info->endio_workers);
> +
>   	/*
>   	 * After we parked the cleaner kthread, ordered extents may have
>   	 * completed and created new delayed iputs. If one of the async reclaim
Qu Wenruo March 5, 2025, 10:50 p.m. UTC | #2
在 2025/3/6 09:16, Qu Wenruo 写道:
> 
> 
> 在 2025/3/6 04:47, fdmanana@kernel.org 写道:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> At close_ctree() after we have ran delayed iputs either through 
>> explicitly
>> calling btrfs_run_delayed_iputs() or later during the call to
>> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
>> delayed iputs list is empty.
>>
>> Sometimes this assertion may fail because delayed iputs may have been
>> added to the list after we last ran delayed iputs, and this happens due
>> to workers in the endio_workers workqueue still running. These workers 
>> can
>> do a final put on an ordered extent attached to a data bio, which results
>> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
>> helper __btrfs_bio_end_io().
> 
> But the endio_workers workqueue is only utilized by data READ 
> operations, in function btrfs_end_io_wq(), which is called in 
> btrfs_simple_end_io().
> 
> Furthermore, for the endio_workers workqueue, for data inodes it only 
> handles btrfs_check_read_bio(), which shouldn't generate ordered extent 
> either.
> 
> Did I miss something here?

Oh, I missed the recently disabled (for subpage) uncached io.

So I guess the real fixes tag should be that not-yet-upstreamed uncached 
io patch.

Thanks,
Qu

> 
> Thanks,
> Qu
> 
>>
>> Fix this by flushing the endio_workers workqueue before running delayed
>> iputs at close_ctree().
>>
>> David reported this when running generic/648.
>>
>> Reported-by: David Sterba <dsterba@suse.com>
>> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct 
>> btrfs_bio")
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> ---
>>   fs/btrfs/disk-io.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d96ea974ef73..b6194ae97361 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info 
>> *fs_info)
>>        */
>>       btrfs_flush_workqueue(fs_info->delalloc_workers);
>> +    /*
>> +     * We can also have ordered extents getting their last reference 
>> dropped
>> +     * from the endio_workers workqueue because for data bios we keep a
>> +     * reference on an ordered extent which gets dropped when running
>> +     * btrfs_bio_end_io() in that workqueue, and that final drop 
>> results in
>> +     * adding a delayed iput for the inode.
>> +     */
>> +    flush_workqueue(fs_info->endio_workers);
>> +
>>       /*
>>        * After we parked the cleaner kthread, ordered extents may have
>>        * completed and created new delayed iputs. If one of the async 
>> reclaim
> 
>
Filipe Manana March 5, 2025, 11:14 p.m. UTC | #3
On Wed, Mar 5, 2025 at 10:50 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> 在 2025/3/6 09:16, Qu Wenruo 写道:
> >
> >
> > 在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> At close_ctree() after we have ran delayed iputs either through
> >> explicitly
> >> calling btrfs_run_delayed_iputs() or later during the call to
> >> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
> >> delayed iputs list is empty.
> >>
> >> Sometimes this assertion may fail because delayed iputs may have been
> >> added to the list after we last ran delayed iputs, and this happens due
> >> to workers in the endio_workers workqueue still running. These workers
> >> can
> >> do a final put on an ordered extent attached to a data bio, which results
> >> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
> >> helper __btrfs_bio_end_io().
> >
> > But the endio_workers workqueue is only utilized by data READ
> > operations, in function btrfs_end_io_wq(), which is called in
> > btrfs_simple_end_io().
> >
> > Furthermore, for the endio_workers workqueue, for data inodes it only
> > handles btrfs_check_read_bio(), which shouldn't generate ordered extent
> > either.
> >
> > Did I miss something here?
>
> Oh, I missed the recently disabled (for subpage) uncached io.
>
> So I guess the real fixes tag should be that not-yet-upstreamed uncached
> io patch.

Oh yes, I picked a wrong commit somehow.

Since the uncached stuff is not in Linus' tree, I'll remove the Fixes
tag and just mention it happens for uncached IO added recently and the
subject of the patch that introduced it.

Thanks.

>
> Thanks,
> Qu
>
> >
> > Thanks,
> > Qu
> >
> >>
> >> Fix this by flushing the endio_workers workqueue before running delayed
> >> iputs at close_ctree().
> >>
> >> David reported this when running generic/648.
> >>
> >> Reported-by: David Sterba <dsterba@suse.com>
> >> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct
> >> btrfs_bio")
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >>   fs/btrfs/disk-io.c | 9 +++++++++
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index d96ea974ef73..b6194ae97361 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info
> >> *fs_info)
> >>        */
> >>       btrfs_flush_workqueue(fs_info->delalloc_workers);
> >> +    /*
> >> +     * We can also have ordered extents getting their last reference
> >> dropped
> >> +     * from the endio_workers workqueue because for data bios we keep a
> >> +     * reference on an ordered extent which gets dropped when running
> >> +     * btrfs_bio_end_io() in that workqueue, and that final drop
> >> results in
> >> +     * adding a delayed iput for the inode.
> >> +     */
> >> +    flush_workqueue(fs_info->endio_workers);
> >> +
> >>       /*
> >>        * After we parked the cleaner kthread, ordered extents may have
> >>        * completed and created new delayed iputs. If one of the async
> >> reclaim
> >
> >
>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d96ea974ef73..b6194ae97361 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4340,6 +4340,15 @@  void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	 */
 	btrfs_flush_workqueue(fs_info->delalloc_workers);
 
+	/*
+	 * We can also have ordered extents getting their last reference dropped
+	 * from the endio_workers workqueue because for data bios we keep a
+	 * reference on an ordered extent which gets dropped when running
+	 * btrfs_bio_end_io() in that workqueue, and that final drop results in
+	 * adding a delayed iput for the inode.
+	 */
+	flush_workqueue(fs_info->endio_workers);
+
 	/*
 	 * After we parked the cleaner kthread, ordered extents may have
 	 * completed and created new delayed iputs. If one of the async reclaim