diff mbox series

[03/20] btrfs: handle block group lookup error when it's being removed

Message ID cfe1bf94b8a6f24407795d3e1823a187ead04570.1706130791.git.dsterba@suse.com (mailing list archive)
State New, archived
Headers show
Series Error handling fixes | expand

Commit Message

David Sterba Jan. 24, 2024, 9:17 p.m. UTC
The unlikely case of lookup error in btrfs_remove_block_group() can be
handled properly, in its caller this would lead to a transaction abort.
We can't do anything else, a block group must have been loaded first.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/block-group.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Qu Wenruo Jan. 25, 2024, 3:58 a.m. UTC | #1
On 2024/1/25 07:47, David Sterba wrote:
> The unlikely case of lookup error in btrfs_remove_block_group() can be
> handled properly, in its caller this would lead to a transaction abort.
> We can't do anything else, a block group must have been loaded first.
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/block-group.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 1905d76772a9..16a2b8609989 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>   	bool remove_rsv = false;
>
>   	block_group = btrfs_lookup_block_group(fs_info, map->start);
> -	BUG_ON(!block_group);
> +	if (!block_group)
> +		return -ENOENT;

This -ENOENT return value is fine, as the only caller would call
btrfs_abort_transaction() to be noisy enough.

And talking about btrfs_abort_transaction(), I think we can call it
early to make debug a little easierly.

> +
>   	BUG_ON(!block_group->ro);

But shouldn't we also handle the RO case?

Thanks,
Qu

>
>   	trace_btrfs_remove_block_group(block_group);
David Sterba Jan. 25, 2024, 4:12 p.m. UTC | #2
On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote:
> 
> 
> On 2024/1/25 07:47, David Sterba wrote:
> > The unlikely case of lookup error in btrfs_remove_block_group() can be
> > handled properly, in its caller this would lead to a transaction abort.
> > We can't do anything else, a block group must have been loaded first.
> >
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >   fs/btrfs/block-group.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index 1905d76772a9..16a2b8609989 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >   	bool remove_rsv = false;
> >
> >   	block_group = btrfs_lookup_block_group(fs_info, map->start);
> > -	BUG_ON(!block_group);
> > +	if (!block_group)
> > +		return -ENOENT;
> 
> This -ENOENT return value is fine, as the only caller would call
> btrfs_abort_transaction() to be noisy enough.
> 
> And talking about btrfs_abort_transaction(), I think we can call it
> early to make debug a little easierly.

There are several patterns, one is that transaction abort is called by
the function that started it. It's not consistent but as a hint abort
can be used anywhere if things go so bad that it's impossible to roll
back, eg. in a middle of a big loop setting up block groups and such.

> > +
> >   	BUG_ON(!block_group->ro);
> 
> But shouldn't we also handle the RO case?

Of course but it's fixing a different problem with tracking of read-only
status of a bg and I will not mix that to that patch.
Qu Wenruo Jan. 25, 2024, 11:09 p.m. UTC | #3
On 2024/1/26 02:42, David Sterba wrote:
> On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/1/25 07:47, David Sterba wrote:
>>> The unlikely case of lookup error in btrfs_remove_block_group() can be
>>> handled properly, in its caller this would lead to a transaction abort.
>>> We can't do anything else, a block group must have been loaded first.
>>>
>>> Signed-off-by: David Sterba <dsterba@suse.com>
>>> ---
>>>    fs/btrfs/block-group.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>>> index 1905d76772a9..16a2b8609989 100644
>>> --- a/fs/btrfs/block-group.c
>>> +++ b/fs/btrfs/block-group.c
>>> @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>>    	bool remove_rsv = false;
>>>
>>>    	block_group = btrfs_lookup_block_group(fs_info, map->start);
>>> -	BUG_ON(!block_group);
>>> +	if (!block_group)
>>> +		return -ENOENT;
>>
>> This -ENOENT return value is fine, as the only caller would call
>> btrfs_abort_transaction() to be noisy enough.
>>
>> And talking about btrfs_abort_transaction(), I think we can call it
>> early to make debug a little easierly.
>
> There are several patterns, one is that transaction abort is called by
> the function that started it. It's not consistent but as a hint abort
> can be used anywhere if things go so bad that it's impossible to roll
> back, eg. in a middle of a big loop setting up block groups and such.

In this case, I don't think we can do anything to really rollback, thus
aborting early would help debug and provide a better call trace.

Thanks,
Qu

>
>>> +
>>>    	BUG_ON(!block_group->ro);
>>
>> But shouldn't we also handle the RO case?
>
> Of course but it's fixing a different problem with tracking of read-only
> status of a bg and I will not mix that to that patch.
>
David Sterba Jan. 29, 2024, 3:49 p.m. UTC | #4
On Fri, Jan 26, 2024 at 09:39:14AM +1030, Qu Wenruo wrote:
> 
> 
> On 2024/1/26 02:42, David Sterba wrote:
> > On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote:
> >>
> >>
> >> On 2024/1/25 07:47, David Sterba wrote:
> >>> The unlikely case of lookup error in btrfs_remove_block_group() can be
> >>> handled properly, in its caller this would lead to a transaction abort.
> >>> We can't do anything else, a block group must have been loaded first.
> >>>
> >>> Signed-off-by: David Sterba <dsterba@suse.com>
> >>> ---
> >>>    fs/btrfs/block-group.c | 4 +++-
> >>>    1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> >>> index 1905d76772a9..16a2b8609989 100644
> >>> --- a/fs/btrfs/block-group.c
> >>> +++ b/fs/btrfs/block-group.c
> >>> @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >>>    	bool remove_rsv = false;
> >>>
> >>>    	block_group = btrfs_lookup_block_group(fs_info, map->start);
> >>> -	BUG_ON(!block_group);
> >>> +	if (!block_group)
> >>> +		return -ENOENT;
> >>
> >> This -ENOENT return value is fine, as the only caller would call
> >> btrfs_abort_transaction() to be noisy enough.
> >>
> >> And talking about btrfs_abort_transaction(), I think we can call it
> >> early to make debug a little easierly.
> >
> > There are several patterns, one is that transaction abort is called by
> > the function that started it. It's not consistent but as a hint abort
> > can be used anywhere if things go so bad that it's impossible to roll
> > back, eg. in a middle of a big loop setting up block groups and such.
> 
> In this case, I don't think we can do anything to really rollback, thus
> aborting early would help debug and provide a better call trace.

Yeah, although looking again what btrfs_remove_block_group() does, there
are more error checks and none of them does transaction abort.

If we want to do make it the recommended way, then it's "once there's a
transaction handle, any error that's not recoverable must do transaction
abort". The decision is in the 'recoverable'. That would also require
audit everything once we settle on the way it's supposed to be handled.
I could add it here but then it's going to be an outlier, I'd rather do
another pass and keep things simpler for now.
diff mbox series

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1905d76772a9..16a2b8609989 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1063,7 +1063,9 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	bool remove_rsv = false;
 
 	block_group = btrfs_lookup_block_group(fs_info, map->start);
-	BUG_ON(!block_group);
+	if (!block_group)
+		return -ENOENT;
+
 	BUG_ON(!block_group->ro);
 
 	trace_btrfs_remove_block_group(block_group);