Message ID | cfe1bf94b8a6f24407795d3e1823a187ead04570.1706130791.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Error handling fixes | expand |
On 2024/1/25 07:47, David Sterba wrote: > The unlikely case of lookup error in btrfs_remove_block_group() can be > handled properly, in its caller this would lead to a transaction abort. > We can't do anything else, a block group must have been loaded first. > > Signed-off-by: David Sterba <dsterba@suse.com> > --- > fs/btrfs/block-group.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index 1905d76772a9..16a2b8609989 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, > bool remove_rsv = false; > > block_group = btrfs_lookup_block_group(fs_info, map->start); > - BUG_ON(!block_group); > + if (!block_group) > + return -ENOENT; This -ENOENT return value is fine, as the only caller would call btrfs_abort_transaction() to be noisy enough. And talking about btrfs_abort_transaction(), I think we can call it early to make debug a little easierly. > + > BUG_ON(!block_group->ro); But shouldn't we also handle the RO case? Thanks, Qu > > trace_btrfs_remove_block_group(block_group);
On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote: > > > On 2024/1/25 07:47, David Sterba wrote: > > The unlikely case of lookup error in btrfs_remove_block_group() can be > > handled properly, in its caller this would lead to a transaction abort. > > We can't do anything else, a block group must have been loaded first. > > > > Signed-off-by: David Sterba <dsterba@suse.com> > > --- > > fs/btrfs/block-group.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > index 1905d76772a9..16a2b8609989 100644 > > --- a/fs/btrfs/block-group.c > > +++ b/fs/btrfs/block-group.c > > @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, > > bool remove_rsv = false; > > > > block_group = btrfs_lookup_block_group(fs_info, map->start); > > - BUG_ON(!block_group); > > + if (!block_group) > > + return -ENOENT; > > This -ENOENT return value is fine, as the only caller would call > btrfs_abort_transaction() to be noisy enough. > > And talking about btrfs_abort_transaction(), I think we can call it > early to make debug a little easierly. There are several patterns, one is that transaction abort is called by the function that started it. It's not consistent but as a hint abort can be used anywhere if things go so bad that it's impossible to roll back, eg. in a middle of a big loop setting up block groups and such. > > + > > BUG_ON(!block_group->ro); > > But shouldn't we also handle the RO case? Of course but it's fixing a different problem with tracking of read-only status of a bg and I will not mix that to that patch.
On 2024/1/26 02:42, David Sterba wrote: > On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote: >> >> >> On 2024/1/25 07:47, David Sterba wrote: >>> The unlikely case of lookup error in btrfs_remove_block_group() can be >>> handled properly, in its caller this would lead to a transaction abort. >>> We can't do anything else, a block group must have been loaded first. >>> >>> Signed-off-by: David Sterba <dsterba@suse.com> >>> --- >>> fs/btrfs/block-group.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c >>> index 1905d76772a9..16a2b8609989 100644 >>> --- a/fs/btrfs/block-group.c >>> +++ b/fs/btrfs/block-group.c >>> @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, >>> bool remove_rsv = false; >>> >>> block_group = btrfs_lookup_block_group(fs_info, map->start); >>> - BUG_ON(!block_group); >>> + if (!block_group) >>> + return -ENOENT; >> >> This -ENOENT return value is fine, as the only caller would call >> btrfs_abort_transaction() to be noisy enough. >> >> And talking about btrfs_abort_transaction(), I think we can call it >> early to make debug a little easierly. > > There are several patterns, one is that transaction abort is called by > the function that started it. It's not consistent but as a hint abort > can be used anywhere if things go so bad that it's impossible to roll > back, eg. in a middle of a big loop setting up block groups and such. In this case, I don't think we can do anything to really rollback, thus aborting early would help debug and provide a better call trace. Thanks, Qu > >>> + >>> BUG_ON(!block_group->ro); >> >> But shouldn't we also handle the RO case? > > Of course but it's fixing a different problem with tracking of read-only > status of a bg and I will not mix that to that patch. >
On Fri, Jan 26, 2024 at 09:39:14AM +1030, Qu Wenruo wrote: > > > On 2024/1/26 02:42, David Sterba wrote: > > On Thu, Jan 25, 2024 at 02:28:02PM +1030, Qu Wenruo wrote: > >> > >> > >> On 2024/1/25 07:47, David Sterba wrote: > >>> The unlikely case of lookup error in btrfs_remove_block_group() can be > >>> handled properly, in its caller this would lead to a transaction abort. > >>> We can't do anything else, a block group must have been loaded first. > >>> > >>> Signed-off-by: David Sterba <dsterba@suse.com> > >>> --- > >>> fs/btrfs/block-group.c | 4 +++- > >>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > >>> index 1905d76772a9..16a2b8609989 100644 > >>> --- a/fs/btrfs/block-group.c > >>> +++ b/fs/btrfs/block-group.c > >>> @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, > >>> bool remove_rsv = false; > >>> > >>> block_group = btrfs_lookup_block_group(fs_info, map->start); > >>> - BUG_ON(!block_group); > >>> + if (!block_group) > >>> + return -ENOENT; > >> > >> This -ENOENT return value is fine, as the only caller would call > >> btrfs_abort_transaction() to be noisy enough. > >> > >> And talking about btrfs_abort_transaction(), I think we can call it > >> early to make debug a little easierly. > > > > There are several patterns, one is that transaction abort is called by > > the function that started it. It's not consistent but as a hint abort > > can be used anywhere if things go so bad that it's impossible to roll > > back, eg. in a middle of a big loop setting up block groups and such. > > In this case, I don't think we can do anything to really rollback, thus > aborting early would help debug and provide a better call trace. Yeah, although looking again what btrfs_remove_block_group() does, there are more error checks and none of them does transaction abort. If we want to do make it the recommended way, then it's "once there's a transaction handle, any error that's not recoverable must do transaction abort". The decision is in the 'recoverable'. That would also require audit everything once we settle on the way it's supposed to be handled. I could add it here but then it's going to be an outlier, I'd rather do another pass and keep things simpler for now.
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 1905d76772a9..16a2b8609989 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1063,7 +1063,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, bool remove_rsv = false; block_group = btrfs_lookup_block_group(fs_info, map->start); - BUG_ON(!block_group); + if (!block_group) + return -ENOENT; + BUG_ON(!block_group->ro); trace_btrfs_remove_block_group(block_group);
The unlikely case of lookup error in btrfs_remove_block_group() can be handled properly, in its caller this would lead to a transaction abort. We can't do anything else, a block group must have been loaded first. Signed-off-by: David Sterba <dsterba@suse.com> --- fs/btrfs/block-group.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)