[0/5] remove BUG_ON()s in btrfs_close_one_device()
diff mbox

Message ID 20191112122416.30672-1-jthumshirn@suse.de
State New
Headers show

Commit Message

Johannes Thumshirn Nov. 12, 2019, 12:24 p.m. UTC
This series attempts to remove the BUG_ON()s in btrfs_close_one_device().
Therefore some reorganization of btrfs_close_one_device() and
close_fs_devices() was needed.

Forthermore a new BUG_ON() had to be temporarily introduced but is removed
again in the last patch of theis series.

Although it is generally legal to return -ENOMEM on umount(2), the error
handling up until close_ctree() as neither close_ctree() nor btrfs_put_super()
would be able to handle the error.

This series has passed fstests without any deviation from the baseline and
also the new error handling was tested via error injection using this snippet:


Johannes Thumshirn (5):
  btrfs: decrement number of open devices after closing the device not
    before
  btrfs: handle device allocation failure in btrfs_close_one_device()
  btrfs: handle allocation failure in strdup
  btrfs: handle error return of close_fs_devices()
  btrfs: remove final BUG_ON() in close_fs_devices()

 fs/btrfs/volumes.c | 68 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 20 deletions(-)

Comments

Qu Wenruo Nov. 12, 2019, 12:40 p.m. UTC | #1
On 2019/11/12 下午8:24, Johannes Thumshirn wrote:
> This series attempts to remove the BUG_ON()s in btrfs_close_one_device().
> Therefore some reorganization of btrfs_close_one_device() and
> close_fs_devices() was needed.
>
> Forthermore a new BUG_ON() had to be temporarily introduced but is removed
> again in the last patch of theis series.
>
> Although it is generally legal to return -ENOMEM on umount(2), the error
> handling up until close_ctree() as neither close_ctree() nor btrfs_put_super()
> would be able to handle the error.
>
> This series has passed fstests without any deviation from the baseline and
> also the new error handling was tested via error injection using this snippet:
>

Good patchset, but for error injection we have more formal way to do that.

ALLOW_ERROR_INJECTION() macro along with BPF to override function return
value.

There is even a more generic script to do that:
https://github.com/adam900710/btrfs-profiler/blob/master/inject.py

Such tool allow us to only inject error when certain call sites are met,
so it can be pretty handful.

Thanks,
Qu

> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7c55169c0613..c58802c9c39c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1069,6 +1069,9 @@ static int btrfs_close_one_device(struct btrfs_device *device)
>
>  	new_device = btrfs_alloc_device(NULL, &device->devid,
>  					device->uuid);
> +	btrfs_free_device(new_device);
> +	pr_err("%s() INJECTING -ENOMEM\n", __func__);
> +	new_device = ERR_PTR(-ENOMEM);
>  	if (IS_ERR(new_device))
>  		return PTR_ERR(new_device);
>
>
> Johannes Thumshirn (5):
>   btrfs: decrement number of open devices after closing the device not
>     before
>   btrfs: handle device allocation failure in btrfs_close_one_device()
>   btrfs: handle allocation failure in strdup
>   btrfs: handle error return of close_fs_devices()
>   btrfs: remove final BUG_ON() in close_fs_devices()
>
>  fs/btrfs/volumes.c | 68 ++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 48 insertions(+), 20 deletions(-)
>
David Sterba Nov. 12, 2019, 12:49 p.m. UTC | #2
On Tue, Nov 12, 2019 at 01:24:11PM +0100, Johannes Thumshirn wrote:
> This series attempts to remove the BUG_ON()s in btrfs_close_one_device().
> Therefore some reorganization of btrfs_close_one_device() and
> close_fs_devices() was needed.
> 
> Forthermore a new BUG_ON() had to be temporarily introduced but is removed
> again in the last patch of theis series.

Yeah, that's fine.

> Although it is generally legal to return -ENOMEM on umount(2), the error
> handling up until close_ctree() as neither close_ctree() nor btrfs_put_super()
> would be able to handle the error.
> 
> This series has passed fstests without any deviation from the baseline and
> also the new error handling was tested via error injection using this snippet:

This BUG_ON is one of the syzbot reports so once this patchset is
reviewed, we can ask for a retest.

I have a WIP fix that tries to avoid ENOMEM completely as it's not
strictly necessary. The empty device is like a placeholder in the list
while the real device is RCU-removed. The fix is to mark the device as
"being deleted" so it can be skipped but this is more intrusive and more
error prone than handling the error.

As this is not a frequent error at all, syzbot artifically limits the
amount of memory, otherwise we haven't seen the ENOMEM in practice. So
the fix is IMO sufficient for now.
Johannes Thumshirn Nov. 12, 2019, 12:57 p.m. UTC | #3
On 12/11/2019 13:40, Qu Wenruo wrote:
[...]
> Good patchset, but for error injection we have more formal way to do that.
> 
> ALLOW_ERROR_INJECTION() macro along with BPF to override function return
> value.
> 
> There is even a more generic script to do that:
> https://github.com/adam900710/btrfs-profiler/blob/master/inject.py
> 
> Such tool allow us to only inject error when certain call sites are met,
> so it can be pretty handful.

Yes I know, but my current test environment is not yet able to handle
this. So I resorted to just throw in some hand crafted error injection here.

If we find injecting errors in the alloc_device() code path is
worthwhile I'll add an ALLOW_ERROR_INJECTION() there.

Thanks,
	Johannes

Patch
diff mbox

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7c55169c0613..c58802c9c39c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1069,6 +1069,9 @@  static int btrfs_close_one_device(struct btrfs_device *device)
 
 	new_device = btrfs_alloc_device(NULL, &device->devid,
 					device->uuid);
+	btrfs_free_device(new_device);
+	pr_err("%s() INJECTING -ENOMEM\n", __func__);
+	new_device = ERR_PTR(-ENOMEM);
 	if (IS_ERR(new_device))
 		return PTR_ERR(new_device);