diff mbox series

btrfs: handle error in btrfs_cache_block_group

Message ID 20191119185900.2985-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: handle error in btrfs_cache_block_group | expand

Commit Message

Josef Bacik Nov. 19, 2019, 6:59 p.m. UTC
We have a BUG_ON(ret < 0) in find_free_extent from
btrfs_cache_block_group.  If we fail to allocate our ctl we'll just
panic, which is not good.  Instead just go on to another block group.
If we fail to find a block group we don't want to return ENOSPC, because
really we got a ENOMEM and that's the root of the problem.  Save our
return from btrfs_cache_block_group(), and then if we still fail to make
our allocation return that ret so we get the right error back.

Tested with inject-error.py from bcc.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Johannes Thumshirn Nov. 20, 2019, 9:24 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
David Sterba Nov. 25, 2019, 1:07 p.m. UTC | #2
On Tue, Nov 19, 2019 at 01:59:00PM -0500, Josef Bacik wrote:
> We have a BUG_ON(ret < 0) in find_free_extent from
> btrfs_cache_block_group.  If we fail to allocate our ctl we'll just
> panic, which is not good.  Instead just go on to another block group.
> If we fail to find a block group we don't want to return ENOSPC, because
> really we got a ENOMEM and that's the root of the problem.  Save our
> return from btrfs_cache_block_group(), and then if we still fail to make
> our allocation return that ret so we get the right error back.
> 
> Tested with inject-error.py from bcc.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 153f71a5bba9..18df434bfe52 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3799,6 +3799,7 @@  static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 				u64 flags, int delalloc)
 {
 	int ret = 0;
+	int cache_block_group_error = 0;
 	struct btrfs_free_cluster *last_ptr = NULL;
 	struct btrfs_block_group *block_group = NULL;
 	struct find_free_extent_ctl ffe_ctl = {0};
@@ -3958,7 +3959,20 @@  static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		if (unlikely(!ffe_ctl.cached)) {
 			ffe_ctl.have_caching_bg = true;
 			ret = btrfs_cache_block_group(block_group, 0);
-			BUG_ON(ret < 0);
+
+			/*
+			 * If we get ENOMEM here or something else we want to
+			 * try other block groups, because it may not be fatal.
+			 * However if we can't find anything else we need to
+			 * save our return here so that we return the actual
+			 * error that caused problems, not ENOSPC.
+			 */
+			if (ret < 0) {
+				if (!cache_block_group_error)
+					cache_block_group_error = ret;
+				ret = 0;
+				goto loop;
+			}
 			ret = 0;
 		}
 
@@ -4045,7 +4059,7 @@  static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 	if (ret > 0)
 		goto search;
 
-	if (ret == -ENOSPC) {
+	if (ret == -ENOSPC && !cache_block_group_error) {
 		/*
 		 * Use ffe_ctl->total_free_space as fallback if we can't find
 		 * any contiguous hole.
@@ -4056,6 +4070,8 @@  static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 		space_info->max_extent_size = ffe_ctl.max_extent_size;
 		spin_unlock(&space_info->lock);
 		ins->offset = ffe_ctl.max_extent_size;
+	} else if (ret == -ENOSPC) {
+		ret = cache_block_group_error;
 	}
 	return ret;
 }