diff mbox series

btrfs: fallback if compressed IO fails for ENOSPC

Message ID 4607d44ee4cf72a302d3adeba5d0ae99518a5f36.1712391866.git.sweettea-kernel@dorminy.me (mailing list archive)
State New
Headers show
Series btrfs: fallback if compressed IO fails for ENOSPC | expand

Commit Message

Sweet Tea Dorminy April 6, 2024, 8:45 a.m. UTC
In commit b4ccace878f4 ("btrfs: refactor submit_compressed_extents()"), if
an async extent compressed but failed to find enough space, we changed
from falling back to an uncompressed write to just failing the write
altogether. The principle was that if there's not enough space to write
the compressed version of the data, there can't possibly be enough space
to write the larger, uncompressed version of the data.

However, this isn't necessarily true: due to fragmentation, there could
be enough discontiguous free blocks to write the uncompressed version,
but not enough contiguous free blocks to write the smaller but
unsplittable compressed version.

This has occurred to an internal workload which relied on write()'s
return value indicating there was space. While rare, it has happened a
few times.

Thus, in order to prevent early ENOSPC, re-add a fallback to
uncompressed writing.

Fixes: b4ccace878f4 ("btrfs: refactor submit_compressed_extents()")
Co-developed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 fs/btrfs/inode.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

Qu Wenruo April 6, 2024, 9:02 a.m. UTC | #1
在 2024/4/6 19:15, Sweet Tea Dorminy 写道:
> In commit b4ccace878f4 ("btrfs: refactor submit_compressed_extents()"), if
> an async extent compressed but failed to find enough space, we changed
> from falling back to an uncompressed write to just failing the write
> altogether. The principle was that if there's not enough space to write
> the compressed version of the data, there can't possibly be enough space
> to write the larger, uncompressed version of the data.
>
> However, this isn't necessarily true: due to fragmentation, there could
> be enough discontiguous free blocks to write the uncompressed version,
> but not enough contiguous free blocks to write the smaller but
> unsplittable compressed version.

You indeed got me, since for reserved writes we reserve the whole
compressed size, meanwhile for uncompressed fallback we can afford
minimal size of just a single sector, it indeed has its own chance.

>
> This has occurred to an internal workload which relied on write()'s
> return value indicating there was space. While rare, it has happened a
> few times.

Just curious, how rare is such case?

>
> Thus, in order to prevent early ENOSPC, re-add a fallback to
> uncompressed writing.
>
> Fixes: b4ccace878f4 ("btrfs: refactor submit_compressed_extents()")
> Co-developed-by: Neal Gompa <neal@gompa.dev>
> Signed-off-by: Neal Gompa <neal@gompa.dev>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

And the fix is really small and doesn't fallback to the old hellish
code, looks pretty good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/inode.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3442dedff53d..15a13e191ee7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1148,13 +1148,13 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
>   				   0, *alloc_hint, &ins, 1, 1);
>   	if (ret) {
>   		/*
> -		 * Here we used to try again by going back to non-compressed
> -		 * path for ENOSPC.  But we can't reserve space even for
> -		 * compressed size, how could it work for uncompressed size
> -		 * which requires larger size?  So here we directly go error
> -		 * path.
> +		 * We can't reserve contiguous space for the compressed size.
> +		 * Unlikely, but it's possible that we could have enough
> +		 * non-contiguous space for the uncompressed size instead.  So
> +		 * fall back to uncompressed.
>   		 */
> -		goto out_free;
> +		submit_uncompressed_range(inode, async_extent, locked_page);
> +		goto done;
>   	}
>
>   	/* Here we're doing allocation and writeback of the compressed pages */
> @@ -1206,7 +1206,6 @@ static void submit_one_async_extent(struct async_chunk *async_chunk,
>   out_free_reserve:
>   	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>   	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
> -out_free:
>   	mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
>   	extent_clear_unlock_delalloc(inode, start, end,
>   				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
David Sterba April 9, 2024, 11:27 a.m. UTC | #2
On Sat, Apr 06, 2024 at 04:45:02AM -0400, Sweet Tea Dorminy wrote:
> In commit b4ccace878f4 ("btrfs: refactor submit_compressed_extents()"), if
> an async extent compressed but failed to find enough space, we changed
> from falling back to an uncompressed write to just failing the write
> altogether. The principle was that if there's not enough space to write
> the compressed version of the data, there can't possibly be enough space
> to write the larger, uncompressed version of the data.
> 
> However, this isn't necessarily true: due to fragmentation, there could
> be enough discontiguous free blocks to write the uncompressed version,
> but not enough contiguous free blocks to write the smaller but
> unsplittable compressed version.
> 
> This has occurred to an internal workload which relied on write()'s
> return value indicating there was space. While rare, it has happened a
> few times.
> 
> Thus, in order to prevent early ENOSPC, re-add a fallback to
> uncompressed writing.
> 
> Fixes: b4ccace878f4 ("btrfs: refactor submit_compressed_extents()")
> Co-developed-by: Neal Gompa <neal@gompa.dev>
> Signed-off-by: Neal Gompa <neal@gompa.dev>
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

Reviewed-by: David Sterba <dsterba@suse.com>
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3442dedff53d..15a13e191ee7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1148,13 +1148,13 @@  static void submit_one_async_extent(struct async_chunk *async_chunk,
 				   0, *alloc_hint, &ins, 1, 1);
 	if (ret) {
 		/*
-		 * Here we used to try again by going back to non-compressed
-		 * path for ENOSPC.  But we can't reserve space even for
-		 * compressed size, how could it work for uncompressed size
-		 * which requires larger size?  So here we directly go error
-		 * path.
+		 * We can't reserve contiguous space for the compressed size.
+		 * Unlikely, but it's possible that we could have enough
+		 * non-contiguous space for the uncompressed size instead.  So
+		 * fall back to uncompressed.
 		 */
-		goto out_free;
+		submit_uncompressed_range(inode, async_extent, locked_page);
+		goto done;
 	}
 
 	/* Here we're doing allocation and writeback of the compressed pages */
@@ -1206,7 +1206,6 @@  static void submit_one_async_extent(struct async_chunk *async_chunk,
 out_free_reserve:
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
-out_free:
 	mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
 	extent_clear_unlock_delalloc(inode, start, end,
 				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |