diff mbox series

[1/4] btrfs: fix incorrect splitting in btrfs_drop_extent_map_range

Message ID e12c86889b1a64879ce6c6cf6f6d315d577295a7.1692305624.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series Fix incorrect splitting logic in btrfs_drop_extent_map_range | expand

Commit Message

Josef Bacik Aug. 17, 2023, 8:57 p.m. UTC
In production we were seeing a variety of WARN_ON()'s in the extent_map
code, specifically in btrfs_drop_extent_map_range() when we have to call
add_extent_mapping() for our second split.

Consider the following extent map layout

PINNED
[0 16K)  [32K, 48K)

and then we call btrfs_drop_extent_map_range for [0, 36K), with
skip_pinned == true.  The initial loop will have

start = 0
end = 36K
len = 36K

we will find the [0, 16k) extent, but since we are pinned we will skip
it, which has this cod

	start = em_end;
	if (end != (u64)-1)
		len = start + len - em_end;

em_end here is 16K, so now the values are

start = 16K
len = 16K + 36K - 16K = 36K

len should instead be 20K.  This is a problem when we find the next
extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
however the code for the split looks like this

	split->start = start + len;
	split->len = em_end - (start + len);

In this case we have

em_end = 48K
split->start = 16K + 36K //this should be 16K + 20K
split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K

and now we have an invalid extent_map in the tree that potentially
overlaps other entries in the extent map.  Even in the non-overlapping
case we will have split->start set improperly, which will cause problems
with any block related calculations.

We don't actually need len in this loop, we can simply use end as our
end point, and only adjust start up when we find a pinned extent we need
to skip.

Adjust the logic to do this, which keeps us from inserting an invalid
extent map.

We only skip_pinned in the relocation case, so this is relatively rare,
except in the case where you are running relocation a lot, which can
happen with auto relocation on.

Fixes: 55ef68990029 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_map.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Filipe Manana Aug. 18, 2023, 10:46 a.m. UTC | #1
On Thu, Aug 17, 2023 at 04:57:30PM -0400, Josef Bacik wrote:
> In production we were seeing a variety of WARN_ON()'s in the extent_map
> code, specifically in btrfs_drop_extent_map_range() when we have to call
> add_extent_mapping() for our second split.
> 
> Consider the following extent map layout
> 
> PINNED
> [0 16K)  [32K, 48K)
> 
> and then we call btrfs_drop_extent_map_range for [0, 36K), with
> skip_pinned == true.  The initial loop will have
> 
> start = 0
> end = 36K
> len = 36K
> 
> we will find the [0, 16k) extent, but since we are pinned we will skip
> it, which has this cod
> 
> 	start = em_end;
> 	if (end != (u64)-1)
> 		len = start + len - em_end;
> 
> em_end here is 16K, so now the values are
> 
> start = 16K
> len = 16K + 36K - 16K = 36K
> 
> len should instead be 20K.  This is a problem when we find the next
> extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
> however the code for the split looks like this
> 
> 	split->start = start + len;
> 	split->len = em_end - (start + len);
> 
> In this case we have
> 
> em_end = 48K
> split->start = 16K + 36K //this should be 16K + 20K
> split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
> 
> and now we have an invalid extent_map in the tree that potentially
> overlaps other entries in the extent map.  Even in the non-overlapping
> case we will have split->start set improperly, which will cause problems
> with any block related calculations.
> 
> We don't actually need len in this loop, we can simply use end as our
> end point, and only adjust start up when we find a pinned extent we need
> to skip.
> 
> Adjust the logic to do this, which keeps us from inserting an invalid
> extent map.
> 
> We only skip_pinned in the relocation case, so this is relatively rare,
> except in the case where you are running relocation a lot, which can
> happen with auto relocation on.
> 
> Fixes: 55ef68990029 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/extent_map.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 0cdb3e86f29b..a6d8368ed0ed 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -760,8 +760,6 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>  
>  		if (skip_pinned && test_bit(EXTENT_FLAG_PINNED, &em->flags)) {
>  			start = em_end;
> -			if (end != (u64)-1)
> -				len = start + len - em_end;
>  			goto next;
>  		}
>  
> @@ -829,8 +827,8 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>  				if (!split)
>  					goto remove_em;
>  			}
> -			split->start = start + len;
> -			split->len = em_end - (start + len);
> +			split->start = end;
> +			split->len = em_end - end;
>  			split->block_start = em->block_start;
>  			split->flags = flags;
>  			split->compress_type = em->compress_type;
> -- 
> 2.26.3
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 0cdb3e86f29b..a6d8368ed0ed 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -760,8 +760,6 @@  void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 
 		if (skip_pinned && test_bit(EXTENT_FLAG_PINNED, &em->flags)) {
 			start = em_end;
-			if (end != (u64)-1)
-				len = start + len - em_end;
 			goto next;
 		}
 
@@ -829,8 +827,8 @@  void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
 				if (!split)
 					goto remove_em;
 			}
-			split->start = start + len;
-			split->len = em_end - (start + len);
+			split->start = end;
+			split->len = em_end - end;
 			split->block_start = em->block_start;
 			split->flags = flags;
 			split->compress_type = em->compress_type;