[v2,1/2] btrfs: warn on invalid slot in tree mod log rewind

Message ID	bc3d9ec0aac7a1514165170c5fb73ed8f5d3a68b.1685546114.git.boris@bur.io (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@vger.kernel.org> Feedback-ID: i083147f8:Fastmail From: Boris Burkov <boris@bur.io> To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, fdmanana@kernel.org Subject: [PATCH v2 1/2] btrfs: warn on invalid slot in tree mod log rewind Date: Wed, 31 May 2023 09:22:29 -0700 Message-Id: <bc3d9ec0aac7a1514165170c5fb73ed8f5d3a68b.1685546114.git.boris@bur.io> In-Reply-To: <cover.1685546114.git.boris@bur.io> References: <cover.1685546114.git.boris@bur.io> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	btrfs: fix logical_to_ino panic in btrfs_map_bio \| expand [v2,0/2] btrfs: fix logical_to_ino panic in btrfs_map_bio [v2,1/2] btrfs: warn on invalid slot in tree mod log rewind [v2,2/2] btrfs: insert tree mod log move in push_node_left

Message ID

bc3d9ec0aac7a1514165170c5fb73ed8f5d3a68b.1685546114.git.boris@bur.io (mailing list archive)

State

New, archived

Headers

Feedback-ID: i083147f8:Fastmail
From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
        fdmanana@kernel.org
Subject: [PATCH v2 1/2] btrfs: warn on invalid slot in tree mod log rewind
Date: Wed, 31 May 2023 09:22:29 -0700
Message-Id: 
 <bc3d9ec0aac7a1514165170c5fb73ed8f5d3a68b.1685546114.git.boris@bur.io>
In-Reply-To: <cover.1685546114.git.boris@bur.io>
References: <cover.1685546114.git.boris@bur.io>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

btrfs: fix logical_to_ino panic in btrfs_map_bio | expand

Commit Message

Boris Burkov May 31, 2023, 4:22 p.m. UTC

The way that tree mod log tracks the ultimate length of the eb, the
variable 'n', eventually turns up the correct value, but at intermediate
steps during the rewind, n can be inaccurate as a representation of the
end of the eb. For example, it doesn't get updated on move rewinds, and
it does get updated for add/remove in the middle of the eb.

To detect cases with invalid moves, introduce a separate variable called
max_slot which tries to track the maximum valid slot in the rewind eb.
We can then warn if we do a move whose src range goes beyond the max
valid slot.

There is a commented caveat that it is possible to have this value be an
overestimate due to the challenge of properly handling 'add' operations
in the middle of the eb, but in practice it doesn't cause enough of a
problem to throw out the max idea in favor of tracking every valid slot.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/tree-mod-log.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Filipe Manana June 1, 2023, 9:28 a.m. UTC | #1

On Wed, May 31, 2023 at 5:22 PM Boris Burkov <boris@bur.io> wrote:
>
> The way that tree mod log tracks the ultimate length of the eb, the
> variable 'n', eventually turns up the correct value, but at intermediate
> steps during the rewind, n can be inaccurate as a representation of the
> end of the eb. For example, it doesn't get updated on move rewinds, and
> it does get updated for add/remove in the middle of the eb.
>
> To detect cases with invalid moves, introduce a separate variable called
> max_slot which tries to track the maximum valid slot in the rewind eb.
> We can then warn if we do a move whose src range goes beyond the max
> valid slot.
>
> There is a commented caveat that it is possible to have this value be an
> overestimate due to the challenge of properly handling 'add' operations
> in the middle of the eb, but in practice it doesn't cause enough of a
> problem to throw out the max idea in favor of tracking every valid slot.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/tree-mod-log.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
> index a555baa0143a..43f2ffa6f44d 100644
> --- a/fs/btrfs/tree-mod-log.c
> +++ b/fs/btrfs/tree-mod-log.c
> @@ -664,8 +664,10 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>         unsigned long o_dst;
>         unsigned long o_src;
>         unsigned long p_size = sizeof(struct btrfs_key_ptr);
> +       u32 max_slot;
>
>         n = btrfs_header_nritems(eb);
> +       max_slot = n - 1;
>         read_lock(&fs_info->tree_mod_log_lock);
>         while (tm && tm->seq >= time_seq) {
>                 /*
> @@ -684,6 +686,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>                         btrfs_set_node_ptr_generation(eb, tm->slot,
>                                                       tm->generation);
>                         n++;
> +                       if (max_slot == (u32)-1 || tm->slot > max_slot)

It would be better to document the intention of the special value
(u32)-1, and perhaps also let the type be
a signed integer and just check for -1 as meaning "no valid slot".

> +                               max_slot = tm->slot;
>                         break;
>                 case BTRFS_MOD_LOG_KEY_REPLACE:
>                         BUG_ON(tm->slot >= n);
> @@ -693,12 +697,30 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>                                                       tm->generation);
>                         break;
>                 case BTRFS_MOD_LOG_KEY_ADD:
> +                       /*
> +                        * It is possible we could have already removed keys behind the known
> +                        * max slot, so this will be an overestimate. In practice, the copy
> +                        * operation inserts them in increasing order, and overestimating just
> +                        * means we miss some warnings, so it's OK. It isn't worth carefully
> +                        * tracking the full array of valid slots to check against when moving.
> +                        */
> +                       if (tm->slot == max_slot)
> +                               max_slot--;
>                         /* if a move operation is needed it's in the log */
>                         n--;
>                         break;
>                 case BTRFS_MOD_LOG_MOVE_KEYS:
>                         o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
>                         o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
> +                       if (WARN_ON((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
> +                                   (max_slot == (u32)-1 && tm->move.nr_items > 0))) {

Like v1, I'm still a bit at odds with the check for  tm->move.nr_items > 0

Such a case should never be possible, we should assert that
separately, plus when I read this it makes me think:
why do we check it only when max_slot is (u32)-1 but not for the
previous condition?
It makes me think what would happen to the first condition if it's 0
and dst_slot happens to be 0, we have an underflow, etc

Maybe add an ASSERT(tm->move.nr_items > 0) before.

Thanks.


> +                               btrfs_warn(fs_info,
> +                                          "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
> +                                          eb->start, tm->slot,
> +                                          tm->move.dst_slot, tm->move.nr_items,
> +                                          tm->seq, n, max_slot);
> +
> +                       }
>                         memmove_extent_buffer(eb, o_dst, o_src,
>                                               tm->move.nr_items * p_size);
>                         break;
> --
> 2.40.1
>

Boris Burkov June 1, 2023, 3:04 p.m. UTC | #2

On Thu, Jun 01, 2023 at 10:28:58AM +0100, Filipe Manana wrote:
> On Wed, May 31, 2023 at 5:22 PM Boris Burkov <boris@bur.io> wrote:
> >
> > The way that tree mod log tracks the ultimate length of the eb, the
> > variable 'n', eventually turns up the correct value, but at intermediate
> > steps during the rewind, n can be inaccurate as a representation of the
> > end of the eb. For example, it doesn't get updated on move rewinds, and
> > it does get updated for add/remove in the middle of the eb.
> >
> > To detect cases with invalid moves, introduce a separate variable called
> > max_slot which tries to track the maximum valid slot in the rewind eb.
> > We can then warn if we do a move whose src range goes beyond the max
> > valid slot.
> >
> > There is a commented caveat that it is possible to have this value be an
> > overestimate due to the challenge of properly handling 'add' operations
> > in the middle of the eb, but in practice it doesn't cause enough of a
> > problem to throw out the max idea in favor of tracking every valid slot.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> >  fs/btrfs/tree-mod-log.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
> > index a555baa0143a..43f2ffa6f44d 100644
> > --- a/fs/btrfs/tree-mod-log.c
> > +++ b/fs/btrfs/tree-mod-log.c
> > @@ -664,8 +664,10 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >         unsigned long o_dst;
> >         unsigned long o_src;
> >         unsigned long p_size = sizeof(struct btrfs_key_ptr);
> > +       u32 max_slot;
> >
> >         n = btrfs_header_nritems(eb);
> > +       max_slot = n - 1;
> >         read_lock(&fs_info->tree_mod_log_lock);
> >         while (tm && tm->seq >= time_seq) {
> >                 /*
> > @@ -684,6 +686,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >                         btrfs_set_node_ptr_generation(eb, tm->slot,
> >                                                       tm->generation);
> >                         n++;
> > +                       if (max_slot == (u32)-1 || tm->slot > max_slot)
> 
> It would be better to document the intention of the special value
> (u32)-1, and perhaps also let the type be
> a signed integer and just check for -1 as meaning "no valid slot".

I really like the signed integer idea, I'll do that for v3.

> 
> > +                               max_slot = tm->slot;
> >                         break;
> >                 case BTRFS_MOD_LOG_KEY_REPLACE:
> >                         BUG_ON(tm->slot >= n);
> > @@ -693,12 +697,30 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >                                                       tm->generation);
> >                         break;
> >                 case BTRFS_MOD_LOG_KEY_ADD:
> > +                       /*
> > +                        * It is possible we could have already removed keys behind the known
> > +                        * max slot, so this will be an overestimate. In practice, the copy
> > +                        * operation inserts them in increasing order, and overestimating just
> > +                        * means we miss some warnings, so it's OK. It isn't worth carefully
> > +                        * tracking the full array of valid slots to check against when moving.
> > +                        */
> > +                       if (tm->slot == max_slot)
> > +                               max_slot--;
> >                         /* if a move operation is needed it's in the log */
> >                         n--;
> >                         break;
> >                 case BTRFS_MOD_LOG_MOVE_KEYS:
> >                         o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
> >                         o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
> > +                       if (WARN_ON((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
> > +                                   (max_slot == (u32)-1 && tm->move.nr_items > 0))) {
> 
> Like v1, I'm still a bit at odds with the check for  tm->move.nr_items > 0
> 
> Such a case should never be possible, we should assert that
> separately, plus when I read this it makes me think:
> why do we check it only when max_slot is (u32)-1 but not for the
> previous condition?
> It makes me think what would happen to the first condition if it's 0
> and dst_slot happens to be 0, we have an underflow, etc

Good point, I am actually assuming it is >0 for that other calculation,
so 0 is not as "valid" (if obviously pointless) of a value as I argued.

> 
> Maybe add an ASSERT(tm->move.nr_items > 0) before.

I think this is a good idea, and I'll mix it into the warning too. I
think the only thing I would "insist" on is that I don't want to panic
non debug builds on this condition, even if I can't think of a way it
would happen.

> 
> Thanks.
> 
> 
> > +                               btrfs_warn(fs_info,
> > +                                          "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
> > +                                          eb->start, tm->slot,
> > +                                          tm->move.dst_slot, tm->move.nr_items,
> > +                                          tm->seq, n, max_slot);
> > +
> > +                       }
> >                         memmove_extent_buffer(eb, o_dst, o_src,
> >                                               tm->move.nr_items * p_size);
> >                         break;
> > --
> > 2.40.1
> >

diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index a555baa0143a..43f2ffa6f44d 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -664,8 +664,10 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 	unsigned long o_dst;
 	unsigned long o_src;
 	unsigned long p_size = sizeof(struct btrfs_key_ptr);
+	u32 max_slot;
 
 	n = btrfs_header_nritems(eb);
+	max_slot = n - 1;
 	read_lock(&fs_info->tree_mod_log_lock);
 	while (tm && tm->seq >= time_seq) {
 		/*
@@ -684,6 +686,8 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 			btrfs_set_node_ptr_generation(eb, tm->slot,
 						      tm->generation);
 			n++;
+			if (max_slot == (u32)-1 || tm->slot > max_slot)
+				max_slot = tm->slot;
 			break;
 		case BTRFS_MOD_LOG_KEY_REPLACE:
 			BUG_ON(tm->slot >= n);
@@ -693,12 +697,30 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 						      tm->generation);
 			break;
 		case BTRFS_MOD_LOG_KEY_ADD:
+			/*
+			 * It is possible we could have already removed keys behind the known
+			 * max slot, so this will be an overestimate. In practice, the copy
+			 * operation inserts them in increasing order, and overestimating just
+			 * means we miss some warnings, so it's OK. It isn't worth carefully
+			 * tracking the full array of valid slots to check against when moving.
+			 */
+			if (tm->slot == max_slot)
+				max_slot--;
 			/* if a move operation is needed it's in the log */
 			n--;
 			break;
 		case BTRFS_MOD_LOG_MOVE_KEYS:
 			o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
 			o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
+			if (WARN_ON((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
+				    (max_slot == (u32)-1 && tm->move.nr_items > 0))) {
+				btrfs_warn(fs_info,
+					   "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
+					   eb->start, tm->slot,
+					   tm->move.dst_slot, tm->move.nr_items,
+					   tm->seq, n, max_slot);
+
+			}
 			memmove_extent_buffer(eb, o_dst, o_src,
 					      tm->move.nr_items * p_size);
 			break;

[v2,1/2] btrfs: warn on invalid slot in tree mod log rewind

Commit Message

Comments

Patch