diff mbox series

[1/2] btrfs: warn on invalid slot in tree mod log rewind

Message ID ab57e130bc466300efe32f36e623e546e4cfa85d.1685474140.git.boris@bur.io (mailing list archive)
State New, archived
Headers show
Series btrfs: fix logical_to_ino panic in btrfs_map_bio | expand

Commit Message

Boris Burkov May 30, 2023, 7:22 p.m. UTC
The way that tree mod log tracks the ultimate length of the eb, the
variable 'n', eventually turns up the correct value, but at intermediate
steps during the rewind, n can be inaccurate as a representation of the
end of the eb. For example, it doesn't get updated on move rewinds, and
it does get updated for add/remove in the middle of the eb.

To detect cases with invalid moves, introduce a separate variable called
max_slot which tries to track the maximum valid slot in the rewind eb.
We can then warn if we do a move whose src range goes beyond the max
valid slot.

There is a commented caveat that it is possible to have this value be an
overestimate due to the challenge of properly handling 'add' operations
in the middle of the eb, but in practice it doesn't cause enough of a
problem to throw out the max idea in favor of tracking every valid slot.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/tree-mod-log.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Filipe Manana May 31, 2023, 12:37 p.m. UTC | #1
On Tue, May 30, 2023 at 8:52 PM Boris Burkov <boris@bur.io> wrote:
>
> The way that tree mod log tracks the ultimate length of the eb, the
> variable 'n', eventually turns up the correct value, but at intermediate
> steps during the rewind, n can be inaccurate as a representation of the
> end of the eb. For example, it doesn't get updated on move rewinds, and
> it does get updated for add/remove in the middle of the eb.
>
> To detect cases with invalid moves, introduce a separate variable called
> max_slot which tries to track the maximum valid slot in the rewind eb.
> We can then warn if we do a move whose src range goes beyond the max
> valid slot.
>
> There is a commented caveat that it is possible to have this value be an
> overestimate due to the challenge of properly handling 'add' operations
> in the middle of the eb, but in practice it doesn't cause enough of a
> problem to throw out the max idea in favor of tracking every valid slot.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/tree-mod-log.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
> index a555baa0143a..1b1ae9ce9d1e 100644
> --- a/fs/btrfs/tree-mod-log.c
> +++ b/fs/btrfs/tree-mod-log.c
> @@ -664,8 +664,10 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>         unsigned long o_dst;
>         unsigned long o_src;
>         unsigned long p_size = sizeof(struct btrfs_key_ptr);
> +       u32 max_slot;
>
>         n = btrfs_header_nritems(eb);
> +       max_slot = n - 1;
>         read_lock(&fs_info->tree_mod_log_lock);
>         while (tm && tm->seq >= time_seq) {
>                 /*
> @@ -684,6 +686,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>                         btrfs_set_node_ptr_generation(eb, tm->slot,
>                                                       tm->generation);
>                         n++;
> +                       if (max_slot == (u32)-1 || tm->slot > max_slot)
> +                               max_slot = tm->slot;
>                         break;
>                 case BTRFS_MOD_LOG_KEY_REPLACE:
>                         BUG_ON(tm->slot >= n);
> @@ -693,6 +697,15 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>                                                       tm->generation);
>                         break;
>                 case BTRFS_MOD_LOG_KEY_ADD:
> +                       /*
> +                        * It is possible we could have already removed keys behind the known
> +                        * max slot, so this will be an overestimate. In practice, the copy
> +                        * operation inserts them in increasing order, and overestimating just
> +                        * means we miss some warnings, so it's OK. It isn't worth carefully
> +                        * tracking the full array of valid slots to check against when moving.
> +                        */
> +                       if (tm->slot == max_slot)
> +                               max_slot--;
>                         /* if a move operation is needed it's in the log */
>                         n--;
>                         break;
> @@ -701,6 +714,12 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
>                         o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
>                         memmove_extent_buffer(eb, o_dst, o_src,
>                                               tm->move.nr_items * p_size);
> +                       WARN((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
> +                            (max_slot == (u32)-1 && tm->move.nr_items > 0),

Why the: "tm->move.nr_items > 0" ?
It's actually a bug, or in the best case a pointless operation that
should not have been logged, to have a move operation for 0 items.

> +                            "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
> +                            eb->start, tm->slot, tm->move.dst_slot, tm->move.nr_items,
> +                            tm->seq, n, max_slot);

Why trigger the warning after doing the memmove, and not before?
I would say it's preferable, because in case the bad memmove triggers
a crash/panic, we get the warning logged with some useful information.

It's also possibly better to log using the btrfs_warn() helper, as it
prints information about the affected filesystem, etc.
So like a: if (WARN_ON(conditions)) btrfs_warn()

Thanks.


> +                       max_slot = tm->slot + tm->move.nr_items - 1;
>                         break;
>                 case BTRFS_MOD_LOG_ROOT_REPLACE:
>                         /*
> --
> 2.40.1
>
Boris Burkov May 31, 2023, 2:51 p.m. UTC | #2
On Wed, May 31, 2023 at 01:37:32PM +0100, Filipe Manana wrote:
> On Tue, May 30, 2023 at 8:52 PM Boris Burkov <boris@bur.io> wrote:
> >
> > The way that tree mod log tracks the ultimate length of the eb, the
> > variable 'n', eventually turns up the correct value, but at intermediate
> > steps during the rewind, n can be inaccurate as a representation of the
> > end of the eb. For example, it doesn't get updated on move rewinds, and
> > it does get updated for add/remove in the middle of the eb.
> >
> > To detect cases with invalid moves, introduce a separate variable called
> > max_slot which tries to track the maximum valid slot in the rewind eb.
> > We can then warn if we do a move whose src range goes beyond the max
> > valid slot.
> >
> > There is a commented caveat that it is possible to have this value be an
> > overestimate due to the challenge of properly handling 'add' operations
> > in the middle of the eb, but in practice it doesn't cause enough of a
> > problem to throw out the max idea in favor of tracking every valid slot.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> >  fs/btrfs/tree-mod-log.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
> > index a555baa0143a..1b1ae9ce9d1e 100644
> > --- a/fs/btrfs/tree-mod-log.c
> > +++ b/fs/btrfs/tree-mod-log.c
> > @@ -664,8 +664,10 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >         unsigned long o_dst;
> >         unsigned long o_src;
> >         unsigned long p_size = sizeof(struct btrfs_key_ptr);
> > +       u32 max_slot;
> >
> >         n = btrfs_header_nritems(eb);
> > +       max_slot = n - 1;
> >         read_lock(&fs_info->tree_mod_log_lock);
> >         while (tm && tm->seq >= time_seq) {
> >                 /*
> > @@ -684,6 +686,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >                         btrfs_set_node_ptr_generation(eb, tm->slot,
> >                                                       tm->generation);
> >                         n++;
> > +                       if (max_slot == (u32)-1 || tm->slot > max_slot)
> > +                               max_slot = tm->slot;
> >                         break;
> >                 case BTRFS_MOD_LOG_KEY_REPLACE:
> >                         BUG_ON(tm->slot >= n);
> > @@ -693,6 +697,15 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >                                                       tm->generation);
> >                         break;
> >                 case BTRFS_MOD_LOG_KEY_ADD:
> > +                       /*
> > +                        * It is possible we could have already removed keys behind the known
> > +                        * max slot, so this will be an overestimate. In practice, the copy
> > +                        * operation inserts them in increasing order, and overestimating just
> > +                        * means we miss some warnings, so it's OK. It isn't worth carefully
> > +                        * tracking the full array of valid slots to check against when moving.
> > +                        */
> > +                       if (tm->slot == max_slot)
> > +                               max_slot--;
> >                         /* if a move operation is needed it's in the log */
> >                         n--;
> >                         break;
> > @@ -701,6 +714,12 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
> >                         o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
> >                         memmove_extent_buffer(eb, o_dst, o_src,
> >                                               tm->move.nr_items * p_size);
> > +                       WARN((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
> > +                            (max_slot == (u32)-1 && tm->move.nr_items > 0),
> 
> Why the: "tm->move.nr_items > 0" ?
> It's actually a bug, or in the best case a pointless operation that
> should not have been logged, to have a move operation for 0 items.

My thinking was that max_slot models the maximum valid slot with
(u32)-1 meaning "no slot is valid". But a move of size 0 doesn't read
any slot so doesn't constitute a bug per-se, just a dumb no-op tree mod
log entry. I can still warn on it, if you think that's better.

> 
> > +                            "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
> > +                            eb->start, tm->slot, tm->move.dst_slot, tm->move.nr_items,
> > +                            tm->seq, n, max_slot);
> 
> Why trigger the warning after doing the memmove, and not before?
> I would say it's preferable, because in case the bad memmove triggers
> a crash/panic, we get the warning logged with some useful information.
> 
> It's also possibly better to log using the btrfs_warn() helper, as it
> prints information about the affected filesystem, etc.
> So like a: if (WARN_ON(conditions)) btrfs_warn()

Good points, thanks for the review.

> 
> Thanks.
> 
> 
> > +                       max_slot = tm->slot + tm->move.nr_items - 1;
> >                         break;
> >                 case BTRFS_MOD_LOG_ROOT_REPLACE:
> >                         /*
> > --
> > 2.40.1
> >
diff mbox series

Patch

diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index a555baa0143a..1b1ae9ce9d1e 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -664,8 +664,10 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 	unsigned long o_dst;
 	unsigned long o_src;
 	unsigned long p_size = sizeof(struct btrfs_key_ptr);
+	u32 max_slot;
 
 	n = btrfs_header_nritems(eb);
+	max_slot = n - 1;
 	read_lock(&fs_info->tree_mod_log_lock);
 	while (tm && tm->seq >= time_seq) {
 		/*
@@ -684,6 +686,8 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 			btrfs_set_node_ptr_generation(eb, tm->slot,
 						      tm->generation);
 			n++;
+			if (max_slot == (u32)-1 || tm->slot > max_slot)
+				max_slot = tm->slot;
 			break;
 		case BTRFS_MOD_LOG_KEY_REPLACE:
 			BUG_ON(tm->slot >= n);
@@ -693,6 +697,15 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 						      tm->generation);
 			break;
 		case BTRFS_MOD_LOG_KEY_ADD:
+			/*
+			 * It is possible we could have already removed keys behind the known
+			 * max slot, so this will be an overestimate. In practice, the copy
+			 * operation inserts them in increasing order, and overestimating just
+			 * means we miss some warnings, so it's OK. It isn't worth carefully
+			 * tracking the full array of valid slots to check against when moving.
+			 */
+			if (tm->slot == max_slot)
+				max_slot--;
 			/* if a move operation is needed it's in the log */
 			n--;
 			break;
@@ -701,6 +714,12 @@  static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
 			o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
 			memmove_extent_buffer(eb, o_dst, o_src,
 					      tm->move.nr_items * p_size);
+			WARN((tm->move.dst_slot + tm->move.nr_items - 1 > max_slot) ||
+			     (max_slot == (u32)-1 && tm->move.nr_items > 0),
+			     "Move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %u\n",
+			     eb->start, tm->slot, tm->move.dst_slot, tm->move.nr_items,
+			     tm->seq, n, max_slot);
+			max_slot = tm->slot + tm->move.nr_items - 1;
 			break;
 		case BTRFS_MOD_LOG_ROOT_REPLACE:
 			/*