diff mbox series

btrfs: set start on clone before calling copy_extent_buffer_full

Message ID 5062a746cf151bfbc217c00e149740956e748abb.1713073723.git.josef@toxicpanda.com (mailing list archive)
State New
Headers show
Series btrfs: set start on clone before calling copy_extent_buffer_full | expand

Commit Message

Josef Bacik April 14, 2024, 5:49 a.m. UTC
Our subpage testing started hanging on generic/560 and I bisected it
down to Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during
fiemap to avoid re-allocations").  This is subtle because we use
eb->start to figure out where in the folio we're copying to when we're
subpage, as our ->start may refer to an area inside of the folio.

We were copying with ->start set to the previous value, and then
re-setting ->start in order to be used later on by fiemap.  However this
changed the offset into the eb that we would read from, which would
cause us to not emit EOF sometimes for fiemap.  Thanks to a bug in the
duperemove that the CI vms are using this manifested as a hung test.

Fix this by setting start before we co copy_extent_buffer_full to make
sure that we're copying into the same offset inside of the folio that we
will read from later.

With this fix we now pass generic/560 on our subpage tests.

Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during fiemap toavoid re-allocations")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Filipe Manana April 14, 2024, 10:22 a.m. UTC | #1
On Sun, Apr 14, 2024 at 6:50 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Our subpage testing started hanging on generic/560 and I bisected it
> down to Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during

The "Fixes:" should be here, seems like a copy paste mistake from the
bottom of the change log.

> fiemap to avoid re-allocations").  This is subtle because we use
> eb->start to figure out where in the folio we're copying to when we're
> subpage, as our ->start may refer to an area inside of the folio.

Where is that exactly? Is it at get_eb_offset_in_folio()? If it's that
I don't see why it's subpage specific.
Can you mention where exactly in the change log?

>
> We were copying with ->start set to the previous value, and then
> re-setting ->start in order to be used later on by fiemap.  However this
> changed the offset into the eb that we would read from, which would

I don't understand this part that says: "we would read from".
We would read what and where? Are you saying we need to read from the
destination eb during copy_full_extent_buffer()?
Where is that?

Does this only affect copy_extent_buffer_full()? Doesn't it affect
copy_extent_buffer() too? If not, why?
Can you please give all these details in the changelog?

> cause us to not emit EOF sometimes for fiemap.  Thanks to a bug in the
> duperemove that the CI vms are using this manifested as a hung test.
>
> Fix this by setting start before we co copy_extent_buffer_full to make
> sure that we're copying into the same offset inside of the folio that we
> will read from later.
>
> With this fix we now pass generic/560 on our subpage tests.
>
> Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during fiemap toavoid re-allocations")

Missing a space at "toavoid", in the commit's subject there's actually a space.

> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent_io.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 49f7161a6578..a3d0befaa461 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2809,13 +2809,17 @@ static int fiemap_next_leaf_item(struct btrfs_inode *inode, struct btrfs_path *p
>                 goto out;
>         }
>
> -       /* See the comment at fiemap_search_slot() about why we clone. */
> -       copy_extent_buffer_full(clone, path->nodes[0]);
>         /*
>          * Important to preserve the start field, for the optimizations when
>          * checking if extents are shared (see extent_fiemap()).
> +        *
> +        * Additionally it needs to be set before we call
> +        * copy_extent_buffer_full because for subpagesize we need to make sure

Can we get the () in front of the function name, so that it's
consistent with the rest of the comment (paragraph above)?

> +        * we have the correctly calculated offset.
>          */
>         clone->start = path->nodes[0]->start;
> +       /* See the comment at fiemap_search_slot() about why we clone. */
> +       copy_extent_buffer_full(clone, path->nodes[0]);

Ok so this is a landmine and I doubt people will remember to do this
when using copy_extent_buffer_full() in the future.
If this is really needed, why not do that at copy_extent_buffer_full()
itself? This would be more future proof.

Why wouldn't we need this in other places that use
copy_extent_buffer_full() too?

For example in the tree log code where we use a dummy eb and we don't
update the eb's ->start because we don't care about it. Why is it not
a problem there? Or you missed it?

Or another example at btrfs_copy_root(). where we can't obviously set
the destination eb's ->start to that of the source eb. Why don't we
run into any problem there?

I'm puzzled at why we need to this ->start update only in this place,
why the ->start of the destination eb of copy_extent_buffer_full() is
used or why it causes a problem, why is it subpage specific

Thanks.


>
>         slot = path->slots[0];
>         btrfs_release_path(path);
> --
> 2.43.0
>
>
Josef Bacik April 14, 2024, 12:54 p.m. UTC | #2
On Sun, Apr 14, 2024 at 11:22:17AM +0100, Filipe Manana wrote:
> On Sun, Apr 14, 2024 at 6:50 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > Our subpage testing started hanging on generic/560 and I bisected it
> > down to Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during
> 
> The "Fixes:" should be here, seems like a copy paste mistake from the
> bottom of the change log.
> 
> > fiemap to avoid re-allocations").  This is subtle because we use
> > eb->start to figure out where in the folio we're copying to when we're
> > subpage, as our ->start may refer to an area inside of the folio.
> 
> Where is that exactly? Is it at get_eb_offset_in_folio()? If it's that
> I don't see why it's subpage specific.
> Can you mention where exactly in the change log?
> 
> >
> > We were copying with ->start set to the previous value, and then
> > re-setting ->start in order to be used later on by fiemap.  However this
> > changed the offset into the eb that we would read from, which would
> 
> I don't understand this part that says: "we would read from".
> We would read what and where? Are you saying we need to read from the
> destination eb during copy_full_extent_buffer()?
> Where is that?
> 
> Does this only affect copy_extent_buffer_full()? Doesn't it affect
> copy_extent_buffer() too? If not, why?
> Can you please give all these details in the changelog?
> 
> > cause us to not emit EOF sometimes for fiemap.  Thanks to a bug in the
> > duperemove that the CI vms are using this manifested as a hung test.
> >
> > Fix this by setting start before we co copy_extent_buffer_full to make
> > sure that we're copying into the same offset inside of the folio that we
> > will read from later.
> >
> > With this fix we now pass generic/560 on our subpage tests.
> >
> > Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during fiemap toavoid re-allocations")
> 
> Missing a space at "toavoid", in the commit's subject there's actually a space.
> 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/extent_io.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 49f7161a6578..a3d0befaa461 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2809,13 +2809,17 @@ static int fiemap_next_leaf_item(struct btrfs_inode *inode, struct btrfs_path *p
> >                 goto out;
> >         }
> >
> > -       /* See the comment at fiemap_search_slot() about why we clone. */
> > -       copy_extent_buffer_full(clone, path->nodes[0]);
> >         /*
> >          * Important to preserve the start field, for the optimizations when
> >          * checking if extents are shared (see extent_fiemap()).
> > +        *
> > +        * Additionally it needs to be set before we call
> > +        * copy_extent_buffer_full because for subpagesize we need to make sure
> 
> Can we get the () in front of the function name, so that it's
> consistent with the rest of the comment (paragraph above)?
> 
> > +        * we have the correctly calculated offset.
> >          */
> >         clone->start = path->nodes[0]->start;
> > +       /* See the comment at fiemap_search_slot() about why we clone. */
> > +       copy_extent_buffer_full(clone, path->nodes[0]);
> 
> Ok so this is a landmine and I doubt people will remember to do this
> when using copy_extent_buffer_full() in the future.
> If this is really needed, why not do that at copy_extent_buffer_full()
> itself? This would be more future proof.
> 
> Why wouldn't we need this in other places that use
> copy_extent_buffer_full() too?
> 
> For example in the tree log code where we use a dummy eb and we don't
> update the eb's ->start because we don't care about it. Why is it not
> a problem there? Or you missed it?
> 
> Or another example at btrfs_copy_root(). where we can't obviously set
> the destination eb's ->start to that of the source eb. Why don't we
> run into any problem there?
> 
> I'm puzzled at why we need to this ->start update only in this place,
> why the ->start of the destination eb of copy_extent_buffer_full() is
> used or why it causes a problem, why is it subpage specific

Sorry, 2am changelogs aren't great.

In __write_extent_buffer() we do

offset = get_eb_offset_in_folio(eb, start);

start is the logical offset into the eb we're copying to, in this case it's 0
because we're doing copy_extent_buffer_full().  get_eb_offset_in_folio() does
this

        return offset_in_folio(eb->folios[0], offset + eb->start);

and offset_in_folio() does this

        return start & (eb->folio_size - 1);

so in this case offset is 0, so we're just passing in eb->start.

With subpage ->start can be not folio_size aligned, so this would be some
arbitrary offset into the folio.

The other places aren't a problem because we don't change eb->start after we
do the copy.  This is a problem because we're re-using a cloned eb, so we do
this

// start is 4k on a 16k pagesize fs, so we're at the 4k offset into the folio
copy_extent_buffer_full();

// now start is at 8k offset into the folio
cloned->start = 8k;

Any subsequent reads from the eb, by which I mean things like
btrfs_item_key_to_cpu(), anything we read the members out of the leaf are going
to use the same get_eb_offset_in_folio() helper and now come up with a different
answer than where we copied into.

All other places are fine, because we don't change ->start after we've copied
data into it.  I'll update the changelog to include this information, thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 49f7161a6578..a3d0befaa461 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2809,13 +2809,17 @@  static int fiemap_next_leaf_item(struct btrfs_inode *inode, struct btrfs_path *p
 		goto out;
 	}
 
-	/* See the comment at fiemap_search_slot() about why we clone. */
-	copy_extent_buffer_full(clone, path->nodes[0]);
 	/*
 	 * Important to preserve the start field, for the optimizations when
 	 * checking if extents are shared (see extent_fiemap()).
+	 *
+	 * Additionally it needs to be set before we call
+	 * copy_extent_buffer_full because for subpagesize we need to make sure
+	 * we have the correctly calculated offset.
 	 */
 	clone->start = path->nodes[0]->start;
+	/* See the comment at fiemap_search_slot() about why we clone. */
+	copy_extent_buffer_full(clone, path->nodes[0]);
 
 	slot = path->slots[0];
 	btrfs_release_path(path);