diff mbox series

[v2,3/4] xfs: terminate perag iteration reliably on agcount

Message ID 20211012165203.1354826-4-bfoster@redhat.com (mailing list archive)
State Superseded, archived
Headers show
Series xfs: fix perag iteration raciness | expand

Commit Message

Brian Foster Oct. 12, 2021, 4:52 p.m. UTC
The for_each_perag_from() iteration macro relies on sb_agcount to
process every perag currently within EOFS from a given starting
point. It's perfectly valid to have perag structures beyond
sb_agcount, however, such as if a growfs is in progress. If a perag
loop happens to race with growfs in this manner, it will actually
attempt to process the post-EOFS perag where ->pag_agno ==
sb_agcount. This is reproduced by xfs/104 and manifests as the
following assert failure in superblock write verifier context:

 XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22

Update the corresponding macro to only process perags that are
within the current sb_agcount.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Darrick J. Wong Oct. 12, 2021, 7:08 p.m. UTC | #1
On Tue, Oct 12, 2021 at 12:52:02PM -0400, Brian Foster wrote:
> The for_each_perag_from() iteration macro relies on sb_agcount to
> process every perag currently within EOFS from a given starting
> point. It's perfectly valid to have perag structures beyond
> sb_agcount, however, such as if a growfs is in progress. If a perag
> loop happens to race with growfs in this manner, it will actually
> attempt to process the post-EOFS perag where ->pag_agno ==
> sb_agcount. This is reproduced by xfs/104 and manifests as the
> following assert failure in superblock write verifier context:
> 
>  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> 
> Update the corresponding macro to only process perags that are
> within the current sb_agcount.

Does this need a Fixes: tag?

Also ... should we be checking for agno <= agcount-1 for the initial
xfs_perag_get in the first for loop clause of for_each_perag_range?
I /think/ the answer is that the current users are careful enough to
check that race, but I haven't looked exhaustively.

Welcome back, by the way. :)

--D

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index cf8baae2ba18..b8cc5017efba 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -142,7 +142,7 @@ struct xfs_perag *xfs_perag_next(
>  		(pag) = xfs_perag_next((pag), &(agno)))
>  
>  #define for_each_perag_from(mp, agno, pag) \
> -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
>  
>  
>  #define for_each_perag(mp, agno, pag) \
> -- 
> 2.31.1
>
Brian Foster Oct. 14, 2021, 2:10 p.m. UTC | #2
On Tue, Oct 12, 2021 at 12:08:22PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 12, 2021 at 12:52:02PM -0400, Brian Foster wrote:
> > The for_each_perag_from() iteration macro relies on sb_agcount to
> > process every perag currently within EOFS from a given starting
> > point. It's perfectly valid to have perag structures beyond
> > sb_agcount, however, such as if a growfs is in progress. If a perag
> > loop happens to race with growfs in this manner, it will actually
> > attempt to process the post-EOFS perag where ->pag_agno ==
> > sb_agcount. This is reproduced by xfs/104 and manifests as the
> > following assert failure in superblock write verifier context:
> > 
> >  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> > 
> > Update the corresponding macro to only process perags that are
> > within the current sb_agcount.
> 
> Does this need a Fixes: tag?
> 

Probably. I briefly looked into this originally, saw that this code was
introduced/modified across a span of commits and skipped it because it
wasn't immediately clear which singular commit may have introduced the
bug(s). Since these are now separate patches, I'd probably go with
58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions") for
this one (since it introduced the use of sb_agcount) and f250eedcf762
("xfs: make for_each_perag... a first class citizen") for the next
patch.

That said, technically we could probably refer to the latter for both of
these fixes as a suitable enough catchall for the intended purpose of
the Fixes tag. I suspect the fundamental problem actually exists in that
base patch because for_each_perag() iterates solely based on pag !=
NULL. It seems a little odd that the sb_agcount usage is not introduced
until a couple patches later, but I suppose that could just be
considered a dependency. In reality, it's probably unlikely to ever have
a stable kernel at that intermediate point of a rework series so it
might not matter much either way. I don't really have a preference one
way or the other. Your call..?

> Also ... should we be checking for agno <= agcount-1 for the initial
> xfs_perag_get in the first for loop clause of for_each_perag_range?
> I /think/ the answer is that the current users are careful enough to
> check that race, but I haven't looked exhaustively.
> 

Not sure I follow... for_each_perag_range() is a more generic variant
that doesn't know or care about sb_agcount. I think it should support
the ability to span an arbitrary range of perags regardless of
sb_agcount. Hm?

> Welcome back, by the way. :)
> 

Thanks!

Brian

> --D
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_ag.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > index cf8baae2ba18..b8cc5017efba 100644
> > --- a/fs/xfs/libxfs/xfs_ag.h
> > +++ b/fs/xfs/libxfs/xfs_ag.h
> > @@ -142,7 +142,7 @@ struct xfs_perag *xfs_perag_next(
> >  		(pag) = xfs_perag_next((pag), &(agno)))
> >  
> >  #define for_each_perag_from(mp, agno, pag) \
> > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> >  
> >  
> >  #define for_each_perag(mp, agno, pag) \
> > -- 
> > 2.31.1
> > 
>
Darrick J. Wong Oct. 14, 2021, 4:46 p.m. UTC | #3
On Thu, Oct 14, 2021 at 10:10:36AM -0400, Brian Foster wrote:
> On Tue, Oct 12, 2021 at 12:08:22PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 12, 2021 at 12:52:02PM -0400, Brian Foster wrote:
> > > The for_each_perag_from() iteration macro relies on sb_agcount to
> > > process every perag currently within EOFS from a given starting
> > > point. It's perfectly valid to have perag structures beyond
> > > sb_agcount, however, such as if a growfs is in progress. If a perag
> > > loop happens to race with growfs in this manner, it will actually
> > > attempt to process the post-EOFS perag where ->pag_agno ==
> > > sb_agcount. This is reproduced by xfs/104 and manifests as the
> > > following assert failure in superblock write verifier context:
> > > 
> > >  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> > > 
> > > Update the corresponding macro to only process perags that are
> > > within the current sb_agcount.
> > 
> > Does this need a Fixes: tag?
> > 
> 
> Probably. I briefly looked into this originally, saw that this code was
> introduced/modified across a span of commits and skipped it because it
> wasn't immediately clear which singular commit may have introduced the
> bug(s). Since these are now separate patches, I'd probably go with
> 58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions") for
> this one (since it introduced the use of sb_agcount) and f250eedcf762
> ("xfs: make for_each_perag... a first class citizen") for the next
> patch.
> 
> That said, technically we could probably refer to the latter for both of
> these fixes as a suitable enough catchall for the intended purpose of
> the Fixes tag. I suspect the fundamental problem actually exists in that
> base patch because for_each_perag() iterates solely based on pag !=
> NULL. It seems a little odd that the sb_agcount usage is not introduced
> until a couple patches later, but I suppose that could just be
> considered a dependency. In reality, it's probably unlikely to ever have
> a stable kernel at that intermediate point of a rework series so it
> might not matter much either way. I don't really have a preference one
> way or the other. Your call..?

Those fixes tags seem like a reasonable breadcrumb for finding fixes.
I'll add them to the respective patches on commit.  So for this third
one:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

> > Also ... should we be checking for agno <= agcount-1 for the initial
> > xfs_perag_get in the first for loop clause of for_each_perag_range?
> > I /think/ the answer is that the current users are careful enough to
> > check that race, but I haven't looked exhaustively.
> > 
> 
> Not sure I follow... for_each_perag_range() is a more generic variant
> that doesn't know or care about sb_agcount. I think it should support
> the ability to span an arbitrary range of perags regardless of
> sb_agcount. Hm?

Oh, I was idly wondering if these iterators ought to have one more
training wheel where the loop would be skipped entirely if you did
something buggy such as:

agno = mp->m_sb.sb_agcount;
/* time goes by */
for_each_perag_from(mp, agno...)
	/* stuff */

Normally that would be skipped since xfs_perag_get(sb_agcount) returns
NULL, except in the case that it's racing with growfs.  But, some
malfunction like this should be fairly easy to spot even in the common
case.

--D

> > Welcome back, by the way. :)
> > 
> 
> Thanks!
> 
> Brian
> 
> > --D
> > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_ag.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > > index cf8baae2ba18..b8cc5017efba 100644
> > > --- a/fs/xfs/libxfs/xfs_ag.h
> > > +++ b/fs/xfs/libxfs/xfs_ag.h
> > > @@ -142,7 +142,7 @@ struct xfs_perag *xfs_perag_next(
> > >  		(pag) = xfs_perag_next((pag), &(agno)))
> > >  
> > >  #define for_each_perag_from(mp, agno, pag) \
> > > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> > >  
> > >  
> > >  #define for_each_perag(mp, agno, pag) \
> > > -- 
> > > 2.31.1
> > > 
> > 
>
Brian Foster Oct. 14, 2021, 5:41 p.m. UTC | #4
On Thu, Oct 14, 2021 at 09:46:21AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 14, 2021 at 10:10:36AM -0400, Brian Foster wrote:
> > On Tue, Oct 12, 2021 at 12:08:22PM -0700, Darrick J. Wong wrote:
> > > On Tue, Oct 12, 2021 at 12:52:02PM -0400, Brian Foster wrote:
> > > > The for_each_perag_from() iteration macro relies on sb_agcount to
> > > > process every perag currently within EOFS from a given starting
> > > > point. It's perfectly valid to have perag structures beyond
> > > > sb_agcount, however, such as if a growfs is in progress. If a perag
> > > > loop happens to race with growfs in this manner, it will actually
> > > > attempt to process the post-EOFS perag where ->pag_agno ==
> > > > sb_agcount. This is reproduced by xfs/104 and manifests as the
> > > > following assert failure in superblock write verifier context:
> > > > 
> > > >  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> > > > 
> > > > Update the corresponding macro to only process perags that are
> > > > within the current sb_agcount.
> > > 
> > > Does this need a Fixes: tag?
> > > 
> > 
> > Probably. I briefly looked into this originally, saw that this code was
> > introduced/modified across a span of commits and skipped it because it
> > wasn't immediately clear which singular commit may have introduced the
> > bug(s). Since these are now separate patches, I'd probably go with
> > 58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions") for
> > this one (since it introduced the use of sb_agcount) and f250eedcf762
> > ("xfs: make for_each_perag... a first class citizen") for the next
> > patch.
> > 
> > That said, technically we could probably refer to the latter for both of
> > these fixes as a suitable enough catchall for the intended purpose of
> > the Fixes tag. I suspect the fundamental problem actually exists in that
> > base patch because for_each_perag() iterates solely based on pag !=
> > NULL. It seems a little odd that the sb_agcount usage is not introduced
> > until a couple patches later, but I suppose that could just be
> > considered a dependency. In reality, it's probably unlikely to ever have
> > a stable kernel at that intermediate point of a rework series so it
> > might not matter much either way. I don't really have a preference one
> > way or the other. Your call..?
> 
> Those fixes tags seem like a reasonable breadcrumb for finding fixes.
> I'll add them to the respective patches on commit.  So for this third
> one:
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 

Thanks. Do you want me to post my v3 with the style and tag fixes or
have you already made those changes?

> > > Also ... should we be checking for agno <= agcount-1 for the initial
> > > xfs_perag_get in the first for loop clause of for_each_perag_range?
> > > I /think/ the answer is that the current users are careful enough to
> > > check that race, but I haven't looked exhaustively.
> > > 
> > 
> > Not sure I follow... for_each_perag_range() is a more generic variant
> > that doesn't know or care about sb_agcount. I think it should support
> > the ability to span an arbitrary range of perags regardless of
> > sb_agcount. Hm?
> 
> Oh, I was idly wondering if these iterators ought to have one more
> training wheel where the loop would be skipped entirely if you did
> something buggy such as:
> 
> agno = mp->m_sb.sb_agcount;
> /* time goes by */
> for_each_perag_from(mp, agno...)
> 	/* stuff */
> 
> Normally that would be skipped since xfs_perag_get(sb_agcount) returns
> NULL, except in the case that it's racing with growfs.  But, some
> malfunction like this should be fairly easy to spot even in the common
> case.
> 

Oh, I see. Yeah, I think technically that would be more defensive logic.
We might be able to repurpose xfs_perag_next() into something more
generic that also covers the init case, but I'm not terribly concerned
with that type of misuse in the context of this patch (and not sure it
warrants the quirky logic if there are bigger changes pending with these
macros anyways).

Brian

> --D
> 
> > > Welcome back, by the way. :)
> > > 
> > 
> > Thanks!
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_ag.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > > > index cf8baae2ba18..b8cc5017efba 100644
> > > > --- a/fs/xfs/libxfs/xfs_ag.h
> > > > +++ b/fs/xfs/libxfs/xfs_ag.h
> > > > @@ -142,7 +142,7 @@ struct xfs_perag *xfs_perag_next(
> > > >  		(pag) = xfs_perag_next((pag), &(agno)))
> > > >  
> > > >  #define for_each_perag_from(mp, agno, pag) \
> > > > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > > > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> > > >  
> > > >  
> > > >  #define for_each_perag(mp, agno, pag) \
> > > > -- 
> > > > 2.31.1
> > > > 
> > > 
> > 
>
Darrick J. Wong Oct. 14, 2021, 5:50 p.m. UTC | #5
On Thu, Oct 14, 2021 at 01:41:57PM -0400, Brian Foster wrote:
> On Thu, Oct 14, 2021 at 09:46:21AM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 14, 2021 at 10:10:36AM -0400, Brian Foster wrote:
> > > On Tue, Oct 12, 2021 at 12:08:22PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Oct 12, 2021 at 12:52:02PM -0400, Brian Foster wrote:
> > > > > The for_each_perag_from() iteration macro relies on sb_agcount to
> > > > > process every perag currently within EOFS from a given starting
> > > > > point. It's perfectly valid to have perag structures beyond
> > > > > sb_agcount, however, such as if a growfs is in progress. If a perag
> > > > > loop happens to race with growfs in this manner, it will actually
> > > > > attempt to process the post-EOFS perag where ->pag_agno ==
> > > > > sb_agcount. This is reproduced by xfs/104 and manifests as the
> > > > > following assert failure in superblock write verifier context:
> > > > > 
> > > > >  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> > > > > 
> > > > > Update the corresponding macro to only process perags that are
> > > > > within the current sb_agcount.
> > > > 
> > > > Does this need a Fixes: tag?
> > > > 
> > > 
> > > Probably. I briefly looked into this originally, saw that this code was
> > > introduced/modified across a span of commits and skipped it because it
> > > wasn't immediately clear which singular commit may have introduced the
> > > bug(s). Since these are now separate patches, I'd probably go with
> > > 58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions") for
> > > this one (since it introduced the use of sb_agcount) and f250eedcf762
> > > ("xfs: make for_each_perag... a first class citizen") for the next
> > > patch.
> > > 
> > > That said, technically we could probably refer to the latter for both of
> > > these fixes as a suitable enough catchall for the intended purpose of
> > > the Fixes tag. I suspect the fundamental problem actually exists in that
> > > base patch because for_each_perag() iterates solely based on pag !=
> > > NULL. It seems a little odd that the sb_agcount usage is not introduced
> > > until a couple patches later, but I suppose that could just be
> > > considered a dependency. In reality, it's probably unlikely to ever have
> > > a stable kernel at that intermediate point of a rework series so it
> > > might not matter much either way. I don't really have a preference one
> > > way or the other. Your call..?
> > 
> > Those fixes tags seem like a reasonable breadcrumb for finding fixes.
> > I'll add them to the respective patches on commit.  So for this third
> > one:
> > 
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> 
> Thanks. Do you want me to post my v3 with the style and tag fixes or
> have you already made those changes?

Yes please. :)

> > > > Also ... should we be checking for agno <= agcount-1 for the initial
> > > > xfs_perag_get in the first for loop clause of for_each_perag_range?
> > > > I /think/ the answer is that the current users are careful enough to
> > > > check that race, but I haven't looked exhaustively.
> > > > 
> > > 
> > > Not sure I follow... for_each_perag_range() is a more generic variant
> > > that doesn't know or care about sb_agcount. I think it should support
> > > the ability to span an arbitrary range of perags regardless of
> > > sb_agcount. Hm?
> > 
> > Oh, I was idly wondering if these iterators ought to have one more
> > training wheel where the loop would be skipped entirely if you did
> > something buggy such as:
> > 
> > agno = mp->m_sb.sb_agcount;
> > /* time goes by */
> > for_each_perag_from(mp, agno...)
> > 	/* stuff */
> > 
> > Normally that would be skipped since xfs_perag_get(sb_agcount) returns
> > NULL, except in the case that it's racing with growfs.  But, some
> > malfunction like this should be fairly easy to spot even in the common
> > case.
> > 
> 
> Oh, I see. Yeah, I think technically that would be more defensive logic.
> We might be able to repurpose xfs_perag_next() into something more
> generic that also covers the init case, but I'm not terribly concerned
> with that type of misuse in the context of this patch (and not sure it
> warrants the quirky logic if there are bigger changes pending with these
> macros anyways).

Nah, I think we can probably let that go.  If one of us trips over this
6mo from now when we've forgotten, we can add the extra training wheels
for ourselves at that time. ;)

--D

> 
> Brian
> 
> > --D
> > 
> > > > Welcome back, by the way. :)
> > > > 
> > > 
> > > Thanks!
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_ag.h | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > > > > index cf8baae2ba18..b8cc5017efba 100644
> > > > > --- a/fs/xfs/libxfs/xfs_ag.h
> > > > > +++ b/fs/xfs/libxfs/xfs_ag.h
> > > > > @@ -142,7 +142,7 @@ struct xfs_perag *xfs_perag_next(
> > > > >  		(pag) = xfs_perag_next((pag), &(agno)))
> > > > >  
> > > > >  #define for_each_perag_from(mp, agno, pag) \
> > > > > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > > > > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> > > > >  
> > > > >  
> > > > >  #define for_each_perag(mp, agno, pag) \
> > > > > -- 
> > > > > 2.31.1
> > > > > 
> > > > 
> > > 
> > 
>
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index cf8baae2ba18..b8cc5017efba 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -142,7 +142,7 @@  struct xfs_perag *xfs_perag_next(
 		(pag) = xfs_perag_next((pag), &(agno)))
 
 #define for_each_perag_from(mp, agno, pag) \
-	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
+	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
 
 
 #define for_each_perag(mp, agno, pag) \