mbox series

[5.15,CANDIDATE,00/10] more xfs fixes for 5.15

Message ID 20230208175228.2226263-1-leah.rumancik@gmail.com (mailing list archive)
Headers show
Series more xfs fixes for 5.15 | expand

Message

Leah Rumancik Feb. 8, 2023, 5:52 p.m. UTC
Hello again,

Here is the next batch of backports for 5.15.y. Testing included
25 runs of auto group on 12 xfs configs. No regressions were seen.
I checked xfs/538 was run without issue as this test was mentioned
in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
unable to reproduce the issue.

Below I've outlined which series the backports came from:

series "xfs: intent whiteouts" (1):
[01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
    xfs: zero inode fork buffer at allocation
[02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
    xfs: fix potential log item leak

series "xfs: fix random format verification issues" (2):
[1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
    xfs: detect self referencing btree sibling pointers
[2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
    xfs: validate inode fork size against fork format
[3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
    xfs: set XFS_FEAT_NLINK correctly
[4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
    xfs: validate v5 feature fields

series "xfs: small fixes for 5.19 cycle" (3):
[1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
    xfs: avoid unnecessary runtime sibling pointer endian conversions
[2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
    fs: don't assert fail on perag references on teardown
[2/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
    xfs: assert in xfs_btree_del_cursor should take into account error

series "xfs: random fixes for 5.19" (4):
[1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
    xfs: purge dquots after inode walk fails during quotacheck
[2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
    xfs: don't leak btree cursor when insrec fails after a split

(1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
(2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
(3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
(4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@magnolia/

Darrick J. Wong (2):
  xfs: purge dquots after inode walk fails during quotacheck
  xfs: don't leak btree cursor when insrec fails after a split

Dave Chinner (8):
  xfs: zero inode fork buffer at allocation
  xfs: fix potential log item leak
  xfs: detect self referencing btree sibling pointers
  xfs: set XFS_FEAT_NLINK correctly
  xfs: validate v5 feature fields
  xfs: avoid unnecessary runtime sibling pointer endian conversions
  xfs: don't assert fail on perag references on teardown
  xfs: assert in xfs_btree_del_cursor should take into account error

 fs/xfs/libxfs/xfs_ag.c         |   3 +-
 fs/xfs/libxfs/xfs_btree.c      | 175 +++++++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_inode_fork.c |  12 ++-
 fs/xfs/libxfs/xfs_sb.c         |  70 +++++++++++--
 fs/xfs/xfs_bmap_item.c         |   2 +
 fs/xfs/xfs_icreate_item.c      |   1 +
 fs/xfs/xfs_qm.c                |   9 +-
 fs/xfs/xfs_refcount_item.c     |   2 +
 fs/xfs/xfs_rmap_item.c         |   2 +
 9 files changed, 221 insertions(+), 55 deletions(-)

Comments

Amir Goldstein Feb. 8, 2023, 7:02 p.m. UTC | #1
On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
>
> Hello again,
>
> Here is the next batch of backports for 5.15.y. Testing included
> 25 runs of auto group on 12 xfs configs. No regressions were seen.
> I checked xfs/538 was run without issue as this test was mentioned
> in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> unable to reproduce the issue.

Did you find any tests that started to pass or whose failure rate reduced?

There are very few Fixes: annotations in these commits so it is hard for
me to assess if any of them are relevant/important/worth the effort/risk
to backport to 5.10.

Unless I know of reproducible bugs in 5.10, I don't think I will invest
in backporting this series to 5.10.
Chandan, if you find any of these fixes relevant and important for 5.4
let me know and I will make the effort to consider them for 5.10.

Leah, please consider working on the SGID bug fixes for the next 5.15
update, because my 5.10 SGID fixes series [1] has been blocked for
months and because there are several reproducible test cases in xfstest.

I did not push on that until now because SGID test expectations were
a moving target, but since xfstests commit 81e6f628 ("generic: update
setgid tests") in this week's xfstests release, I think that tests should be
stable and we can finally start backporting all relevant SGID fixes to
align the SGID behavior of LTS kernels with that of upstream.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/xfs-5.10.y-sgid-fixes

>
> Below I've outlined which series the backports came from:
>
> series "xfs: intent whiteouts" (1):
> [01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
>     xfs: zero inode fork buffer at allocation
> [02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
>     xfs: fix potential log item leak
>
> series "xfs: fix random format verification issues" (2):
> [1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
>     xfs: detect self referencing btree sibling pointers
> [2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
>     xfs: validate inode fork size against fork format
> [3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
>     xfs: set XFS_FEAT_NLINK correctly
> [4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
>     xfs: validate v5 feature fields
>
> series "xfs: small fixes for 5.19 cycle" (3):
> [1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
>     xfs: avoid unnecessary runtime sibling pointer endian conversions
> [2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
>     fs: don't assert fail on perag references on teardown
> [2/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
>     xfs: assert in xfs_btree_del_cursor should take into account error
>
> series "xfs: random fixes for 5.19" (4):
> [1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
>     xfs: purge dquots after inode walk fails during quotacheck
> [2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
>     xfs: don't leak btree cursor when insrec fails after a split
>
> (1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
> (2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
> (3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
> (4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@magnolia/
>
> Darrick J. Wong (2):
>   xfs: purge dquots after inode walk fails during quotacheck
>   xfs: don't leak btree cursor when insrec fails after a split
>
> Dave Chinner (8):
>   xfs: zero inode fork buffer at allocation
>   xfs: fix potential log item leak
>   xfs: detect self referencing btree sibling pointers
>   xfs: set XFS_FEAT_NLINK correctly
>   xfs: validate v5 feature fields
>   xfs: avoid unnecessary runtime sibling pointer endian conversions
>   xfs: don't assert fail on perag references on teardown
>   xfs: assert in xfs_btree_del_cursor should take into account error
>
>  fs/xfs/libxfs/xfs_ag.c         |   3 +-
>  fs/xfs/libxfs/xfs_btree.c      | 175 +++++++++++++++++++++++++--------
>  fs/xfs/libxfs/xfs_inode_fork.c |  12 ++-
>  fs/xfs/libxfs/xfs_sb.c         |  70 +++++++++++--
>  fs/xfs/xfs_bmap_item.c         |   2 +
>  fs/xfs/xfs_icreate_item.c      |   1 +
>  fs/xfs/xfs_qm.c                |   9 +-
>  fs/xfs/xfs_refcount_item.c     |   2 +
>  fs/xfs/xfs_rmap_item.c         |   2 +
>  9 files changed, 221 insertions(+), 55 deletions(-)
>
> --
> 2.39.1.519.gcb327c4b5f-goog
>
Darrick J. Wong Feb. 8, 2023, 7:40 p.m. UTC | #2
On Wed, Feb 08, 2023 at 09:02:58PM +0200, Amir Goldstein wrote:
> On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> >
> > Hello again,
> >
> > Here is the next batch of backports for 5.15.y. Testing included
> > 25 runs of auto group on 12 xfs configs. No regressions were seen.
> > I checked xfs/538 was run without issue as this test was mentioned
> > in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> > XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> > unable to reproduce the issue.
> 
> Did you find any tests that started to pass or whose failure rate reduced?

I wish Leah had, but there basically aren't any tests for the problems
fixed in this set for her to find. :(

The first two patches I think were from when Dave was working on log
intent whiteouts, turned on KASAN to diagnose some other problem he had,
and began pulling on the ball of string (as it were) as he noticed other
things in the codebase.  We don't usually bother with regression tests
for kernel memory leaks, since they're not so easy to reproduce.

Patches 3-6 are fixes for a rash of fuzzer reports that someone in China
posted last May:
https://bugzilla.kernel.org/show_bug.cgi?id=215927

(There are more than just that one)

As usual, the submitter didn't bother to help triage and just dumped a
ton of work in our laps.  They didn't follow up with any regression
tests, because few fuzz kiddiez ever do.  At the time, I was too burned
out to deal with it, so Dave posted fixes.

Patches 7-8 would manifest themselves as test VMs halting on ASSERTs if
you configure your kernel to panic.  Not strictly needed since most LTS
kernels probably don't even have XFS_DEBUG=y, but it makes the lives of
recoveryloop runners easier if they do.

Patch 9 trips xfs/434 and xfs/436, but they only run if you have slab
debugging enabled and build XFS as a module.  I don't know why very few
people do this.

Patch 10 is a memory leak if you have XFS_DEBUG=y.  No need for a
separate test for this one, since kmemleak catches it.  If you turn it
on.

(IOWs, LGTM for the whole set.)

> There are very few Fixes: annotations in these commits so it is hard for
> me to assess if any of them are relevant/important/worth the effort/risk
> to backport to 5.10.

<nod> Dave's fixpatches rarely have Fixes tags attached.  It's difficult
to get him to do that because he has so much bad history with AUTOSEL.
I've tried to get him to add them in the past, but if I'm already
stressed out and Dave doesn't reply then I tend to merge the fix and
move on.

> Unless I know of reproducible bugs in 5.10, I don't think I will invest
> in backporting this series to 5.10.
> Chandan, if you find any of these fixes relevant and important for 5.4
> let me know and I will make the effort to consider them for 5.10.
> 
> Leah, please consider working on the SGID bug fixes for the next 5.15
> update, because my 5.10 SGID fixes series [1] has been blocked for
> months and because there are several reproducible test cases in xfstest.

Whenever y'all get to 6.0, beware of commit 2ed5b09b3e8f ("xfs: make
inode attribute forks a permanent part of struct xfs_inode"), which is a
KASAN UAF that we fixed by eliminating the 'F'.  Do not pull on the
devilstring ("...the file capabilities code calls getxattr with and
without i_rwsem held...") if you can avoid it.

> I did not push on that until now because SGID test expectations were
> a moving target, but since xfstests commit 81e6f628 ("generic: update
> setgid tests") in this week's xfstests release, I think that tests should be
> stable and we can finally start backporting all relevant SGID fixes to
> align the SGID behavior of LTS kernels with that of upstream.

Oh good, I've been (gently) waiting on that one too. :)

--D

> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/xfs-5.10.y-sgid-fixes
> 
> >
> > Below I've outlined which series the backports came from:
> >
> > series "xfs: intent whiteouts" (1):
> > [01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
> >     xfs: zero inode fork buffer at allocation
> > [02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
> >     xfs: fix potential log item leak
> >
> > series "xfs: fix random format verification issues" (2):
> > [1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
> >     xfs: detect self referencing btree sibling pointers
> > [2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
> >     xfs: validate inode fork size against fork format
> > [3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
> >     xfs: set XFS_FEAT_NLINK correctly
> > [4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
> >     xfs: validate v5 feature fields
> >
> > series "xfs: small fixes for 5.19 cycle" (3):
> > [1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
> >     xfs: avoid unnecessary runtime sibling pointer endian conversions
> > [2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
> >     fs: don't assert fail on perag references on teardown
> > [2/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
> >     xfs: assert in xfs_btree_del_cursor should take into account error
> >
> > series "xfs: random fixes for 5.19" (4):
> > [1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
> >     xfs: purge dquots after inode walk fails during quotacheck
> > [2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
> >     xfs: don't leak btree cursor when insrec fails after a split
> >
> > (1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
> > (2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
> > (3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
> > (4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@magnolia/
> >
> > Darrick J. Wong (2):
> >   xfs: purge dquots after inode walk fails during quotacheck
> >   xfs: don't leak btree cursor when insrec fails after a split
> >
> > Dave Chinner (8):
> >   xfs: zero inode fork buffer at allocation
> >   xfs: fix potential log item leak
> >   xfs: detect self referencing btree sibling pointers
> >   xfs: set XFS_FEAT_NLINK correctly
> >   xfs: validate v5 feature fields
> >   xfs: avoid unnecessary runtime sibling pointer endian conversions
> >   xfs: don't assert fail on perag references on teardown
> >   xfs: assert in xfs_btree_del_cursor should take into account error
> >
> >  fs/xfs/libxfs/xfs_ag.c         |   3 +-
> >  fs/xfs/libxfs/xfs_btree.c      | 175 +++++++++++++++++++++++++--------
> >  fs/xfs/libxfs/xfs_inode_fork.c |  12 ++-
> >  fs/xfs/libxfs/xfs_sb.c         |  70 +++++++++++--
> >  fs/xfs/xfs_bmap_item.c         |   2 +
> >  fs/xfs/xfs_icreate_item.c      |   1 +
> >  fs/xfs/xfs_qm.c                |   9 +-
> >  fs/xfs/xfs_refcount_item.c     |   2 +
> >  fs/xfs/xfs_rmap_item.c         |   2 +
> >  9 files changed, 221 insertions(+), 55 deletions(-)
> >
> > --
> > 2.39.1.519.gcb327c4b5f-goog
> >
Dave Chinner Feb. 8, 2023, 10:21 p.m. UTC | #3
On Wed, Feb 08, 2023 at 11:40:59AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 08, 2023 at 09:02:58PM +0200, Amir Goldstein wrote:
> > On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> > >
> > > Hello again,
> > >
> > > Here is the next batch of backports for 5.15.y. Testing included
> > > 25 runs of auto group on 12 xfs configs. No regressions were seen.
> > > I checked xfs/538 was run without issue as this test was mentioned
> > > in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> > > XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> > > unable to reproduce the issue.
> > 
> > Did you find any tests that started to pass or whose failure rate reduced?
> 
> I wish Leah had, but there basically aren't any tests for the problems
> fixed in this set for her to find. :(
.....
> > There are very few Fixes: annotations in these commits so it is hard for
> > me to assess if any of them are relevant/important/worth the effort/risk
> > to backport to 5.10.
> 
> <nod> Dave's fixpatches rarely have Fixes tags attached.  It's difficult
> to get him to do that because he has so much bad history with AUTOSEL.
> I've tried to get him to add them in the past, but if I'm already
> stressed out and Dave doesn't reply then I tend to merge the fix and
> move on.

In my defense, all the "fixes" from me in this series (except for
the one with a fixes tag on it) date back so long ago it was
difficult to identify what commit actually introduced the issue.
Once we're talking about "it's been there for at least a decade" -
espcially for fuzzer issues - identifying the exact commit is time
consuming and often not possible, nor really useful for anything.

I'm also not going to tag a patch with "fixes commit xyz" when
commit xyz isn't actually the cause of the problem just so that
someone can blindly use that as a "it's got a fixes tag on it, we
should back port it" trigger.

That's the whole problem with AUTOSEL - blindly applying anything
with a fixes tag on it that merges cleanly into an older kernel -
and the whole point of having a human actually manage the stable
kernel backports.

The stable XFS kernel maintainer is supposed to be actively looking
at the commits that go into the upstream kernel to determine if they
are relevant or not to the given stable kernel, regardless of
whether they address fstests failures, have fixes/stable tags on
them, etc. If all we needed stable maintainers to do is turn a crank
handle, then we'd be perfectly OK with AUTOSEL and the upstream
stable kernel process....

-Dave.
Amir Goldstein Feb. 9, 2023, 8:08 a.m. UTC | #4
On Thu, Feb 9, 2023 at 12:21 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Feb 08, 2023 at 11:40:59AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 08, 2023 at 09:02:58PM +0200, Amir Goldstein wrote:
> > > On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> > > >
> > > > Hello again,
> > > >
> > > > Here is the next batch of backports for 5.15.y. Testing included
> > > > 25 runs of auto group on 12 xfs configs. No regressions were seen.
> > > > I checked xfs/538 was run without issue as this test was mentioned
> > > > in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> > > > XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> > > > unable to reproduce the issue.
> > >
> > > Did you find any tests that started to pass or whose failure rate reduced?
> >
> > I wish Leah had, but there basically aren't any tests for the problems
> > fixed in this set for her to find. :(
> .....
> > > There are very few Fixes: annotations in these commits so it is hard for
> > > me to assess if any of them are relevant/important/worth the effort/risk
> > > to backport to 5.10.
> >
> > <nod> Dave's fixpatches rarely have Fixes tags attached.  It's difficult
> > to get him to do that because he has so much bad history with AUTOSEL.
> > I've tried to get him to add them in the past, but if I'm already
> > stressed out and Dave doesn't reply then I tend to merge the fix and
> > move on.
>
> In my defense, all the "fixes" from me in this series (except for
> the one with a fixes tag on it) date back so long ago it was
> difficult to identify what commit actually introduced the issue.
> Once we're talking about "it's been there for at least a decade" -
> espcially for fuzzer issues - identifying the exact commit is time
> consuming and often not possible, nor really useful for anything.
>

I wouldn't want anyone to spend more time on fuzzer issues and
I myself have given those very little attention.
My expectation is to always have a Fixes tag when a developer
knows they are fixing a regression and best effort when it is easy
for the developer to determine when the buggy code was merged.

It doesn't even have to be in the form of a Fixes tag.
If the cover letter says "fixes to the $FEATURE from v5.18"
that is good enough for me.

As a stable maintainer, if a commit has no Fixes tag and no hint
in the cover letter about relevant kernel versions, I will assume
it is relevant to the LTS kernel and spend time on the manual triage.
If a developer adds a Fixes tag where relevant, that can save my
time by eliminating commits irrelevant for e.g. 5.10.y.
This wasn't the case in this series I guess.

> I'm also not going to tag a patch with "fixes commit xyz" when
> commit xyz isn't actually the cause of the problem just so that
> someone can blindly use that as a "it's got a fixes tag on it, we
> should back port it" trigger.
>
> That's the whole problem with AUTOSEL - blindly applying anything
> with a fixes tag on it that merges cleanly into an older kernel -
> and the whole point of having a human actually manage the stable
> kernel backports.
>
> The stable XFS kernel maintainer is supposed to be actively looking
> at the commits that go into the upstream kernel to determine if they
> are relevant or not to the given stable kernel, regardless of
> whether they address fstests failures, have fixes/stable tags on
> them, etc. If all we needed stable maintainers to do is turn a crank
> handle, then we'd be perfectly OK with AUTOSEL and the upstream
> stable kernel process....
>

Yes. The reason I did not pick any of the commits in this series for 5.10
is not because lack of Fixes/tests, but because I read the cover letters
(thanks for the links Leah!) and commit messages and conclude that
the risk/effort does not make the cut.

But this conclusion was based very much on my own intuition and my
own interpretation of the cover letters. This is why I asked Chandan to
take another look.

I encourage any developer who writes a cover letter to share their
own opinion about the relevancy and risks associated with backporting
their patch set to LTS kernels - make it as vague statement as you
wish, anything that can help a human LTS maintainer do their job better.

Thanks,
Amir.
Chandan Babu R Feb. 10, 2023, 1:11 p.m. UTC | #5
On Wed, Feb 08, 2023 at 09:02:58 PM +0200, Amir Goldstein wrote:
> On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
>>
>> Hello again,
>>
>> Here is the next batch of backports for 5.15.y. Testing included
>> 25 runs of auto group on 12 xfs configs. No regressions were seen.
>> I checked xfs/538 was run without issue as this test was mentioned
>> in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
>> XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
>> unable to reproduce the issue.
>
> Did you find any tests that started to pass or whose failure rate reduced?
>
> There are very few Fixes: annotations in these commits so it is hard for
> me to assess if any of them are relevant/important/worth the effort/risk
> to backport to 5.10.
>
> Unless I know of reproducible bugs in 5.10, I don't think I will invest
> in backporting this series to 5.10.
> Chandan, if you find any of these fixes relevant and important for 5.4
> let me know and I will make the effort to consider them for 5.10.

Amir, Please find my observations listed below,

Patch 1: Intent whiteouts was designed to overcome performance problems with
delayed attributes feature. Hence this patch will not be needed for v5.4-LTS.

Patch 2: This patch won't be needed since this memory leak is prevalent mostly
when intent whiteouts are being used.

Patch 3: IMHO this fix does not need to be back-ported since ondisk btree
cycles are close to impossible to occur during regular operation of an XFS
filesystem.

Patch 4: This patch won't be needed since xfs_mount->m_features isn't present
in v5.4 xfsprogs.

Patch 5: I am not sure about this one. May be mkfs.xfs has always created V5
filesystems with the required V4 feature flags set and probably fuzzing is the
only way we could modify a V5 filesystem to have some of the required V4
feature flags disabled.
I don't think this patch needs to be backported if my assumptions are true.

Patch 6: This patch does not need to be backported since this fixes a
performance regression introduced by the third patch.

Patch 7: I will want to backport this patch since it will prevent
recovery-loop test execution from stalling.

Patch 8: I think backporting XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag to
5.4-LTS will be helpful since xfs/538 has revealed some important bugs. Patch
8 needs to backported after backporting patches relevant to
XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT.

Patch 9: I think this patch needs to be backported to prevent kmemleak
warnings.

Patch 10: I would backport this patch since this can prevent kmemleak warnings
from showing up (even though the bug was detected after the third patch was
applied).

But maybe you could wait until I come across these patches when working on
backporting commits from v5.11 and newer versions of the kernel. I will share
the list of candidate patches with you. Once we agree upon the list of
patches, you could then backport them to v5.10-LTS.
Leah Rumancik Feb. 10, 2023, 7:51 p.m. UTC | #6
On Wed, Feb 08, 2023 at 11:40:59AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 08, 2023 at 09:02:58PM +0200, Amir Goldstein wrote:
> > On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> > >
> > > Hello again,
> > >
> > > Here is the next batch of backports for 5.15.y. Testing included
> > > 25 runs of auto group on 12 xfs configs. No regressions were seen.
> > > I checked xfs/538 was run without issue as this test was mentioned
> > > in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> > > XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> > > unable to reproduce the issue.
> > 
> > Did you find any tests that started to pass or whose failure rate reduced?
> 
> I wish Leah had, but there basically aren't any tests for the problems
> fixed in this set for her to find. :(
> 
> The first two patches I think were from when Dave was working on log
> intent whiteouts, turned on KASAN to diagnose some other problem he had,
> and began pulling on the ball of string (as it were) as he noticed other
> things in the codebase.  We don't usually bother with regression tests
> for kernel memory leaks, since they're not so easy to reproduce.
> 
> Patches 3-6 are fixes for a rash of fuzzer reports that someone in China
> posted last May:
> https://bugzilla.kernel.org/show_bug.cgi?id=215927
> 
> (There are more than just that one)
> 
> As usual, the submitter didn't bother to help triage and just dumped a
> ton of work in our laps.  They didn't follow up with any regression
> tests, because few fuzz kiddiez ever do.  At the time, I was too burned
> out to deal with it, so Dave posted fixes.
> 
> Patches 7-8 would manifest themselves as test VMs halting on ASSERTs if
> you configure your kernel to panic.  Not strictly needed since most LTS
> kernels probably don't even have XFS_DEBUG=y, but it makes the lives of
> recoveryloop runners easier if they do.
> 
> Patch 9 trips xfs/434 and xfs/436, but they only run if you have slab
> debugging enabled and build XFS as a module.  I don't know why very few
> people do this.
> 
> Patch 10 is a memory leak if you have XFS_DEBUG=y.  No need for a
> separate test for this one, since kmemleak catches it.  If you turn it
> on.
> 
> (IOWs, LGTM for the whole set.)

Good to add Ack tag?

> 
> > 
> > Leah, please consider working on the SGID bug fixes for the next 5.15
> > update, because my 5.10 SGID fixes series [1] has been blocked for
> > months and because there are several reproducible test cases in xfstest.
> 
> Whenever y'all get to 6.0, beware of commit 2ed5b09b3e8f ("xfs: make
> inode attribute forks a permanent part of struct xfs_inode"), which is a
> KASAN UAF that we fixed by eliminating the 'F'.  Do not pull on the
> devilstring ("...the file capabilities code calls getxattr with and
> without i_rwsem held...") if you can avoid it.
> 
> > I did not push on that until now because SGID test expectations were
> > a moving target, but since xfstests commit 81e6f628 ("generic: update
> > setgid tests") in this week's xfstests release, I think that tests should be
> > stable and we can finally start backporting all relevant SGID fixes to
> > align the SGID behavior of LTS kernels with that of upstream.

Ooo goody, ok, will do this next.

The following patches are on my radar to look into for this set. I have
yet to look into dependencies, so the set may grow. If the sgid tests
still fail after these ptaches, I will continue hunting for more fixes
to include in this set.

  e014f37db1a2 xfs: use setattr_copy to set vfs inode attributes
  472c6e46f589 xfs: remove XFS_PREALLOC_SYNC
  fbe7e5200365 xfs: fallocate() should call file_modified()
  0b02c8c0d75a xfs: set prealloc flag in xfs_alloc_file_space()
  2b3416ceff5e fs: add mode_strip_sgid() helper
  1639a49ccdce fs: move S_ISGID stripping into the vfs_*() helpers
  ed5a7047d201 attr: use consistent sgid stripping checks
  8d84e39d76bd fs: use consistent setgid checks in is_sxid()

In addition to the normal regression testing, I will specifically look
at the following tests for the sgid changes:

  generic/673
  generic/68[3-7]
  generic/69[6-7]

I will also do some extra runs on the entire perms group.

Let me know if you think something should be dropped or added.

- Leah

> 
> Oh good, I've been (gently) waiting on that one too. :)
> 
> --D
> 
> > Thanks,
> > Amir.
> > 
> > [1] https://github.com/amir73il/linux/commits/xfs-5.10.y-sgid-fixes
> > 
> > >
> > > Below I've outlined which series the backports came from:
> > >
> > > series "xfs: intent whiteouts" (1):
> > > [01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
> > >     xfs: zero inode fork buffer at allocation
> > > [02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
> > >     xfs: fix potential log item leak
> > >
> > > series "xfs: fix random format verification issues" (2):
> > > [1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
> > >     xfs: detect self referencing btree sibling pointers
> > > [2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
> > >     xfs: validate inode fork size against fork format
> > > [3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
> > >     xfs: set XFS_FEAT_NLINK correctly
> > > [4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
> > >     xfs: validate v5 feature fields
> > >
> > > series "xfs: small fixes for 5.19 cycle" (3):
> > > [1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
> > >     xfs: avoid unnecessary runtime sibling pointer endian conversions
> > > [2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
> > >     fs: don't assert fail on perag references on teardown
> > > [2/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
> > >     xfs: assert in xfs_btree_del_cursor should take into account error
> > >
> > > series "xfs: random fixes for 5.19" (4):
> > > [1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
> > >     xfs: purge dquots after inode walk fails during quotacheck
> > > [2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
> > >     xfs: don't leak btree cursor when insrec fails after a split
> > >
> > > (1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
> > > (2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
> > > (3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
> > > (4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@magnolia/
> > >
> > > Darrick J. Wong (2):
> > >   xfs: purge dquots after inode walk fails during quotacheck
> > >   xfs: don't leak btree cursor when insrec fails after a split
> > >
> > > Dave Chinner (8):
> > >   xfs: zero inode fork buffer at allocation
> > >   xfs: fix potential log item leak
> > >   xfs: detect self referencing btree sibling pointers
> > >   xfs: set XFS_FEAT_NLINK correctly
> > >   xfs: validate v5 feature fields
> > >   xfs: avoid unnecessary runtime sibling pointer endian conversions
> > >   xfs: don't assert fail on perag references on teardown
> > >   xfs: assert in xfs_btree_del_cursor should take into account error
> > >
> > >  fs/xfs/libxfs/xfs_ag.c         |   3 +-
> > >  fs/xfs/libxfs/xfs_btree.c      | 175 +++++++++++++++++++++++++--------
> > >  fs/xfs/libxfs/xfs_inode_fork.c |  12 ++-
> > >  fs/xfs/libxfs/xfs_sb.c         |  70 +++++++++++--
> > >  fs/xfs/xfs_bmap_item.c         |   2 +
> > >  fs/xfs/xfs_icreate_item.c      |   1 +
> > >  fs/xfs/xfs_qm.c                |   9 +-
> > >  fs/xfs/xfs_refcount_item.c     |   2 +
> > >  fs/xfs/xfs_rmap_item.c         |   2 +
> > >  9 files changed, 221 insertions(+), 55 deletions(-)
> > >
> > > --
> > > 2.39.1.519.gcb327c4b5f-goog
> > >
Darrick J. Wong Feb. 11, 2023, 4:05 a.m. UTC | #7
On Fri, Feb 10, 2023 at 11:51:03AM -0800, Leah Rumancik wrote:
> On Wed, Feb 08, 2023 at 11:40:59AM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 08, 2023 at 09:02:58PM +0200, Amir Goldstein wrote:
> > > On Wed, Feb 8, 2023 at 7:52 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> > > >
> > > > Hello again,
> > > >
> > > > Here is the next batch of backports for 5.15.y. Testing included
> > > > 25 runs of auto group on 12 xfs configs. No regressions were seen.
> > > > I checked xfs/538 was run without issue as this test was mentioned
> > > > in 56486f307100. Also, from 86d40f1e49e9, I ran ran xfs/117 with
> > > > XFS compiled as a module and TEST_FS_MODULE_REOLOAD set, but I was
> > > > unable to reproduce the issue.
> > > 
> > > Did you find any tests that started to pass or whose failure rate reduced?
> > 
> > I wish Leah had, but there basically aren't any tests for the problems
> > fixed in this set for her to find. :(
> > 
> > The first two patches I think were from when Dave was working on log
> > intent whiteouts, turned on KASAN to diagnose some other problem he had,
> > and began pulling on the ball of string (as it were) as he noticed other
> > things in the codebase.  We don't usually bother with regression tests
> > for kernel memory leaks, since they're not so easy to reproduce.
> > 
> > Patches 3-6 are fixes for a rash of fuzzer reports that someone in China
> > posted last May:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215927
> > 
> > (There are more than just that one)
> > 
> > As usual, the submitter didn't bother to help triage and just dumped a
> > ton of work in our laps.  They didn't follow up with any regression
> > tests, because few fuzz kiddiez ever do.  At the time, I was too burned
> > out to deal with it, so Dave posted fixes.
> > 
> > Patches 7-8 would manifest themselves as test VMs halting on ASSERTs if
> > you configure your kernel to panic.  Not strictly needed since most LTS
> > kernels probably don't even have XFS_DEBUG=y, but it makes the lives of
> > recoveryloop runners easier if they do.
> > 
> > Patch 9 trips xfs/434 and xfs/436, but they only run if you have slab
> > debugging enabled and build XFS as a module.  I don't know why very few
> > people do this.
> > 
> > Patch 10 is a memory leak if you have XFS_DEBUG=y.  No need for a
> > separate test for this one, since kmemleak catches it.  If you turn it
> > on.
> > 
> > (IOWs, LGTM for the whole set.)
> 
> Good to add Ack tag?

Oh, yes, sorry, I forgot about the formal tagging bit:

Acked-by: Darrick J. Wong <djwong@kernel.org>

--D


> > 
> > > 
> > > Leah, please consider working on the SGID bug fixes for the next 5.15
> > > update, because my 5.10 SGID fixes series [1] has been blocked for
> > > months and because there are several reproducible test cases in xfstest.
> > 
> > Whenever y'all get to 6.0, beware of commit 2ed5b09b3e8f ("xfs: make
> > inode attribute forks a permanent part of struct xfs_inode"), which is a
> > KASAN UAF that we fixed by eliminating the 'F'.  Do not pull on the
> > devilstring ("...the file capabilities code calls getxattr with and
> > without i_rwsem held...") if you can avoid it.
> > 
> > > I did not push on that until now because SGID test expectations were
> > > a moving target, but since xfstests commit 81e6f628 ("generic: update
> > > setgid tests") in this week's xfstests release, I think that tests should be
> > > stable and we can finally start backporting all relevant SGID fixes to
> > > align the SGID behavior of LTS kernels with that of upstream.
> 
> Ooo goody, ok, will do this next.
> 
> The following patches are on my radar to look into for this set. I have
> yet to look into dependencies, so the set may grow. If the sgid tests
> still fail after these ptaches, I will continue hunting for more fixes
> to include in this set.
> 
>   e014f37db1a2 xfs: use setattr_copy to set vfs inode attributes
>   472c6e46f589 xfs: remove XFS_PREALLOC_SYNC
>   fbe7e5200365 xfs: fallocate() should call file_modified()
>   0b02c8c0d75a xfs: set prealloc flag in xfs_alloc_file_space()
>   2b3416ceff5e fs: add mode_strip_sgid() helper
>   1639a49ccdce fs: move S_ISGID stripping into the vfs_*() helpers
>   ed5a7047d201 attr: use consistent sgid stripping checks
>   8d84e39d76bd fs: use consistent setgid checks in is_sxid()
> 
> In addition to the normal regression testing, I will specifically look
> at the following tests for the sgid changes:
> 
>   generic/673
>   generic/68[3-7]
>   generic/69[6-7]
> 
> I will also do some extra runs on the entire perms group.
> 
> Let me know if you think something should be dropped or added.
> 
> - Leah
> 
> > 
> > Oh good, I've been (gently) waiting on that one too. :)
> > 
> > --D
> > 
> > > Thanks,
> > > Amir.
> > > 
> > > [1] https://github.com/amir73il/linux/commits/xfs-5.10.y-sgid-fixes
> > > 
> > > >
> > > > Below I've outlined which series the backports came from:
> > > >
> > > > series "xfs: intent whiteouts" (1):
> > > > [01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
> > > >     xfs: zero inode fork buffer at allocation
> > > > [02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
> > > >     xfs: fix potential log item leak
> > > >
> > > > series "xfs: fix random format verification issues" (2):
> > > > [1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
> > > >     xfs: detect self referencing btree sibling pointers
> > > > [2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
> > > >     xfs: validate inode fork size against fork format
> > > > [3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
> > > >     xfs: set XFS_FEAT_NLINK correctly
> > > > [4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
> > > >     xfs: validate v5 feature fields
> > > >
> > > > series "xfs: small fixes for 5.19 cycle" (3):
> > > > [1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
> > > >     xfs: avoid unnecessary runtime sibling pointer endian conversions
> > > > [2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
> > > >     fs: don't assert fail on perag references on teardown
> > > > [2/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
> > > >     xfs: assert in xfs_btree_del_cursor should take into account error
> > > >
> > > > series "xfs: random fixes for 5.19" (4):
> > > > [1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
> > > >     xfs: purge dquots after inode walk fails during quotacheck
> > > > [2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
> > > >     xfs: don't leak btree cursor when insrec fails after a split
> > > >
> > > > (1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
> > > > (2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
> > > > (3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
> > > > (4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@magnolia/
> > > >
> > > > Darrick J. Wong (2):
> > > >   xfs: purge dquots after inode walk fails during quotacheck
> > > >   xfs: don't leak btree cursor when insrec fails after a split
> > > >
> > > > Dave Chinner (8):
> > > >   xfs: zero inode fork buffer at allocation
> > > >   xfs: fix potential log item leak
> > > >   xfs: detect self referencing btree sibling pointers
> > > >   xfs: set XFS_FEAT_NLINK correctly
> > > >   xfs: validate v5 feature fields
> > > >   xfs: avoid unnecessary runtime sibling pointer endian conversions
> > > >   xfs: don't assert fail on perag references on teardown
> > > >   xfs: assert in xfs_btree_del_cursor should take into account error
> > > >
> > > >  fs/xfs/libxfs/xfs_ag.c         |   3 +-
> > > >  fs/xfs/libxfs/xfs_btree.c      | 175 +++++++++++++++++++++++++--------
> > > >  fs/xfs/libxfs/xfs_inode_fork.c |  12 ++-
> > > >  fs/xfs/libxfs/xfs_sb.c         |  70 +++++++++++--
> > > >  fs/xfs/xfs_bmap_item.c         |   2 +
> > > >  fs/xfs/xfs_icreate_item.c      |   1 +
> > > >  fs/xfs/xfs_qm.c                |   9 +-
> > > >  fs/xfs/xfs_refcount_item.c     |   2 +
> > > >  fs/xfs/xfs_rmap_item.c         |   2 +
> > > >  9 files changed, 221 insertions(+), 55 deletions(-)
> > > >
> > > > --
> > > > 2.39.1.519.gcb327c4b5f-goog
> > > >
Amir Goldstein Feb. 11, 2023, 8:33 a.m. UTC | #8
On Fri, Feb 10, 2023 at 9:51 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
>
[...]
> > > I did not push on that until now because SGID test expectations were
> > > a moving target, but since xfstests commit 81e6f628 ("generic: update
> > > setgid tests") in this week's xfstests release, I think that tests should be
> > > stable and we can finally start backporting all relevant SGID fixes to
> > > align the SGID behavior of LTS kernels with that of upstream.
>
> Ooo goody, ok, will do this next.
>
> The following patches are on my radar to look into for this set. I have
> yet to look into dependencies, so the set may grow. If the sgid tests
> still fail after these ptaches, I will continue hunting for more fixes
> to include in this set.
>
>   e014f37db1a2 xfs: use setattr_copy to set vfs inode attributes
>   472c6e46f589 xfs: remove XFS_PREALLOC_SYNC
>   fbe7e5200365 xfs: fallocate() should call file_modified()
>   0b02c8c0d75a xfs: set prealloc flag in xfs_alloc_file_space()
>   2b3416ceff5e fs: add mode_strip_sgid() helper
>   1639a49ccdce fs: move S_ISGID stripping into the vfs_*() helpers
>   ed5a7047d201 attr: use consistent sgid stripping checks
>   8d84e39d76bd fs: use consistent setgid checks in is_sxid()
>
> In addition to the normal regression testing, I will specifically look
> at the following tests for the sgid changes:
>
>   generic/673
>   generic/68[3-7]
>   generic/69[6-7]
>
> I will also do some extra runs on the entire perms group.
>
> Let me know if you think something should be dropped or added.
>

I reckon you will need those dependency prep commits from
Christian's PR [1]:

11c2a8700cdc attr: add in_group_or_capable()
e243e3f94c80 fs: move should_remove_suid()
72ae017c5451 attr: add setattr_should_drop_sgid()

FYI, the ovl commits from this PR are independent fixes that were
already applied to 5.15.y.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20221212112053.99208-1-brauner@kernel.org/