mbox series

[PATCHSET,v25.0,0/7] xfs_scrub: fixes to the repair code

Message ID 168506071314.3742205.8114181660121831202.stgit@frogsfrogsfrogs (mailing list archive)
Headers show
Series xfs_scrub: fixes to the repair code | expand

Message

Darrick J. Wong May 26, 2023, 12:38 a.m. UTC
Hi all,

Now that we've landed the new kernel code, it's time to reorganize the
xfs_scrub code that handles repairs.  Clean up various naming warts and
misleading error messages.  Move the repair code to scrub/repair.c as
the first step.  Then, fix various issues in the repair code before we
start reorganizing things.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes
---
 scrub/phase1.c        |    2 
 scrub/phase2.c        |    3 -
 scrub/phase3.c        |    2 
 scrub/phase4.c        |   22 ++++-
 scrub/phase5.c        |    2 
 scrub/phase6.c        |   13 +++
 scrub/phase7.c        |    2 
 scrub/repair.c        |  177 ++++++++++++++++++++++++++++++++++++++++++-
 scrub/repair.h        |   16 +++-
 scrub/scrub.c         |  204 +------------------------------------------------
 scrub/scrub.h         |   16 ----
 scrub/scrub_private.h |   55 +++++++++++++
 scrub/xfs_scrub.c     |    2 
 13 files changed, 283 insertions(+), 233 deletions(-)
 create mode 100644 scrub/scrub_private.h

Comments

Shiyang Ruan June 8, 2023, 1:22 p.m. UTC | #1
Hi Darrick,

I'm running test on this patchset (patched kernel and xfsprogs, latest 
xfstests:v2023.05.28).  Now I found xfs/730 failed with message "online 
scrub didn't fail".  The detail is:

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP 
PREEMPT_DYNAMIC Wed Jun  7 15:44:15 CST 2023
MKFS_OPTIONS  -- -f /dev/pmem1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1 
/mnt/scratch

xfs/730       - output mismatch (see 
/root/xts/results//default/xfs/730.out.bad)
mv: failed to preserve ownership for 
'/root/xts/results//default/xfs/730.out.bad': Operation not permitted
     --- tests/xfs/730.out	2023-03-16 09:42:15.256141472 +0800
     +++ /root/xts/results//default/xfs/730.out.bad	2023-06-08 
20:43:27.436083265 +0800
     @@ -1,4 +1,14 @@
      QA output created by 730
      Format and populate
      Fuzz fscounters
     +icount = zeroes: online scrub didn't fail.
     +icount = ones: online scrub didn't fail.
     +icount = firstbit: online scrub didn't fail.
     +icount = middlebit: online scrub didn't fail.
     ...
     (Run 'diff -u /root/xts/tests/xfs/730.out 
/root/xts/results//default/xfs/730.out.bad'  to see the entire diff)


This test case is to fuzz metadata and make sure xfs_scrub can repair 
the fs. After a little investigation, I think the fuzz actually done to 
the metadata area but the xfs_scrub seems didn't notice that.  I haven't 
found the root cause of the problem yet.  Do you have the same message 
which causes fail on this case?


--
Thanks,
Ruan.

在 2023/5/26 8:38, Darrick J. Wong 写道:
> Hi all,
> 
> Now that we've landed the new kernel code, it's time to reorganize the
> xfs_scrub code that handles repairs.  Clean up various naming warts and
> misleading error messages.  Move the repair code to scrub/repair.c as
> the first step.  Then, fix various issues in the repair code before we
> start reorganizing things.
> 
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> xfsprogs git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes
> ---
>   scrub/phase1.c        |    2
>   scrub/phase2.c        |    3 -
>   scrub/phase3.c        |    2
>   scrub/phase4.c        |   22 ++++-
>   scrub/phase5.c        |    2
>   scrub/phase6.c        |   13 +++
>   scrub/phase7.c        |    2
>   scrub/repair.c        |  177 ++++++++++++++++++++++++++++++++++++++++++-
>   scrub/repair.h        |   16 +++-
>   scrub/scrub.c         |  204 +------------------------------------------------
>   scrub/scrub.h         |   16 ----
>   scrub/scrub_private.h |   55 +++++++++++++
>   scrub/xfs_scrub.c     |    2
>   13 files changed, 283 insertions(+), 233 deletions(-)
>   create mode 100644 scrub/scrub_private.h
>
Darrick J. Wong June 8, 2023, 2:56 p.m. UTC | #2
On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote:
> Hi Darrick,
> 
> I'm running test on this patchset (patched kernel and xfsprogs, latest
> xfstests:v2023.05.28).  Now I found xfs/730 failed with message "online
> scrub didn't fail".  The detail is:
> 
> FSTYP         -- xfs (debug)
> PLATFORM      -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC
> Wed Jun  7 15:44:15 CST 2023
> MKFS_OPTIONS  -- -f /dev/pmem1
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1
> /mnt/scratch
> 
> xfs/730       - output mismatch (see
> /root/xts/results//default/xfs/730.out.bad)
> mv: failed to preserve ownership for
> '/root/xts/results//default/xfs/730.out.bad': Operation not permitted
>     --- tests/xfs/730.out	2023-03-16 09:42:15.256141472 +0800
>     +++ /root/xts/results//default/xfs/730.out.bad	2023-06-08

Whoah, someone besides me actually runs the repair fuzzers??

Yay!

> 20:43:27.436083265 +0800
>     @@ -1,4 +1,14 @@
>      QA output created by 730
>      Format and populate
>      Fuzz fscounters
>     +icount = zeroes: online scrub didn't fail.
>     +icount = ones: online scrub didn't fail.
>     +icount = firstbit: online scrub didn't fail.
>     +icount = middlebit: online scrub didn't fail.
>     ...
>     (Run 'diff -u /root/xts/tests/xfs/730.out
> /root/xts/results//default/xfs/730.out.bad'  to see the entire diff)
> 
> 
> This test case is to fuzz metadata and make sure xfs_scrub can repair the
> fs. After a little investigation, I think the fuzz actually done to the
> metadata area but the xfs_scrub seems didn't notice that.  I haven't found
> the root cause of the problem yet.  Do you have the same message which
> causes fail on this case?

Yeah, we recently disabled some code in fscounters.c to fix some other
correctness problems in the inodegc code.  My goal was to get this
series:

https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/

merged for 6.5 and then the whole thing would work *completely*
correctly, but it might be too late now.

--D

> 
> --
> Thanks,
> Ruan.
> 
> 在 2023/5/26 8:38, Darrick J. Wong 写道:
> > Hi all,
> > 
> > Now that we've landed the new kernel code, it's time to reorganize the
> > xfs_scrub code that handles repairs.  Clean up various naming warts and
> > misleading error messages.  Move the repair code to scrub/repair.c as
> > the first step.  Then, fix various issues in the repair code before we
> > start reorganizing things.
> > 
> > If you're going to start using this mess, you probably ought to just
> > pull from my git trees, which are linked below.
> > 
> > This is an extraordinary way to destroy everything.  Enjoy!
> > Comments and questions are, as always, welcome.
> > 
> > --D
> > 
> > xfsprogs git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes
> > ---
> >   scrub/phase1.c        |    2
> >   scrub/phase2.c        |    3 -
> >   scrub/phase3.c        |    2
> >   scrub/phase4.c        |   22 ++++-
> >   scrub/phase5.c        |    2
> >   scrub/phase6.c        |   13 +++
> >   scrub/phase7.c        |    2
> >   scrub/repair.c        |  177 ++++++++++++++++++++++++++++++++++++++++++-
> >   scrub/repair.h        |   16 +++-
> >   scrub/scrub.c         |  204 +------------------------------------------------
> >   scrub/scrub.h         |   16 ----
> >   scrub/scrub_private.h |   55 +++++++++++++
> >   scrub/xfs_scrub.c     |    2
> >   13 files changed, 283 insertions(+), 233 deletions(-)
> >   create mode 100644 scrub/scrub_private.h
> >
Shiyang Ruan June 9, 2023, 3:22 a.m. UTC | #3
在 2023/6/8 22:56, Darrick J. Wong 写道:
> On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote:
>> Hi Darrick,
>>
>> I'm running test on this patchset (patched kernel and xfsprogs, latest
>> xfstests:v2023.05.28).  Now I found xfs/730 failed with message "online
>> scrub didn't fail".  The detail is:
>>
>> FSTYP         -- xfs (debug)
>> PLATFORM      -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC
>> Wed Jun  7 15:44:15 CST 2023
>> MKFS_OPTIONS  -- -f /dev/pmem1
>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1
>> /mnt/scratch
>>
>> xfs/730       - output mismatch (see
>> /root/xts/results//default/xfs/730.out.bad)
>> mv: failed to preserve ownership for
>> '/root/xts/results//default/xfs/730.out.bad': Operation not permitted
>>      --- tests/xfs/730.out	2023-03-16 09:42:15.256141472 +0800
>>      +++ /root/xts/results//default/xfs/730.out.bad	2023-06-08
> 
> Whoah, someone besides me actually runs the repair fuzzers??
> 
> Yay!
> 
>> 20:43:27.436083265 +0800
>>      @@ -1,4 +1,14 @@
>>       QA output created by 730
>>       Format and populate
>>       Fuzz fscounters
>>      +icount = zeroes: online scrub didn't fail.
>>      +icount = ones: online scrub didn't fail.
>>      +icount = firstbit: online scrub didn't fail.
>>      +icount = middlebit: online scrub didn't fail.
>>      ...
>>      (Run 'diff -u /root/xts/tests/xfs/730.out
>> /root/xts/results//default/xfs/730.out.bad'  to see the entire diff)
>>
>>
>> This test case is to fuzz metadata and make sure xfs_scrub can repair the
>> fs. After a little investigation, I think the fuzz actually done to the
>> metadata area but the xfs_scrub seems didn't notice that.  I haven't found
>> the root cause of the problem yet.  Do you have the same message which
>> causes fail on this case?
> 
> Yeah, we recently disabled some code in fscounters.c to fix some other
> correctness problems in the inodegc code.

Do you mean this failure is what you expected?

> My goal was to get this
> series:
> 
> https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/
> 
> merged for 6.5 and then the whole thing would work *completely*
> correctly, but it might be too late now.

But actually I'm running test on your repair-fscounters branch[1], which 
seems to be the same thing as the patchsets you just show to me.  And 
the failure of xfs/730 happened.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters


--
Thanks,
Ruan.

> 
> --D
> 
>>
>> --
>> Thanks,
>> Ruan.
>>
>> 在 2023/5/26 8:38, Darrick J. Wong 写道:
>>> Hi all,
>>>
>>> Now that we've landed the new kernel code, it's time to reorganize the
>>> xfs_scrub code that handles repairs.  Clean up various naming warts and
>>> misleading error messages.  Move the repair code to scrub/repair.c as
>>> the first step.  Then, fix various issues in the repair code before we
>>> start reorganizing things.
>>>
>>> If you're going to start using this mess, you probably ought to just
>>> pull from my git trees, which are linked below.
>>>
>>> This is an extraordinary way to destroy everything.  Enjoy!
>>> Comments and questions are, as always, welcome.
>>>
>>> --D
>>>
>>> xfsprogs git tree:
>>> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes
>>> ---
>>>    scrub/phase1.c        |    2
>>>    scrub/phase2.c        |    3 -
>>>    scrub/phase3.c        |    2
>>>    scrub/phase4.c        |   22 ++++-
>>>    scrub/phase5.c        |    2
>>>    scrub/phase6.c        |   13 +++
>>>    scrub/phase7.c        |    2
>>>    scrub/repair.c        |  177 ++++++++++++++++++++++++++++++++++++++++++-
>>>    scrub/repair.h        |   16 +++-
>>>    scrub/scrub.c         |  204 +------------------------------------------------
>>>    scrub/scrub.h         |   16 ----
>>>    scrub/scrub_private.h |   55 +++++++++++++
>>>    scrub/xfs_scrub.c     |    2
>>>    13 files changed, 283 insertions(+), 233 deletions(-)
>>>    create mode 100644 scrub/scrub_private.h
>>>
Darrick J. Wong July 13, 2023, 5:12 a.m. UTC | #4
On Fri, Jun 09, 2023 at 11:22:30AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/6/8 22:56, Darrick J. Wong 写道:
> > On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote:
> > > Hi Darrick,
> > > 
> > > I'm running test on this patchset (patched kernel and xfsprogs, latest
> > > xfstests:v2023.05.28).  Now I found xfs/730 failed with message "online
> > > scrub didn't fail".  The detail is:
> > > 
> > > FSTYP         -- xfs (debug)
> > > PLATFORM      -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC
> > > Wed Jun  7 15:44:15 CST 2023
> > > MKFS_OPTIONS  -- -f /dev/pmem1
> > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1
> > > /mnt/scratch
> > > 
> > > xfs/730       - output mismatch (see
> > > /root/xts/results//default/xfs/730.out.bad)
> > > mv: failed to preserve ownership for
> > > '/root/xts/results//default/xfs/730.out.bad': Operation not permitted
> > >      --- tests/xfs/730.out	2023-03-16 09:42:15.256141472 +0800
> > >      +++ /root/xts/results//default/xfs/730.out.bad	2023-06-08
> > 
> > Whoah, someone besides me actually runs the repair fuzzers??
> > 
> > Yay!
> > 
> > > 20:43:27.436083265 +0800
> > >      @@ -1,4 +1,14 @@
> > >       QA output created by 730
> > >       Format and populate
> > >       Fuzz fscounters
> > >      +icount = zeroes: online scrub didn't fail.
> > >      +icount = ones: online scrub didn't fail.
> > >      +icount = firstbit: online scrub didn't fail.
> > >      +icount = middlebit: online scrub didn't fail.
> > >      ...
> > >      (Run 'diff -u /root/xts/tests/xfs/730.out
> > > /root/xts/results//default/xfs/730.out.bad'  to see the entire diff)
> > > 
> > > 
> > > This test case is to fuzz metadata and make sure xfs_scrub can repair the
> > > fs. After a little investigation, I think the fuzz actually done to the
> > > metadata area but the xfs_scrub seems didn't notice that.  I haven't found
> > > the root cause of the problem yet.  Do you have the same message which
> > > causes fail on this case?
> > 
> > Yeah, we recently disabled some code in fscounters.c to fix some other
> > correctness problems in the inodegc code.
> 
> Do you mean this failure is what you expected?
> 
> > My goal was to get this
> > series:
> > 
> > https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/
> > 
> > merged for 6.5 and then the whole thing would work *completely*
> > correctly, but it might be too late now.
> 
> But actually I'm running test on your repair-fscounters branch[1], which
> seems to be the same thing as the patchsets you just show to me.  And the
> failure of xfs/730 happened.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters

Ahhh, ok.  #repair-fscounters shouldn't be doing that.  I'll take a
look and see if I can figure it out.  Do you have a dmesg from the
xfs/730 run?

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > --D
> > 
> > > 
> > > --
> > > Thanks,
> > > Ruan.
> > > 
> > > 在 2023/5/26 8:38, Darrick J. Wong 写道:
> > > > Hi all,
> > > > 
> > > > Now that we've landed the new kernel code, it's time to reorganize the
> > > > xfs_scrub code that handles repairs.  Clean up various naming warts and
> > > > misleading error messages.  Move the repair code to scrub/repair.c as
> > > > the first step.  Then, fix various issues in the repair code before we
> > > > start reorganizing things.
> > > > 
> > > > If you're going to start using this mess, you probably ought to just
> > > > pull from my git trees, which are linked below.
> > > > 
> > > > This is an extraordinary way to destroy everything.  Enjoy!
> > > > Comments and questions are, as always, welcome.
> > > > 
> > > > --D
> > > > 
> > > > xfsprogs git tree:
> > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes
> > > > ---
> > > >    scrub/phase1.c        |    2
> > > >    scrub/phase2.c        |    3 -
> > > >    scrub/phase3.c        |    2
> > > >    scrub/phase4.c        |   22 ++++-
> > > >    scrub/phase5.c        |    2
> > > >    scrub/phase6.c        |   13 +++
> > > >    scrub/phase7.c        |    2
> > > >    scrub/repair.c        |  177 ++++++++++++++++++++++++++++++++++++++++++-
> > > >    scrub/repair.h        |   16 +++-
> > > >    scrub/scrub.c         |  204 +------------------------------------------------
> > > >    scrub/scrub.h         |   16 ----
> > > >    scrub/scrub_private.h |   55 +++++++++++++
> > > >    scrub/xfs_scrub.c     |    2
> > > >    13 files changed, 283 insertions(+), 233 deletions(-)
> > > >    create mode 100644 scrub/scrub_private.h
> > > >