Message ID | 168506071314.3742205.8114181660121831202.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
Headers | show |
Series | xfs_scrub: fixes to the repair code | expand |
Hi Darrick, I'm running test on this patchset (patched kernel and xfsprogs, latest xfstests:v2023.05.28). Now I found xfs/730 failed with message "online scrub didn't fail". The detail is: FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC Wed Jun 7 15:44:15 CST 2023 MKFS_OPTIONS -- -f /dev/pmem1 MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1 /mnt/scratch xfs/730 - output mismatch (see /root/xts/results//default/xfs/730.out.bad) mv: failed to preserve ownership for '/root/xts/results//default/xfs/730.out.bad': Operation not permitted --- tests/xfs/730.out 2023-03-16 09:42:15.256141472 +0800 +++ /root/xts/results//default/xfs/730.out.bad 2023-06-08 20:43:27.436083265 +0800 @@ -1,4 +1,14 @@ QA output created by 730 Format and populate Fuzz fscounters +icount = zeroes: online scrub didn't fail. +icount = ones: online scrub didn't fail. +icount = firstbit: online scrub didn't fail. +icount = middlebit: online scrub didn't fail. ... (Run 'diff -u /root/xts/tests/xfs/730.out /root/xts/results//default/xfs/730.out.bad' to see the entire diff) This test case is to fuzz metadata and make sure xfs_scrub can repair the fs. After a little investigation, I think the fuzz actually done to the metadata area but the xfs_scrub seems didn't notice that. I haven't found the root cause of the problem yet. Do you have the same message which causes fail on this case? -- Thanks, Ruan. 在 2023/5/26 8:38, Darrick J. Wong 写道: > Hi all, > > Now that we've landed the new kernel code, it's time to reorganize the > xfs_scrub code that handles repairs. Clean up various naming warts and > misleading error messages. Move the repair code to scrub/repair.c as > the first step. Then, fix various issues in the repair code before we > start reorganizing things. > > If you're going to start using this mess, you probably ought to just > pull from my git trees, which are linked below. > > This is an extraordinary way to destroy everything. Enjoy! > Comments and questions are, as always, welcome. > > --D > > xfsprogs git tree: > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes > --- > scrub/phase1.c | 2 > scrub/phase2.c | 3 - > scrub/phase3.c | 2 > scrub/phase4.c | 22 ++++- > scrub/phase5.c | 2 > scrub/phase6.c | 13 +++ > scrub/phase7.c | 2 > scrub/repair.c | 177 ++++++++++++++++++++++++++++++++++++++++++- > scrub/repair.h | 16 +++- > scrub/scrub.c | 204 +------------------------------------------------ > scrub/scrub.h | 16 ---- > scrub/scrub_private.h | 55 +++++++++++++ > scrub/xfs_scrub.c | 2 > 13 files changed, 283 insertions(+), 233 deletions(-) > create mode 100644 scrub/scrub_private.h >
On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote: > Hi Darrick, > > I'm running test on this patchset (patched kernel and xfsprogs, latest > xfstests:v2023.05.28). Now I found xfs/730 failed with message "online > scrub didn't fail". The detail is: > > FSTYP -- xfs (debug) > PLATFORM -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC > Wed Jun 7 15:44:15 CST 2023 > MKFS_OPTIONS -- -f /dev/pmem1 > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1 > /mnt/scratch > > xfs/730 - output mismatch (see > /root/xts/results//default/xfs/730.out.bad) > mv: failed to preserve ownership for > '/root/xts/results//default/xfs/730.out.bad': Operation not permitted > --- tests/xfs/730.out 2023-03-16 09:42:15.256141472 +0800 > +++ /root/xts/results//default/xfs/730.out.bad 2023-06-08 Whoah, someone besides me actually runs the repair fuzzers?? Yay! > 20:43:27.436083265 +0800 > @@ -1,4 +1,14 @@ > QA output created by 730 > Format and populate > Fuzz fscounters > +icount = zeroes: online scrub didn't fail. > +icount = ones: online scrub didn't fail. > +icount = firstbit: online scrub didn't fail. > +icount = middlebit: online scrub didn't fail. > ... > (Run 'diff -u /root/xts/tests/xfs/730.out > /root/xts/results//default/xfs/730.out.bad' to see the entire diff) > > > This test case is to fuzz metadata and make sure xfs_scrub can repair the > fs. After a little investigation, I think the fuzz actually done to the > metadata area but the xfs_scrub seems didn't notice that. I haven't found > the root cause of the problem yet. Do you have the same message which > causes fail on this case? Yeah, we recently disabled some code in fscounters.c to fix some other correctness problems in the inodegc code. My goal was to get this series: https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/ merged for 6.5 and then the whole thing would work *completely* correctly, but it might be too late now. --D > > -- > Thanks, > Ruan. > > 在 2023/5/26 8:38, Darrick J. Wong 写道: > > Hi all, > > > > Now that we've landed the new kernel code, it's time to reorganize the > > xfs_scrub code that handles repairs. Clean up various naming warts and > > misleading error messages. Move the repair code to scrub/repair.c as > > the first step. Then, fix various issues in the repair code before we > > start reorganizing things. > > > > If you're going to start using this mess, you probably ought to just > > pull from my git trees, which are linked below. > > > > This is an extraordinary way to destroy everything. Enjoy! > > Comments and questions are, as always, welcome. > > > > --D > > > > xfsprogs git tree: > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes > > --- > > scrub/phase1.c | 2 > > scrub/phase2.c | 3 - > > scrub/phase3.c | 2 > > scrub/phase4.c | 22 ++++- > > scrub/phase5.c | 2 > > scrub/phase6.c | 13 +++ > > scrub/phase7.c | 2 > > scrub/repair.c | 177 ++++++++++++++++++++++++++++++++++++++++++- > > scrub/repair.h | 16 +++- > > scrub/scrub.c | 204 +------------------------------------------------ > > scrub/scrub.h | 16 ---- > > scrub/scrub_private.h | 55 +++++++++++++ > > scrub/xfs_scrub.c | 2 > > 13 files changed, 283 insertions(+), 233 deletions(-) > > create mode 100644 scrub/scrub_private.h > >
在 2023/6/8 22:56, Darrick J. Wong 写道: > On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote: >> Hi Darrick, >> >> I'm running test on this patchset (patched kernel and xfsprogs, latest >> xfstests:v2023.05.28). Now I found xfs/730 failed with message "online >> scrub didn't fail". The detail is: >> >> FSTYP -- xfs (debug) >> PLATFORM -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC >> Wed Jun 7 15:44:15 CST 2023 >> MKFS_OPTIONS -- -f /dev/pmem1 >> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1 >> /mnt/scratch >> >> xfs/730 - output mismatch (see >> /root/xts/results//default/xfs/730.out.bad) >> mv: failed to preserve ownership for >> '/root/xts/results//default/xfs/730.out.bad': Operation not permitted >> --- tests/xfs/730.out 2023-03-16 09:42:15.256141472 +0800 >> +++ /root/xts/results//default/xfs/730.out.bad 2023-06-08 > > Whoah, someone besides me actually runs the repair fuzzers?? > > Yay! > >> 20:43:27.436083265 +0800 >> @@ -1,4 +1,14 @@ >> QA output created by 730 >> Format and populate >> Fuzz fscounters >> +icount = zeroes: online scrub didn't fail. >> +icount = ones: online scrub didn't fail. >> +icount = firstbit: online scrub didn't fail. >> +icount = middlebit: online scrub didn't fail. >> ... >> (Run 'diff -u /root/xts/tests/xfs/730.out >> /root/xts/results//default/xfs/730.out.bad' to see the entire diff) >> >> >> This test case is to fuzz metadata and make sure xfs_scrub can repair the >> fs. After a little investigation, I think the fuzz actually done to the >> metadata area but the xfs_scrub seems didn't notice that. I haven't found >> the root cause of the problem yet. Do you have the same message which >> causes fail on this case? > > Yeah, we recently disabled some code in fscounters.c to fix some other > correctness problems in the inodegc code. Do you mean this failure is what you expected? > My goal was to get this > series: > > https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/ > > merged for 6.5 and then the whole thing would work *completely* > correctly, but it might be too late now. But actually I'm running test on your repair-fscounters branch[1], which seems to be the same thing as the patchsets you just show to me. And the failure of xfs/730 happened. [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters -- Thanks, Ruan. > > --D > >> >> -- >> Thanks, >> Ruan. >> >> 在 2023/5/26 8:38, Darrick J. Wong 写道: >>> Hi all, >>> >>> Now that we've landed the new kernel code, it's time to reorganize the >>> xfs_scrub code that handles repairs. Clean up various naming warts and >>> misleading error messages. Move the repair code to scrub/repair.c as >>> the first step. Then, fix various issues in the repair code before we >>> start reorganizing things. >>> >>> If you're going to start using this mess, you probably ought to just >>> pull from my git trees, which are linked below. >>> >>> This is an extraordinary way to destroy everything. Enjoy! >>> Comments and questions are, as always, welcome. >>> >>> --D >>> >>> xfsprogs git tree: >>> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes >>> --- >>> scrub/phase1.c | 2 >>> scrub/phase2.c | 3 - >>> scrub/phase3.c | 2 >>> scrub/phase4.c | 22 ++++- >>> scrub/phase5.c | 2 >>> scrub/phase6.c | 13 +++ >>> scrub/phase7.c | 2 >>> scrub/repair.c | 177 ++++++++++++++++++++++++++++++++++++++++++- >>> scrub/repair.h | 16 +++- >>> scrub/scrub.c | 204 +------------------------------------------------ >>> scrub/scrub.h | 16 ---- >>> scrub/scrub_private.h | 55 +++++++++++++ >>> scrub/xfs_scrub.c | 2 >>> 13 files changed, 283 insertions(+), 233 deletions(-) >>> create mode 100644 scrub/scrub_private.h >>>
On Fri, Jun 09, 2023 at 11:22:30AM +0800, Shiyang Ruan wrote: > > > 在 2023/6/8 22:56, Darrick J. Wong 写道: > > On Thu, Jun 08, 2023 at 09:22:02PM +0800, Shiyang Ruan wrote: > > > Hi Darrick, > > > > > > I'm running test on this patchset (patched kernel and xfsprogs, latest > > > xfstests:v2023.05.28). Now I found xfs/730 failed with message "online > > > scrub didn't fail". The detail is: > > > > > > FSTYP -- xfs (debug) > > > PLATFORM -- Linux/x86_64 f36 6.4.0-rc3-pmem+ #309 SMP PREEMPT_DYNAMIC > > > Wed Jun 7 15:44:15 CST 2023 > > > MKFS_OPTIONS -- -f /dev/pmem1 > > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/pmem1 > > > /mnt/scratch > > > > > > xfs/730 - output mismatch (see > > > /root/xts/results//default/xfs/730.out.bad) > > > mv: failed to preserve ownership for > > > '/root/xts/results//default/xfs/730.out.bad': Operation not permitted > > > --- tests/xfs/730.out 2023-03-16 09:42:15.256141472 +0800 > > > +++ /root/xts/results//default/xfs/730.out.bad 2023-06-08 > > > > Whoah, someone besides me actually runs the repair fuzzers?? > > > > Yay! > > > > > 20:43:27.436083265 +0800 > > > @@ -1,4 +1,14 @@ > > > QA output created by 730 > > > Format and populate > > > Fuzz fscounters > > > +icount = zeroes: online scrub didn't fail. > > > +icount = ones: online scrub didn't fail. > > > +icount = firstbit: online scrub didn't fail. > > > +icount = middlebit: online scrub didn't fail. > > > ... > > > (Run 'diff -u /root/xts/tests/xfs/730.out > > > /root/xts/results//default/xfs/730.out.bad' to see the entire diff) > > > > > > > > > This test case is to fuzz metadata and make sure xfs_scrub can repair the > > > fs. After a little investigation, I think the fuzz actually done to the > > > metadata area but the xfs_scrub seems didn't notice that. I haven't found > > > the root cause of the problem yet. Do you have the same message which > > > causes fail on this case? > > > > Yeah, we recently disabled some code in fscounters.c to fix some other > > correctness problems in the inodegc code. > > Do you mean this failure is what you expected? > > > My goal was to get this > > series: > > > > https://lore.kernel.org/linux-xfs/168506061483.3732954.5178462816967376906.stgit@frogsfrogsfrogs/ > > > > merged for 6.5 and then the whole thing would work *completely* > > correctly, but it might be too late now. > > But actually I'm running test on your repair-fscounters branch[1], which > seems to be the same thing as the patchsets you just show to me. And the > failure of xfs/730 happened. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters Ahhh, ok. #repair-fscounters shouldn't be doing that. I'll take a look and see if I can figure it out. Do you have a dmesg from the xfs/730 run? --D > > -- > Thanks, > Ruan. > > > > > --D > > > > > > > > -- > > > Thanks, > > > Ruan. > > > > > > 在 2023/5/26 8:38, Darrick J. Wong 写道: > > > > Hi all, > > > > > > > > Now that we've landed the new kernel code, it's time to reorganize the > > > > xfs_scrub code that handles repairs. Clean up various naming warts and > > > > misleading error messages. Move the repair code to scrub/repair.c as > > > > the first step. Then, fix various issues in the repair code before we > > > > start reorganizing things. > > > > > > > > If you're going to start using this mess, you probably ought to just > > > > pull from my git trees, which are linked below. > > > > > > > > This is an extraordinary way to destroy everything. Enjoy! > > > > Comments and questions are, as always, welcome. > > > > > > > > --D > > > > > > > > xfsprogs git tree: > > > > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-fixes > > > > --- > > > > scrub/phase1.c | 2 > > > > scrub/phase2.c | 3 - > > > > scrub/phase3.c | 2 > > > > scrub/phase4.c | 22 ++++- > > > > scrub/phase5.c | 2 > > > > scrub/phase6.c | 13 +++ > > > > scrub/phase7.c | 2 > > > > scrub/repair.c | 177 ++++++++++++++++++++++++++++++++++++++++++- > > > > scrub/repair.h | 16 +++- > > > > scrub/scrub.c | 204 +------------------------------------------------ > > > > scrub/scrub.h | 16 ---- > > > > scrub/scrub_private.h | 55 +++++++++++++ > > > > scrub/xfs_scrub.c | 2 > > > > 13 files changed, 283 insertions(+), 233 deletions(-) > > > > create mode 100644 scrub/scrub_private.h > > > >