
[0/2] fstests/xfs: a couple growfs log recovery tests

Message ID: 20241017163405.173062-1-bfoster@redhat.com

Message

Brian Foster Oct. 17, 2024, 4:34 p.m. UTC
Hi all,

This is a first pass at a growfs crash and log recovery test I cooked up
for XFS. More background and context are available in [1]. In short,
this reproduces at least a couple of log recovery issues on XFS related
to growfs that Christoph has tracked down and resolved. Darrick proposed
a simple realtime variant in the discussion at [1], so patch 2 is a stab
at that. It's basically a copy of patch 1 with some rt-related tweaks.
However...

Darrick,

I believe you reproduced a problem with your customized realtime variant
of the initial test. I've not been able to reproduce any test failures
with patch 2 here, though I have tried to streamline the test a bit to
reduce unnecessary bits (patch 1 still reproduces the original
problems). I also don't tend to test much with rt, so it's possible my
config is off somehow or another. Otherwise I _think_ I've included the
necessary changes for rt support in the test itself.

Thoughts? I'd like to figure out what might be going on there before
this should land..

Brian

[1] https://lore.kernel.org/fstests/20240910043127.3480554-1-hch@lst.de/

Brian Foster (2):
  xfs: online grow vs. log recovery stress test
  xfs: online grow vs. log recovery stress test (realtime version)

 tests/xfs/609     | 69 +++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/609.out |  7 +++++
 tests/xfs/610     | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/610.out |  7 +++++
 4 files changed, 154 insertions(+)
 create mode 100755 tests/xfs/609
 create mode 100644 tests/xfs/609.out
 create mode 100755 tests/xfs/610
 create mode 100644 tests/xfs/610.out

Comments

Christoph Hellwig Oct. 18, 2024, 5:09 a.m. UTC | #1
On Thu, Oct 17, 2024 at 12:34:03PM -0400, Brian Foster wrote:
> I believe you reproduced a problem with your customized realtime variant
> of the initial test. I've not been able to reproduce any test failures
> with patch 2 here, though I have tried to streamline the test a bit to
> reduce unnecessary bits (patch 1 still reproduces the original
> problems). I also don't tend to test much with rt, so it's possible my
> config is off somehow or another. Otherwise I _think_ I've included the
> necessary changes for rt support in the test itself.
> 
> Thoughts? I'd like to figure out what might be going on there before
> this should land..

Darrick mentioned that was just with his rt group patchset, which
makes sense, as we don't have per-group metadata without it.

Anyway, the series looks good to me, and I think it supersedes my
more targeted hand-crafted reproducer.
Brian Foster Oct. 18, 2024, 11:29 a.m. UTC | #2
On Fri, Oct 18, 2024 at 07:09:09AM +0200, Christoph Hellwig wrote:
> On Thu, Oct 17, 2024 at 12:34:03PM -0400, Brian Foster wrote:
> > I believe you reproduced a problem with your customized realtime variant
> > of the initial test. I've not been able to reproduce any test failures
> > with patch 2 here, though I have tried to streamline the test a bit to
> > reduce unnecessary bits (patch 1 still reproduces the original
> > problems). I also don't tend to test much with rt, so it's possible my
> > config is off somehow or another. Otherwise I _think_ I've included the
> > necessary changes for rt support in the test itself.
> > 
> > Thoughts? I'd like to figure out what might be going on there before
> > this should land..
> 
> Darrick mentioned that was just with his rt group patchset, which
> make sense as we don't have per-group metadata without that.
> 

Ah, that would explain it then.

> Anyway, the series looks good to me, and I think it supersedes my
> more targeted hand crafted reproducer.
> 

Ok, thanks. It would be nice if anybody who knows more about the rt
group stuff could give the rt test a quick whirl and just confirm it's
at least still effective in that known broken case after my tweaks.
Otherwise I'll wait on any feedback on the code/test itself... thanks.

Brian
Darrick J. Wong Oct. 18, 2024, 9:39 p.m. UTC | #3
On Fri, Oct 18, 2024 at 07:29:22AM -0400, Brian Foster wrote:
> On Fri, Oct 18, 2024 at 07:09:09AM +0200, Christoph Hellwig wrote:
> > Darrick mentioned that was just with his rt group patchset, which
> > make sense as we don't have per-group metadata without that.
> > 
> 
> Ah, that would explain it then.

Yep.

> > Anyway, the series looks good to me, and I think it supersedes my
> > more targeted hand crafted reproducer.
> > 
> 
> Ok, thanks. It would be nice if anybody who knows more about the rt
> group stuff could give the rt test a quick whirl and just confirm it's
> at least still effective in that known broken case after my tweaks.
> Otherwise I'll wait on any feedback on the code/test itself... thanks.

Will do, now that I'm out of the mountains. :)

The tests look fine to me, but I guess we could wait to see what falls
out when I add bfoster's tests.

--D

Darrick J. Wong Oct. 21, 2024, 4:41 p.m. UTC | #4
On Fri, Oct 18, 2024 at 07:29:22AM -0400, Brian Foster wrote:
> Ok, thanks. It would be nice if anybody who knows more about the rt
> group stuff could give the rt test a quick whirl and just confirm it's
> at least still effective in that known broken case after my tweaks.
> Otherwise I'll wait on any feedback on the code/test itself... thanks.

Perplexingly, I tried this out on the test fleet last night and got zero
failures except on torvalds' TOT (top of tree).

Oh, I don't have any recoveryloop VMs that also have rt enabled, maybe
that's why 610 didn't pop anywhere.

--D

Christoph Hellwig Oct. 22, 2024, 5:52 a.m. UTC | #5
On Mon, Oct 21, 2024 at 09:41:50AM -0700, Darrick J. Wong wrote:
> Perplexingly, I tried this out on the test fleet last night and got zero
> failures except for torvalds TOT.
> 
> Oh, I don't have any recoveryloop VMs that also have rt enabled, maybe
> that's why 610 didn't pop anywhere.

Note that your trees already contain the fixes for AGs and RTGs, so
they are not expected to fail. On Linus' tree a failure is expected for
AGs, and we'd need an older version of your rtgroup branch to see a
failure for RTGs.

As far as I can tell, the results are as expected.