diff mbox series

generic/623: fix test for runing on overlayfs

Message ID 20220530112905.79602-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series generic/623: fix test for runing on overlayfs | expand

Commit Message

Amir Goldstein May 30, 2022, 11:29 a.m. UTC
For this test to run on overlayfs we open a different file to perform
shutdown+fsync while keeping the writeback target file open.

We should probably perform fsync on the writeback target file, but
the bug is reproduced on xfs and overlayfs+xfs also as is.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Zorro,

I tested that this test passes for both xfs and overlayfs+xfs on v5.18
and tested that both configs fail with the same warning on v5.10.109.

I tried changing fsync to syncfs for the test to be more correct in the
overlayfs case, but then golden output of xfs and overlayfs+xfs differ
and that would need some more output filtering (or disregarding output
completely).

Since this minimal change does the job and does not change test behavior
on xfs on any of the tested kernels, I thought it might be good enough.

Thanks,
Amir.

 tests/generic/623 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Zorro Lang May 30, 2022, 1:29 p.m. UTC | #1
On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> For this test to run on overlayfs we open a different file to perform
> shutdown+fsync while keeping the writeback target file open.
> 
> We should probably perform fsync on the writeback target file, but
> the bug is reproduced on xfs and overlayfs+xfs also as is.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Zorro,
> 
> I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> and tested that both configs fail with the same warning on v5.10.109.
> 
> I tried changing fsync to syncfs for the test to be more correct in the
> overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> and that would need some more output filtering (or disregarding output
> completely).
> 
> Since this minimal change does the job and does not change test behavior
> on xfs on any of the tested kernels, I thought it might be good enough.
> 
> Thanks,
> Amir.
> 
>  tests/generic/623 | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/generic/623 b/tests/generic/623
> index ea016d91..bb36ad25 100755
> --- a/tests/generic/623
> +++ b/tests/generic/623
> @@ -24,10 +24,13 @@ _scratch_mount
>  # XFS had a regression where it failed to check shutdown status in the fault
>  # path. This produced an iomap warning because writeback failure clears Uptodate
>  # status on the page.
> +# For this test to run on overlayfs we open a different file to perform
> +# shutdown+fsync while keeping the writeback target file open.
>  file=$SCRATCH_MNT/file
>  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
>  ulimit -c 0
> -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> +	-c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \

Did you try to reproduce the original bug which this test case covers?

According to the "man xfs_io":

       open [[ -acdfrstRTPL ] path ]
              Closes the current file, and opens the file specified by path instead. Without any arguments, displays statistics about the current file - see the stat command.

Although I doubt if it always real close the current file, but you open to get
a new file descriptor, later operations will base on new fd. I don't know if
it still has original testing coverage.

I'd like to help this case to support overlay, presuppose original fs
testing isn't changed. Or it's not worth.

Welcome more review points from the author of this case and others.

Thanks,
Zorro

>  	-c "mwrite 0 4k" $file | _filter_xfs_io
>  
>  # success, all done
> -- 
> 2.25.1
>
Amir Goldstein May 30, 2022, 4:17 p.m. UTC | #2
On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
>
> On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > For this test to run on overlayfs we open a different file to perform
> > shutdown+fsync while keeping the writeback target file open.
> >
> > We should probably perform fsync on the writeback target file, but
> > the bug is reproduced on xfs and overlayfs+xfs also as is.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Zorro,
> >
> > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > and tested that both configs fail with the same warning on v5.10.109.
> >
> > I tried changing fsync to syncfs for the test to be more correct in the
> > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > and that would need some more output filtering (or disregarding output
> > completely).
> >
> > Since this minimal change does the job and does not change test behavior
> > on xfs on any of the tested kernels, I thought it might be good enough.
> >
> > Thanks,
> > Amir.
> >
> >  tests/generic/623 | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/generic/623 b/tests/generic/623
> > index ea016d91..bb36ad25 100755
> > --- a/tests/generic/623
> > +++ b/tests/generic/623
> > @@ -24,10 +24,13 @@ _scratch_mount
> >  # XFS had a regression where it failed to check shutdown status in the fault
> >  # path. This produced an iomap warning because writeback failure clears Uptodate
> >  # status on the page.
> > +# For this test to run on overlayfs we open a different file to perform
> > +# shutdown+fsync while keeping the writeback target file open.
> >  file=$SCRATCH_MNT/file
> >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> >  ulimit -c 0
> > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
>
> Did you try to reproduce the original bug which this test case covers?
>

Yes. As I wrote:
"tested that both configs fail with the same warning on v5.10.109"
Meaning the same bug that the test triggered before my change
in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
xfs and overlayfs+xfs in v5.10 with my change.

> According to the "man xfs_io":
>
>        open [[ -acdfrstRTPL ] path ]
>               Closes the current file, and opens the file specified by path instead.

The documentation is incorrect.
Current file is not closed.

> Although I doubt if it always real close the current file, but you open to get
> a new file descriptor, later operations will base on new fd. I don't know if
> it still has original testing coverage.

fsync on the fs root dir is not the same as fsync on the original file.
mwrite does not change because mwrite is not acted on open fd
it is acted on memory mapping of mmap.

I can either change fd again to first fd before doing fsync
or change fsync to syncfs. Both solutions should be fine,
just a bit more work on the golden output.
I just wonder if we need to do anything at all as the bug is
reproduced and warning triggered anyway.

>
> I'd like to help this case to support overlay, presuppose original fs
> testing isn't changed. Or it's not worth.
>
> Welcome more review points from the author of this case and others.
>

Yes, me as well.

Thanks,
Amir.
Dave Chinner May 30, 2022, 9:58 p.m. UTC | #3
On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
> >
> > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > For this test to run on overlayfs we open a different file to perform
> > > shutdown+fsync while keeping the writeback target file open.
> > >
> > > We should probably perform fsync on the writeback target file, but
> > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Zorro,
> > >
> > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > and tested that both configs fail with the same warning on v5.10.109.
> > >
> > > I tried changing fsync to syncfs for the test to be more correct in the
> > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > and that would need some more output filtering (or disregarding output
> > > completely).
> > >
> > > Since this minimal change does the job and does not change test behavior
> > > on xfs on any of the tested kernels, I thought it might be good enough.
> > >
> > > Thanks,
> > > Amir.
> > >
> > >  tests/generic/623 | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tests/generic/623 b/tests/generic/623
> > > index ea016d91..bb36ad25 100755
> > > --- a/tests/generic/623
> > > +++ b/tests/generic/623
> > > @@ -24,10 +24,13 @@ _scratch_mount
> > >  # XFS had a regression where it failed to check shutdown status in the fault
> > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > >  # status on the page.
> > > +# For this test to run on overlayfs we open a different file to perform
> > > +# shutdown+fsync while keeping the writeback target file open.

To trigger the original bug, the post-shutdown fsync needs to be run
on the original file. That triggers a writeback error writeback
which clears the uptodate state on the mapped page. The mwrite that
follows then trips over the state of the page and attempts IO
operations without first checking shutdown state.

Hence moving the fsync to a different file will break the mechanism
the regression test uses to trigger the original bug.

> > >  file=$SCRATCH_MNT/file
> > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > >  ulimit -c 0
> > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> >
> > Did you try to reproduce the original bug which this test case covers?
> >
> 
> Yes. As I wrote:
> "tested that both configs fail with the same warning on v5.10.109"
> Meaning the same bug that the test triggered before my change
> in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> xfs and overlayfs+xfs in v5.10 with my change.

It reproduced on 5.10, but not because of the reasons you are
suggesting.

> 
> > According to the "man xfs_io":
> >
> >        open [[ -acdfrstRTPL ] path ]
> >               Closes the current file, and opens the file specified by path instead.
> 
> The documentation is incorrect.
> Current file is not closed.

It is not closed, but it's also not the target of subsequent file
operations until you use "file 0" to switch back to it....

> > Although I doubt if it always real close the current file, but you open to get
> > a new file descriptor, later operations will base on new fd. I don't know if
> > it still has original testing coverage.
> 
> fsync on the fs root dir is not the same as fsync on the original file.

Yup, and that's what will break the regression test.

But why does it still fail on v5.10.109?

Well, that's because of a quirk of the xfs_io fsync command.  It
doesn't have the CMD_FLAG_ONESHOT flag set on it, so it operates on
*all open files*, not just the current file.

IOWs, the misunderstanding of how the bug is triggered has been
covered up by the misunderstanding of how the xfs_io open file table
and the fsync command interact.

> mwrite does not change because mwrite is not acted on open fd
> it is acted on memory mapping of mmap.
> 
> I can either change fd again to first fd before doing fsync
> or change fsync to syncfs.

Do not change it to syncfs - that executes completely different
writeback and metadata sync code paths with different error
propagation and may well result in very different behaviour from the
underlying filesystem.  Fundamentally, syncfs() and fsync() are not
interchangeable from a regression test POV - you *might* get the
same high level result, but the low level code behaves very
differently...

Cheers,

Dave.
Zorro Lang May 31, 2022, 8:58 a.m. UTC | #4
On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote:
> On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
> > >
> > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > > For this test to run on overlayfs we open a different file to perform
> > > > shutdown+fsync while keeping the writeback target file open.
> > > >
> > > > We should probably perform fsync on the writeback target file, but
> > > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >
> > > > Zorro,
> > > >
> > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > > and tested that both configs fail with the same warning on v5.10.109.
> > > >
> > > > I tried changing fsync to syncfs for the test to be more correct in the
> > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > > and that would need some more output filtering (or disregarding output
> > > > completely).
> > > >
> > > > Since this minimal change does the job and does not change test behavior
> > > > on xfs on any of the tested kernels, I thought it might be good enough.
> > > >
> > > > Thanks,
> > > > Amir.
> > > >
> > > >  tests/generic/623 | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tests/generic/623 b/tests/generic/623
> > > > index ea016d91..bb36ad25 100755
> > > > --- a/tests/generic/623
> > > > +++ b/tests/generic/623
> > > > @@ -24,10 +24,13 @@ _scratch_mount
> > > >  # XFS had a regression where it failed to check shutdown status in the fault
> > > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > > >  # status on the page.
> > > > +# For this test to run on overlayfs we open a different file to perform
> > > > +# shutdown+fsync while keeping the writeback target file open.
> 
> To trigger the original bug, the post-shutdown fsync needs to be run
> on the original file. That triggers a writeback error writeback
> which clears the uptodate state on the mapped page. The mwrite that
> follows then trips over the state of the page and attempts IO
> operations without first checking shutdown state.
> 
> Hence moving the fsync to a different file will break the mechanism
> the regression test uses to trigger the original bug.
> 
> > > >  file=$SCRATCH_MNT/file
> > > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > > >  ulimit -c 0
> > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> > >
> > > Did you try to reproduce the original bug which this test case covers?
> > >
> > 
> > Yes. As I wrote:
> > "tested that both configs fail with the same warning on v5.10.109"
> > Meaning the same bug that the test triggered before my change
> > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> > xfs and overlayfs+xfs in v5.10 with my change.
> 
> It reproduced on 5.10, but not because of the reasons you are
> suggesting.
> 
> > 
> > > According to the "man xfs_io":
> > >
> > >        open [[ -acdfrstRTPL ] path ]
> > >               Closes the current file, and opens the file specified by path instead.
> > 
> > The documentation is incorrect.
> > Current file is not closed.
> 
> It is not closed, but it's also not the target of subsequent file
> operations until you use "file 0" to switch back to it...

I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file",
(need to update the man-page?) and it looks like make all opened files as a
list, then operate them one by one(?):

  execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0
  ...
  openat(AT_FDCWD, "testfile", O_RDWR)    = 3
  ...
  mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000
  ...
  openat(AT_FDCWD, "testfile", O_RDWR)    = 4
  ...
  fsync(3)                                = 0
  fsync(4)                                = 0
  exit_group(0)                           = ?
  +++ exited with 0 +++

Anyway, looks like we're stuck at here. So I have 3 crude methods which is just
out of my mind, hope to get review:)

1) Skip this test for overlay directly, as this patch does.
2) Add a _require_real_scratch_shutdown() helper, require the $FSTYP supporting
   GOINGDOWN ioctl, don't fallback to the underlying fs. Then let generic/623
   use it to skip overlay and other fs which really doesn't support shutdown.
3) Change xfsprogs/io/shutdown.c, let shutdown_f() accept an extra mountpoint
   path argument, to shutdown a specified mountpoint (?? not sure if it's
   reasonable for xfs_io :). Then let generic/623 require and use this feature.

Thanks,
Zorro

> 
> > > Although I doubt if it always real close the current file, but you open to get
> > > a new file descriptor, later operations will base on new fd. I don't know if
> > > it still has original testing coverage.
> > 
> > fsync on the fs root dir is not the same as fsync on the original file.
> 
> Yup, and that's what will break the regression test.
> 
> But why does it still fail on v5.10.109?
> 
> Well, that's because of a quirk of the xfs_io fsync command.  It
> doesn't have the CMD_FLAG_ONESHOT flag set on it, so it operates on
> *all open files*, not just the current file.
> 
> IOWs, the misunderstanding of how the bug is triggered has been
> covered up by the misunderstanding of how the xfs_io open file table
> and the fsync command interact.
> 
> > mwrite does not change because mwrite is not acted on open fd
> > it is acted on memory mapping of mmap.
> > 
> > I can either change fd again to first fd before doing fsync
> > or change fsync to syncfs.
> 
> Do not change it to syncfs - that executes completely different
> writeback and metadata sync code paths with different error
> propagation and may well result in very different behaviour from the
> underlying filesystem.  Fundamentally, syncfs() and fsync() are not
> interchangeable from a regression test POV - you *might* get the
> same high level result, but the low level code behaves very
> differently...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Amir Goldstein May 31, 2022, 9:42 a.m. UTC | #5
On Tue, May 31, 2022 at 11:59 AM Zorro Lang <zlang@redhat.com> wrote:
>
> On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote:
> > On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> > > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
> > > >
> > > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > > > For this test to run on overlayfs we open a different file to perform
> > > > > shutdown+fsync while keeping the writeback target file open.
> > > > >
> > > > > We should probably perform fsync on the writeback target file, but
> > > > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > > > >
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >
> > > > > Zorro,
> > > > >
> > > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > > > and tested that both configs fail with the same warning on v5.10.109.
> > > > >
> > > > > I tried changing fsync to syncfs for the test to be more correct in the
> > > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > > > and that would need some more output filtering (or disregarding output
> > > > > completely).
> > > > >
> > > > > Since this minimal change does the job and does not change test behavior
> > > > > on xfs on any of the tested kernels, I thought it might be good enough.
> > > > >
> > > > > Thanks,
> > > > > Amir.
> > > > >
> > > > >  tests/generic/623 | 5 ++++-
> > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tests/generic/623 b/tests/generic/623
> > > > > index ea016d91..bb36ad25 100755
> > > > > --- a/tests/generic/623
> > > > > +++ b/tests/generic/623
> > > > > @@ -24,10 +24,13 @@ _scratch_mount
> > > > >  # XFS had a regression where it failed to check shutdown status in the fault
> > > > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > > > >  # status on the page.
> > > > > +# For this test to run on overlayfs we open a different file to perform
> > > > > +# shutdown+fsync while keeping the writeback target file open.
> >
> > To trigger the original bug, the post-shutdown fsync needs to be run
> > on the original file. That triggers a writeback error writeback
> > which clears the uptodate state on the mapped page. The mwrite that
> > follows then trips over the state of the page and attempts IO
> > operations without first checking shutdown state.
> >
> > Hence moving the fsync to a different file will break the mechanism
> > the regression test uses to trigger the original bug.
> >
> > > > >  file=$SCRATCH_MNT/file
> > > > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > > > >  ulimit -c 0
> > > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> > > >
> > > > Did you try to reproduce the original bug which this test case covers?
> > > >
> > >
> > > Yes. As I wrote:
> > > "tested that both configs fail with the same warning on v5.10.109"
> > > Meaning the same bug that the test triggered before my change
> > > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> > > xfs and overlayfs+xfs in v5.10 with my change.
> >
> > It reproduced on 5.10, but not because of the reasons you are
> > suggesting.
> >
> > >
> > > > According to the "man xfs_io":
> > > >
> > > >        open [[ -acdfrstRTPL ] path ]
> > > >               Closes the current file, and opens the file specified by path instead.
> > >
> > > The documentation is incorrect.
> > > Current file is not closed.
> >
> > It is not closed, but it's also not the target of subsequent file
> > operations until you use "file 0" to switch back to it...
>
> I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file",
> (need to update the man-page?) and it looks like make all opened files as a
> list, then operate them one by one(?):
>
>   execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0
>   ...
>   openat(AT_FDCWD, "testfile", O_RDWR)    = 3
>   ...
>   mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000
>   ...
>   openat(AT_FDCWD, "testfile", O_RDWR)    = 4
>   ...
>   fsync(3)                                = 0
>   fsync(4)                                = 0
>   exit_group(0)                           = ?
>   +++ exited with 0 +++
>
> Anyway, looks like we're stuck at here.

We are not stuck at all.
It is very simple to fix.
As you can see from the strace above and from Dave explanation,
The test as I posted in v1 already does the right thing, because
it does fsync the original file.
Which explains why the test I posted DOES work as expected on
both xfs and overlayfs and DOES reproduce the bug.

However, if we want to write the test in a way that does NOT assume
that xfs_io -c fsync operates on all the open files, I need to add some
more commands:

$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
        -c "open $(_scratch_shutdown_handle)" -c shutdown \
        -c "file 0" -c fsync \
        -c "mwrite 0 4k" $file | _filter_scratch | _filter_xfs_io

The only problem is this complicates the golden output
which does not produce the same output for overlayfs and xfs
open files, so I need to write another filter for that.

The question is, is it worth it?

Are we ever going to change xfs_io -c fsync to behave
differently and fsync only the current file?

If not, that I can just change the comment and keep the command
and golden output:

 # path. This produced an iomap warning because writeback failure
clears Uptodate
 # status on the page.
 # For this test to run on overlayfs we open a different file to perform
-# shutdown+fsync while keeping the writeback target file open.
+# shutdown while keeping the writeback target file open.
+# xfs_io -c fsync post-shutdown performs fsync also on the writeback
target file,
+# which is critical for triggering the writeback failure.
 file=$SCRATCH_MNT/file
 $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
 $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
        -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
        -c "mwrite 0 4k" $file | _filter_xfs_io


> So I have 3 crude methods which is just
> out of my mind, hope to get review:)
>
> 1) Skip this test for overlay directly, as this patch does.
> 2) Add a _require_real_scratch_shutdown() helper, require the $FSTYP supporting
>    GOINGDOWN ioctl, don't fallback to the underlying fs. Then let generic/623
>    use it to skip overlay and other fs which really doesn't support shutdown.
> 3) Change xfsprogs/io/shutdown.c, let shutdown_f() accept an extra mountpoint
>    path argument, to shutdown a specified mountpoint (?? not sure if it's
>    reasonable for xfs_io :). Then let generic/623 require and use this feature.
>

All of the above are overkills.
The test fix already works.

Thanks,
Amir.
Zorro Lang June 1, 2022, 6:26 a.m. UTC | #6
On Tue, May 31, 2022 at 12:42:01PM +0300, Amir Goldstein wrote:
> On Tue, May 31, 2022 at 11:59 AM Zorro Lang <zlang@redhat.com> wrote:
> >
> > On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote:
> > > On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> > > > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
> > > > >
> > > > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > > > > For this test to run on overlayfs we open a different file to perform
> > > > > > shutdown+fsync while keeping the writeback target file open.
> > > > > >
> > > > > > We should probably perform fsync on the writeback target file, but
> > > > > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > > > > >
> > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > > ---
> > > > > >
> > > > > > Zorro,
> > > > > >
> > > > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > > > > and tested that both configs fail with the same warning on v5.10.109.
> > > > > >
> > > > > > I tried changing fsync to syncfs for the test to be more correct in the
> > > > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > > > > and that would need some more output filtering (or disregarding output
> > > > > > completely).
> > > > > >
> > > > > > Since this minimal change does the job and does not change test behavior
> > > > > > on xfs on any of the tested kernels, I thought it might be good enough.
> > > > > >
> > > > > > Thanks,
> > > > > > Amir.
> > > > > >
> > > > > >  tests/generic/623 | 5 ++++-
> > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/tests/generic/623 b/tests/generic/623
> > > > > > index ea016d91..bb36ad25 100755
> > > > > > --- a/tests/generic/623
> > > > > > +++ b/tests/generic/623
> > > > > > @@ -24,10 +24,13 @@ _scratch_mount
> > > > > >  # XFS had a regression where it failed to check shutdown status in the fault
> > > > > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > > > > >  # status on the page.
> > > > > > +# For this test to run on overlayfs we open a different file to perform
> > > > > > +# shutdown+fsync while keeping the writeback target file open.
> > >
> > > To trigger the original bug, the post-shutdown fsync needs to be run
> > > on the original file. That triggers a writeback error writeback
> > > which clears the uptodate state on the mapped page. The mwrite that
> > > follows then trips over the state of the page and attempts IO
> > > operations without first checking shutdown state.
> > >
> > > Hence moving the fsync to a different file will break the mechanism
> > > the regression test uses to trigger the original bug.
> > >
> > > > > >  file=$SCRATCH_MNT/file
> > > > > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > > > > >  ulimit -c 0
> > > > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > > > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> > > > >
> > > > > Did you try to reproduce the original bug which this test case covers?
> > > > >
> > > >
> > > > Yes. As I wrote:
> > > > "tested that both configs fail with the same warning on v5.10.109"
> > > > Meaning the same bug that the test triggered before my change
> > > > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> > > > xfs and overlayfs+xfs in v5.10 with my change.
> > >
> > > It reproduced on 5.10, but not because of the reasons you are
> > > suggesting.
> > >
> > > >
> > > > > According to the "man xfs_io":
> > > > >
> > > > >        open [[ -acdfrstRTPL ] path ]
> > > > >               Closes the current file, and opens the file specified by path instead.
> > > >
> > > > The documentation is incorrect.
> > > > Current file is not closed.
> > >
> > > It is not closed, but it's also not the target of subsequent file
> > > operations until you use "file 0" to switch back to it...
> >
> > I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file",
> > (need to update the man-page?) and it looks like make all opened files as a
> > list, then operate them one by one(?):
> >
> >   execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0
> >   ...
> >   openat(AT_FDCWD, "testfile", O_RDWR)    = 3
> >   ...
> >   mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000
> >   ...
> >   openat(AT_FDCWD, "testfile", O_RDWR)    = 4
> >   ...
> >   fsync(3)                                = 0
> >   fsync(4)                                = 0
> >   exit_group(0)                           = ?
> >   +++ exited with 0 +++
> >
> > Anyway, looks like we're stuck at here.
> 
> We are not stuck at all.
> It is very simple to fix.
> As you can see from the strace above and from Dave explanation,
> The test as I posted in v1 already does the right thing, because
> it does fsync the original file.
> Which explains why the test I posted DOES work as expected on
> both xfs and overlayfs and DOES reproduce the bug.
> 
> However, if we want to write the test in a way that does NOT assume
> that xfs_io -c fsync operates on all the open files, I need to add some
> more commands:
> 
> $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
>         -c "open $(_scratch_shutdown_handle)" -c shutdown \
>         -c "file 0" -c fsync \
>         -c "mwrite 0 4k" $file | _filter_scratch | _filter_xfs_io
> 
> The only problem is this complicates the golden output
> which does not produce the same output for overlayfs and xfs
> open files, so I need to write another filter for that.
> 
> The question is, is it worth it?
> 
> Are we ever going to change xfs_io -c fsync to behave
> differently and fsync only the current file?
> 
> If not, that I can just change the comment and keep the command
> and golden output:
> 
>  # path. This produced an iomap warning because writeback failure
> clears Uptodate
>  # status on the page.
>  # For this test to run on overlayfs we open a different file to perform
> -# shutdown+fsync while keeping the writeback target file open.
> +# shutdown while keeping the writeback target file open.
> +# xfs_io -c fsync post-shutdown performs fsync also on the writeback
> target file,
> +# which is critical for triggering the writeback failure.
>  file=$SCRATCH_MNT/file
>  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
>  $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
>         -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
>         -c "mwrite 0 4k" $file | _filter_xfs_io

OK, if you persist, my personal opinion is: at least don't change the
original test logic of other fs especially when they're not wrong but
the change is a little tricky.

For example (not tested):

  shutdown_cmd="shutdown"
  # shutdown the underlying fs if there is
  shutdown_handle="$(_scratch_shutdown_handle)"
  if [ "$shutdown_handle" != "$SCRATCH_MNT" ];then
          shutdown_cmd="\"open $shutdown_handle\" -c shutdown -c close"
  fi
  $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c ${shutdown_cmd}
          -c fsync -c "mwrite 0 4k" $file | _filter_xfs_io

Thanks,
Zorro

> 
> 
> > So I have 3 crude methods which is just
> > out of my mind, hope to get review:)
> >
> > 1) Skip this test for overlay directly, as this patch does.
> > 2) Add a _require_real_scratch_shutdown() helper, require the $FSTYP supporting
> >    GOINGDOWN ioctl, don't fallback to the underlying fs. Then let generic/623
> >    use it to skip overlay and other fs which really doesn't support shutdown.
> > 3) Change xfsprogs/io/shutdown.c, let shutdown_f() accept an extra mountpoint
> >    path argument, to shutdown a specified mountpoint (?? not sure if it's
> >    reasonable for xfs_io :). Then let generic/623 require and use this feature.
> >
> 
> All of the above are overkills.
> The test fix already works.
> 
> Thanks,
> Amir.
>
Amir Goldstein June 1, 2022, 6:57 a.m. UTC | #7
On Wed, Jun 1, 2022 at 9:26 AM Zorro Lang <zlang@redhat.com> wrote:
>
> On Tue, May 31, 2022 at 12:42:01PM +0300, Amir Goldstein wrote:
> > On Tue, May 31, 2022 at 11:59 AM Zorro Lang <zlang@redhat.com> wrote:
> > >
> > > On Tue, May 31, 2022 at 07:58:48AM +1000, Dave Chinner wrote:
> > > > On Mon, May 30, 2022 at 07:17:42PM +0300, Amir Goldstein wrote:
> > > > > On Mon, May 30, 2022 at 4:29 PM Zorro Lang <zlang@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, May 30, 2022 at 02:29:05PM +0300, Amir Goldstein wrote:
> > > > > > > For this test to run on overlayfs we open a different file to perform
> > > > > > > shutdown+fsync while keeping the writeback target file open.
> > > > > > >
> > > > > > > We should probably perform fsync on the writeback target file, but
> > > > > > > the bug is reproduced on xfs and overlayfs+xfs also as is.
> > > > > > >
> > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > > > ---
> > > > > > >
> > > > > > > Zorro,
> > > > > > >
> > > > > > > I tested that this test passes for both xfs and overlayfs+xfs on v5.18
> > > > > > > and tested that both configs fail with the same warning on v5.10.109.
> > > > > > >
> > > > > > > I tried changing fsync to syncfs for the test to be more correct in the
> > > > > > > overlayfs case, but then golden output of xfs and overlayfs+xfs differ
> > > > > > > and that would need some more output filtering (or disregarding output
> > > > > > > completely).
> > > > > > >
> > > > > > > Since this minimal change does the job and does not change test behavior
> > > > > > > on xfs on any of the tested kernels, I thought it might be good enough.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Amir.
> > > > > > >
> > > > > > >  tests/generic/623 | 5 ++++-
> > > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/tests/generic/623 b/tests/generic/623
> > > > > > > index ea016d91..bb36ad25 100755
> > > > > > > --- a/tests/generic/623
> > > > > > > +++ b/tests/generic/623
> > > > > > > @@ -24,10 +24,13 @@ _scratch_mount
> > > > > > >  # XFS had a regression where it failed to check shutdown status in the fault
> > > > > > >  # path. This produced an iomap warning because writeback failure clears Uptodate
> > > > > > >  # status on the page.
> > > > > > > +# For this test to run on overlayfs we open a different file to perform
> > > > > > > +# shutdown+fsync while keeping the writeback target file open.
> > > >
> > > > To trigger the original bug, the post-shutdown fsync needs to be run
> > > > on the original file. That triggers a writeback error writeback
> > > > which clears the uptodate state on the mapped page. The mwrite that
> > > > follows then trips over the state of the page and attempts IO
> > > > operations without first checking shutdown state.
> > > >
> > > > Hence moving the fsync to a different file will break the mechanism
> > > > the regression test uses to trigger the original bug.
> > > >
> > > > > > >  file=$SCRATCH_MNT/file
> > > > > > >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> > > > > > >  ulimit -c 0
> > > > > > > -$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
> > > > > > > +$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> > > > > > > +     -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> > > > > >
> > > > > > Did you try to reproduce the original bug which this test case covers?
> > > > > >
> > > > >
> > > > > Yes. As I wrote:
> > > > > "tested that both configs fail with the same warning on v5.10.109"
> > > > > Meaning the same bug that the test triggered before my change
> > > > > in v5.10 is still triggered on xfs in v5.10 and it is triggered on both
> > > > > xfs and overlayfs+xfs in v5.10 with my change.
> > > >
> > > > It reproduced on 5.10, but not because of the reasons you are
> > > > suggesting.
> > > >
> > > > >
> > > > > > According to the "man xfs_io":
> > > > > >
> > > > > >        open [[ -acdfrstRTPL ] path ]
> > > > > >               Closes the current file, and opens the file specified by path instead.
> > > > >
> > > > > The documentation is incorrect.
> > > > > Current file is not closed.
> > > >
> > > > It is not closed, but it's also not the target of subsequent file
> > > > operations until you use "file 0" to switch back to it...
> > >
> > > I checked the xfsprogs/io/open.c, it truely doesn't "Close the current file",
> > > (need to update the man-page?) and it looks like make all opened files as a
> > > list, then operate them one by one(?):
> > >
> > >   execve("/usr/sbin/xfs_io", ["xfs_io", "-c", "mmap 0 4k", "-c", "mwrite 0 4k", "-c", "open testfile", "-c", "fsync", "-c", "mwrite 0 4k", "testfile"], 0x7ffdc2285b78 /* 42 vars */) = 0
> > >   ...
> > >   openat(AT_FDCWD, "testfile", O_RDWR)    = 3
> > >   ...
> > >   mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = 0x7efc04548000
> > >   ...
> > >   openat(AT_FDCWD, "testfile", O_RDWR)    = 4
> > >   ...
> > >   fsync(3)                                = 0
> > >   fsync(4)                                = 0
> > >   exit_group(0)                           = ?
> > >   +++ exited with 0 +++
> > >
> > > Anyway, looks like we're stuck at here.
> >
> > We are not stuck at all.
> > It is very simple to fix.
> > As you can see from the strace above and from Dave explanation,
> > The test as I posted in v1 already does the right thing, because
> > it does fsync the original file.
> > Which explains why the test I posted DOES work as expected on
> > both xfs and overlayfs and DOES reproduce the bug.
> >
> > However, if we want to write the test in a way that does NOT assume
> > that xfs_io -c fsync operates on all the open files, I need to add some
> > more commands:
> >
> > $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> >         -c "open $(_scratch_shutdown_handle)" -c shutdown \
> >         -c "file 0" -c fsync \
> >         -c "mwrite 0 4k" $file | _filter_scratch | _filter_xfs_io
> >
> > The only problem is this complicates the golden output
> > which does not produce the same output for overlayfs and xfs
> > open files, so I need to write another filter for that.
> >
> > The question is, is it worth it?
> >
> > Are we ever going to change xfs_io -c fsync to behave
> > differently and fsync only the current file?
> >
> > If not, that I can just change the comment and keep the command
> > and golden output:
> >
> >  # path. This produced an iomap warning because writeback failure
> > clears Uptodate
> >  # status on the page.
> >  # For this test to run on overlayfs we open a different file to perform
> > -# shutdown+fsync while keeping the writeback target file open.
> > +# shutdown while keeping the writeback target file open.
> > +# xfs_io -c fsync post-shutdown performs fsync also on the writeback
> > target file,
> > +# which is critical for triggering the writeback failure.
> >  file=$SCRATCH_MNT/file
> >  $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
> >  $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
> >         -c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
> >         -c "mwrite 0 4k" $file | _filter_xfs_io
>
> OK, if you persist, my personal opinion is: at least don't change the
> original test logic of other fs especially when they're not wrong but
> the change is a little tricky.
>
> For example (not tested):
>
>   shutdown_cmd="shutdown"
>   # shutdown the underlying fs if there is
>   shutdown_handle="$(_scratch_shutdown_handle)"
>   if [ "$shutdown_handle" != "$SCRATCH_MNT" ];then
>           shutdown_cmd="\"open $shutdown_handle\" -c shutdown -c close"
>   fi
>   $XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c ${shutdown_cmd}
>           -c fsync -c "mwrite 0 4k" $file | _filter_xfs_io
>

Ok, that works for me, but the extra close changes golder output
for overlayfs:

$ xfs_io -f x -c "open y" -c close
[000] x              (foreign,non-sync,non-direct,read-write)

So either we add more logic to filter that out or we just leave out the close
making the test in overlayfs depends on the fsync-all behavior of xfs_io
and leaving the test logic on xfs unchanged per your proposal.

If you are ok with that, I will test the version that you proposed and re-post.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/tests/generic/623 b/tests/generic/623
index ea016d91..bb36ad25 100755
--- a/tests/generic/623
+++ b/tests/generic/623
@@ -24,10 +24,13 @@  _scratch_mount
 # XFS had a regression where it failed to check shutdown status in the fault
 # path. This produced an iomap warning because writeback failure clears Uptodate
 # status on the page.
+# For this test to run on overlayfs we open a different file to perform
+# shutdown+fsync while keeping the writeback target file open.
 file=$SCRATCH_MNT/file
 $XFS_IO_PROG -fc "pwrite 0 4k" -c fsync $file | _filter_xfs_io
 ulimit -c 0
-$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" -c shutdown -c fsync \
+$XFS_IO_PROG -x -c "mmap 0 4k" -c "mwrite 0 4k" \
+	-c "open $(_scratch_shutdown_handle)" -c shutdown -c fsync \
 	-c "mwrite 0 4k" $file | _filter_xfs_io
 
 # success, all done