diff mbox series

generic/335: explicitly fsync file foo when running on btrfs

Message ID 4e00357fe3aaa843bd8fd02218b97104e5fd74bf.1639060960.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series generic/335: explicitly fsync file foo when running on btrfs | expand

Commit Message

Filipe Manana Dec. 9, 2021, 2:44 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

The test is relying on the fact that an fsync on directory "a" will
result in persisting the changes of its subdirectory "b", namely the
rename of "/a/b/foo" to "/c/foo". For this particular filesystem layout,
that will happen on btrfs, because all the directory entries end up in
the same metadata leaf. However that is not a behaviour we can always
guarantee on btrfs. For example, if we add more files to directory
"a" before and after creating subdirectory "b", like this:

  mkdir $SCRATCH_MNT/a
  for ((i = 0; i < 1000; i++)); do
      echo -n > $SCRATCH_MNT/a/file_$i
  done
  mkdir $SCRATCH_MNT/a/b
  for ((i = 1000; i < 2000; i++)); do
      echo -n > $SCRATCH_MNT/a/file_$i
  done
  mkdir $SCRATCH_MNT/c
  touch $SCRATCH_MNT/a/b/foo

  sync

  # The rest of the test remains unchanged...
  (...)

Then after fsyncing only directory "a", the rename of file "foo" from
"/a/b/foo" to "/c/foo" is not persisted.

Guaranteeing that on btrfs would be expensive on large directories, as
it would require scanning for every subdirectory. It's also not required
by posix for the fsync on a directory to persist changes inside its
subdirectories. So add an explicit fsync on file "foo" when the filesystem
being tested is btrfs.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 tests/generic/335 | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Dave Chinner Dec. 10, 2021, 12:25 a.m. UTC | #1
On Thu, Dec 09, 2021 at 02:44:06PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The test is relying on the fact that an fsync on directory "a" will
> result in persisting the changes of its subdirectory "b", namely the
> rename of "/a/b/foo" to "/c/foo". For this particular filesystem layout,
> that will happen on btrfs, because all the directory entries end up in
> the same metadata leaf. However that is not a behaviour we can always
> guarantee on btrfs. For example, if we add more files to directory
> "a" before and after creating subdirectory "b", like this:
> 
>   mkdir $SCRATCH_MNT/a
>   for ((i = 0; i < 1000; i++)); do
>       echo -n > $SCRATCH_MNT/a/file_$i
>   done
>   mkdir $SCRATCH_MNT/a/b
>   for ((i = 1000; i < 2000; i++)); do
>       echo -n > $SCRATCH_MNT/a/file_$i
>   done
>   mkdir $SCRATCH_MNT/c
>   touch $SCRATCH_MNT/a/b/foo
> 
>   sync
> 
>   # The rest of the test remains unchanged...
>   (...)
> 
> Then after fsyncing only directory "a", the rename of file "foo" from
> "/a/b/foo" to "/c/foo" is not persisted.
> 
> Guaranteeing that on btrfs would be expensive on large directories, as
> it would require scanning for every subdirectory. It's also not required
> by posix for the fsync on a directory to persist changes inside its
> subdirectories. So add an explicit fsync on file "foo" when the filesystem
> being tested is btrfs.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  tests/generic/335 | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tests/generic/335 b/tests/generic/335
> index e04f7a5f..196ada64 100755
> --- a/tests/generic/335
> +++ b/tests/generic/335
> @@ -51,6 +51,15 @@ mv $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/c/
>  touch $SCRATCH_MNT/a/bar
>  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
>  
> +# btrfs can not guarantee that when we fsync a directory all its subdirectories
> +# created on past transactions are fsynced as well. It may do it sometimes, but
> +# it's not guaranteed, giving such guarantees would be too expensive for large
> +# directories and posix does not require that recursive behaviour. So if we want
> +# the rename of "foo" to be persisted, explicitly fsync "foo".
> +if [ $FSTYP == "btrfs" ]; then
> +	$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/c/foo
> +fi

Doesn't this imply a regression in the behaviour of btrfs? After
all, this test was written explicitly to exercise a bug in btrfs log
recovery:

commit 72da52702a4b5bea1170aed791893487d0748566
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Feb 19 12:15:17 2016 +1100

    generic: test directory fsync after rename operation

    Test that if we move one file between directories, fsync the parent
    directory of the old directory, power fail and remount the filesystem,
    the file is not lost and it's located at the destination directory.

    This is motivated by a bug found in btrfs, which is fixed by the patch
    (for the linux kernel) titled:

      "Btrfs: fix file loss on log replay after renaming a file and fsync"

    Tested against ext3, ext4, xfs, f2fs and reiserfs.

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

What's changed that now makes this an invalid test for btrfs log
recovery? Has btrfs loosened it's metadata ordering guarantees in
the past few years?

Cheers,

Dave.
Filipe Manana Dec. 10, 2021, 10:50 a.m. UTC | #2
On Fri, Dec 10, 2021 at 12:32 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 09, 2021 at 02:44:06PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > The test is relying on the fact that an fsync on directory "a" will
> > result in persisting the changes of its subdirectory "b", namely the
> > rename of "/a/b/foo" to "/c/foo". For this particular filesystem layout,
> > that will happen on btrfs, because all the directory entries end up in
> > the same metadata leaf. However that is not a behaviour we can always
> > guarantee on btrfs. For example, if we add more files to directory
> > "a" before and after creating subdirectory "b", like this:
> >
> >   mkdir $SCRATCH_MNT/a
> >   for ((i = 0; i < 1000; i++)); do
> >       echo -n > $SCRATCH_MNT/a/file_$i
> >   done
> >   mkdir $SCRATCH_MNT/a/b
> >   for ((i = 1000; i < 2000; i++)); do
> >       echo -n > $SCRATCH_MNT/a/file_$i
> >   done
> >   mkdir $SCRATCH_MNT/c
> >   touch $SCRATCH_MNT/a/b/foo
> >
> >   sync
> >
> >   # The rest of the test remains unchanged...
> >   (...)
> >
> > Then after fsyncing only directory "a", the rename of file "foo" from
> > "/a/b/foo" to "/c/foo" is not persisted.
> >
> > Guaranteeing that on btrfs would be expensive on large directories, as
> > it would require scanning for every subdirectory. It's also not required
> > by posix for the fsync on a directory to persist changes inside its
> > subdirectories. So add an explicit fsync on file "foo" when the filesystem
> > being tested is btrfs.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  tests/generic/335 | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/tests/generic/335 b/tests/generic/335
> > index e04f7a5f..196ada64 100755
> > --- a/tests/generic/335
> > +++ b/tests/generic/335
> > @@ -51,6 +51,15 @@ mv $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/c/
> >  touch $SCRATCH_MNT/a/bar
> >  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
> >
> > +# btrfs can not guarantee that when we fsync a directory all its subdirectories
> > +# created on past transactions are fsynced as well. It may do it sometimes, but
> > +# it's not guaranteed, giving such guarantees would be too expensive for large
> > +# directories and posix does not require that recursive behaviour. So if we want
> > +# the rename of "foo" to be persisted, explicitly fsync "foo".
> > +if [ $FSTYP == "btrfs" ]; then
> > +     $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/c/foo
> > +fi
>
> Doesn't this imply a regression in the behaviour of btrfs? After
> all, this test was written explicitly to exercise a bug in btrfs log
> recovery:
>
> commit 72da52702a4b5bea1170aed791893487d0748566
> Author: Filipe Manana <fdmanana@suse.com>
> Date:   Fri Feb 19 12:15:17 2016 +1100
>
>     generic: test directory fsync after rename operation
>
>     Test that if we move one file between directories, fsync the parent
>     directory of the old directory, power fail and remount the filesystem,
>     the file is not lost and it's located at the destination directory.
>
>     This is motivated by a bug found in btrfs, which is fixed by the patch
>     (for the linux kernel) titled:
>
>       "Btrfs: fix file loss on log replay after renaming a file and fsync"
>
>     Tested against ext3, ext4, xfs, f2fs and reiserfs.
>
>     Signed-off-by: Filipe Manana <fdmanana@suse.com>
>     Reviewed-by: Dave Chinner <dchinner@redhat.com>
>     Signed-off-by: Dave Chinner <david@fromorbit.com>
>
> What's changed that now makes this an invalid test for btrfs log
> recovery? Has btrfs loosened it's metadata ordering guarantees in
> the past few years?

No, btrfs has not loosened its metadata ordering guarantees.
The test was written because btrfs had a bug where it would lose file
foo if we fsynced only directory "/a".

I should have made the test check that after log replay, file foo
exists either at "/c/foo" or at "/a/b/foo" - that the file was not
lost.
Perhaps that's a better change than making the test fsync file foo explicitly.

As I mentioned in the changelog, in this particular case the rename is
persisted,
because all the metadata is very small and fits in a single leaf of metadata.

But, for example, adding those 1000 files after creating "/a" and
another 1000 files after creating "/a/b", makes
the test fail because it expects the rename to have been persisted,
that file foo is at "/c/foo" - in that case
the file is still at "/a/b/foo".

Thanks.



>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/tests/generic/335 b/tests/generic/335
index e04f7a5f..196ada64 100755
--- a/tests/generic/335
+++ b/tests/generic/335
@@ -51,6 +51,15 @@  mv $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/c/
 touch $SCRATCH_MNT/a/bar
 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
 
+# btrfs can not guarantee that when we fsync a directory all its subdirectories
+# created on past transactions are fsynced as well. It may do it sometimes, but
+# it's not guaranteed, giving such guarantees would be too expensive for large
+# directories and posix does not require that recursive behaviour. So if we want
+# the rename of "foo" to be persisted, explicitly fsync "foo".
+if [ $FSTYP == "btrfs" ]; then
+	$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/c/foo
+fi
+
 echo "Filesystem content before power failure:"
 ls -R $SCRATCH_MNT/a $SCRATCH_MNT/c | _filter_scratch