diff mbox

xfstests: generic/342 run failed in f2fs

Message ID e367054d-046d-d6e5-943c-7fbe10493029@sandeen.net (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Sandeen Dec. 25, 2017, 5:56 a.m. UTC
On 12/24/17 9:28 PM, Chen Rong wrote:
> Hi, everyone:
> 
> the issue as below:

First we need to look - what does the test do?

# Test that if we rename a file, create a new file that has the old name of the
# other file and is a child of the same parent directory, fsync the new inode,
# power fail and mount the filesystem, we do not lose the first file and that
# file has the name it was renamed to.

Ok, so it is a test of fsync'd file renames/creation over simulated device
failure, and wants to be sure that all files and file contents are as
expected if data integrity syscalls were made before the device failure.

> # ./check generic/342
> FSTYP         -- f2fs
> PLATFORM      -- Linux/x86_64 node01 4.15.0-rc4
> MKFS_OPTIONS  -- /dev/sde1
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sde1 /tmp/test1
> 
> generic/342      - output mismatch (see /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad)
>     --- tests/generic/342.out   2017-12-13 13:47:20.144000000 -0500
>     +++ /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad 2017-12-25 11:13:12.928000000 -0500
>     @@ -8,8 +8,7 @@
>      48c940ba3b8671d3d6ea74e4ccad8ca3  SCRATCH_MNT/a/bar
>      Directory a/ contents after log replay:
>      SCRATCH_MNT/a:
>     -bar

the test created foo, fsynced it, then renamed it to bar and created a
/new/ file also named foo, then fsynced the new file foo.

Despite that 2nd fsync, the renamed file "bar" is not present in
your case.

However, don't we really need to fsync the directory after
the rename and recreation to ensure persistence?

iows:  does this patch make the test pass on f2fs?  It does
for me, and I think it's the only valid way to run the test:


without an fsync of the parent dir, I don't think we can
guarantee that filename changes or additions will persist
on every filesystem.

>      foo
>      File digests after log replay:
>      9e5d56a1f9b2c93589f9d55480f971a1  SCRATCH_MNT/a/foo
>     ...
>     (Run 'diff -u tests/generic/342.out /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad'  to see the entire diff)

Generally best to do that suggested diff command to see if 
there are any more pieces of changed/wrong test output.

> Ran: generic/342
> Failures: generic/342
> Failed 1 of 1 tests
> 
> Is it a problem with my computer or a already known issue with f2fs?

or a problem with the test itself ...

-Eric
 
> thanks!
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chen Rong Dec. 25, 2017, 6:30 a.m. UTC | #1
On 2017年12月25日 13:56, Eric Sandeen wrote:
> On 12/24/17 9:28 PM, Chen Rong wrote:
>> Hi, everyone:
>>
>> the issue as below:
> First we need to look - what does the test do?
>
> # Test that if we rename a file, create a new file that has the old name of the
> # other file and is a child of the same parent directory, fsync the new inode,
> # power fail and mount the filesystem, we do not lose the first file and that
> # file has the name it was renamed to.
>
> Ok, so it is a test of fsync'd file renames/creation over simulated device
> failure, and wants to be sure that all files and file contents are as
> expected if data integrity syscalls were made before the device failure.
>
>> # ./check generic/342
>> FSTYP         -- f2fs
>> PLATFORM      -- Linux/x86_64 node01 4.15.0-rc4
>> MKFS_OPTIONS  -- /dev/sde1
>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sde1 /tmp/test1
>>
>> generic/342      - output mismatch (see /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad)
>>      --- tests/generic/342.out   2017-12-13 13:47:20.144000000 -0500
>>      +++ /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad 2017-12-25 11:13:12.928000000 -0500
>>      @@ -8,8 +8,7 @@
>>       48c940ba3b8671d3d6ea74e4ccad8ca3  SCRATCH_MNT/a/bar
>>       Directory a/ contents after log replay:
>>       SCRATCH_MNT/a:
>>      -bar
> the test created foo, fsynced it, then renamed it to bar and created a
> /new/ file also named foo, then fsynced the new file foo.
>
> Despite that 2nd fsync, the renamed file "bar" is not present in
> your case.
>
> However, don't we really need to fsync the directory after
> the rename and recreation to ensure persistence?
>
> iows:  does this patch make the test pass on f2fs?  It does
> for me, and I think it's the only valid way to run the test:
the patch works for me. but btrfs could pass the test without it, why so 
different?
> diff --git a/tests/generic/342 b/tests/generic/342
> index ed64e05..6a9e5bd 100755
> --- a/tests/generic/342
> +++ b/tests/generic/342
> @@ -68,6 +68,7 @@ sync
>   mv $SCRATCH_MNT/a/foo $SCRATCH_MNT/a/bar
>   $XFS_IO_PROG -f -c "pwrite -S 0xba 0 16K" $SCRATCH_MNT/a/foo | _filter_xfs_io
>   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a/foo
> +$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
>   
>   echo "File digests before log replay:"
>   md5sum $SCRATCH_MNT/a/foo | _filter_scratch
>
> without an fsync of the parent dir, I don't think we can
> guarantee that filename changes or additions will persist
> on every filesystem.
>
>>       foo
>>       File digests after log replay:
>>       9e5d56a1f9b2c93589f9d55480f971a1  SCRATCH_MNT/a/foo
>>      ...
>>      (Run 'diff -u tests/generic/342.out /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad'  to see the entire diff)
> Generally best to do that suggested diff command to see if
> there are any more pieces of changed/wrong test output.
>
>> Ran: generic/342
>> Failures: generic/342
>> Failed 1 of 1 tests
>>
>> Is it a problem with my computer or a already known issue with f2fs?
> or a problem with the test itself ...
cool!
>
> -Eric
>   
>> thanks!
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 25, 2017, 7:47 a.m. UTC | #2
On 12/24/17 10:30 PM, Chen Rong wrote:
> 
> 
> On 2017年12月25日 13:56, Eric Sandeen wrote:
>> On 12/24/17 9:28 PM, Chen Rong wrote:
>>> Hi, everyone:
>>>
>>> the issue as below:
>> First we need to look - what does the test do?
>>
>> # Test that if we rename a file, create a new file that has the old name of the
>> # other file and is a child of the same parent directory, fsync the new inode,
>> # power fail and mount the filesystem, we do not lose the first file and that
>> # file has the name it was renamed to.
>>
>> Ok, so it is a test of fsync'd file renames/creation over simulated device
>> failure, and wants to be sure that all files and file contents are as
>> expected if data integrity syscalls were made before the device failure.
>>
>>> # ./check generic/342
>>> FSTYP         -- f2fs
>>> PLATFORM      -- Linux/x86_64 node01 4.15.0-rc4
>>> MKFS_OPTIONS  -- /dev/sde1
>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/sde1 /tmp/test1
>>>
>>> generic/342      - output mismatch (see /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad)
>>>      --- tests/generic/342.out   2017-12-13 13:47:20.144000000 -0500
>>>      +++ /home/ubuntu/git/xfstests-dev/results//generic/342.out.bad 2017-12-25 11:13:12.928000000 -0500
>>>      @@ -8,8 +8,7 @@
>>>       48c940ba3b8671d3d6ea74e4ccad8ca3  SCRATCH_MNT/a/bar
>>>       Directory a/ contents after log replay:
>>>       SCRATCH_MNT/a:
>>>      -bar
>> the test created foo, fsynced it, then renamed it to bar and created a
>> /new/ file also named foo, then fsynced the new file foo.
>>
>> Despite that 2nd fsync, the renamed file "bar" is not present in
>> your case.
>>
>> However, don't we really need to fsync the directory after
>> the rename and recreation to ensure persistence?
>>
>> iows:  does this patch make the test pass on f2fs?  It does
>> for me, and I think it's the only valid way to run the test:
> the patch works for me. but btrfs could pass the test without it, why so different?

Filesystems are free to do /more/ than the minimum required by posix -
see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
mounts:

         * If this is a synchronous mount, make sure that the rename transaction
         * goes to disk before returning to the user.
         */
        if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
                xfs_trans_set_sync(tp);

so behavior can be fs-dependent, or mount option dependent, etc.

But to be portable, if an app wants directory changes to be persistent
before proceeding, it must fsync the directory after making changes.

I don't know about f2fs's design intent, whether it guarantees more
than posix requires, but to guarantee that this test works on any posix
fs, I think that directory fsync is needed.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Dec. 25, 2017, 4:31 p.m. UTC | #3
On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
> Filesystems are free to do /more/ than the minimum required by posix -
> see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
> mounts:
> 
>          * If this is a synchronous mount, make sure that the rename transaction
>          * goes to disk before returning to the user.
>          */
>         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
>                 xfs_trans_set_sync(tp);
> 
> so behavior can be fs-dependent, or mount option dependent, etc.
> 
> But to be portable, if an app wants directory changes to be persistent
> before proceeding, it must fsync the directory after making changes.
> 
> I don't know about f2fs's design intent, whether it guarantees more
> than posix requires, but to guarantee that this test works on any posix
> fs, I think that directory fsync is needed.

Agreed that this is a test bug, and we should add the fsync to the
parent directory.

It might also be a good idea for f2fs to do more, given that fsync is
a slow enough operation that so long as you can make sure the fsync of
the parent directory happens within the same atomic update as the
child inode, you might as well give the more expansive guarantee.  But
obviously that's up to the f2fs developers to decide whether they want
to do that work.

Cheers,

      	     	       	       	    - Ted

--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jaegeuk Kim Dec. 27, 2017, 7:11 p.m. UTC | #4
On 12/25, Theodore Ts'o wrote:
> On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
> > Filesystems are free to do /more/ than the minimum required by posix -
> > see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
> > mounts:
> > 
> >          * If this is a synchronous mount, make sure that the rename transaction
> >          * goes to disk before returning to the user.
> >          */
> >         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
> >                 xfs_trans_set_sync(tp);
> > 
> > so behavior can be fs-dependent, or mount option dependent, etc.
> > 
> > But to be portable, if an app wants directory changes to be persistent
> > before proceeding, it must fsync the directory after making changes.
> > 
> > I don't know about f2fs's design intent, whether it guarantees more
> > than posix requires, but to guarantee that this test works on any posix
> > fs, I think that directory fsync is needed.
> 
> Agreed that this is a test bug, and we should add the fsync to the
> parent directory.

Agreed too. Or, how about using "-o dirsync"?

> 
> It might also be a good idea for f2fs to do more, given that fsync is
> a slow enough operation that so long as you can make sure the fsync of
> the parent directory happens within the same atomic update as the
> child inode, you might as well give the more expansive guarantee.  But
> obviously that's up to the f2fs developers to decide whether they want
> to do that work.

Indeed. Actually, since one of our goals was to reduce fsync latencies for
Android, we decided to support posix in a minimum way. In order to avoid
complex directory updates recursively, however, we allowed the fsync on
directories to trigger checkpoint requiring many IO operations.

IMHO, as a quick fix, it seems we need to handle MS_DIRSYNC likewse xfs.
#define MS_DIRSYNC      128     /* Directory modifications are synchronous */

Thanks,

> 
> Cheers,
> 
>       	     	       	       	    - Ted
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chao Yu Dec. 28, 2017, 3:17 a.m. UTC | #5
On 2017/12/28 3:11, Jaegeuk Kim wrote:
> On 12/25, Theodore Ts'o wrote:
>> On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
>>> Filesystems are free to do /more/ than the minimum required by posix -
>>> see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
>>> mounts:
>>>
>>>          * If this is a synchronous mount, make sure that the rename transaction
>>>          * goes to disk before returning to the user.
>>>          */
>>>         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
>>>                 xfs_trans_set_sync(tp);
>>>
>>> so behavior can be fs-dependent, or mount option dependent, etc.
>>>
>>> But to be portable, if an app wants directory changes to be persistent
>>> before proceeding, it must fsync the directory after making changes.
>>>
>>> I don't know about f2fs's design intent, whether it guarantees more
>>> than posix requires, but to guarantee that this test works on any posix
>>> fs, I think that directory fsync is needed.
>>
>> Agreed that this is a test bug, and we should add the fsync to the
>> parent directory.

Agreeed.

> 
> Agreed too. Or, how about using "-o dirsync"?

IMO, according to comments of generic/342, it wants to test posix sematic
that supported by current fs' fsync(), if we want to test dirsync mount
option, that would be better in another testcase?

> 
>>
>> It might also be a good idea for f2fs to do more, given that fsync is
>> a slow enough operation that so long as you can make sure the fsync of
>> the parent directory happens within the same atomic update as the
>> child inode, you might as well give the more expansive guarantee.  But
>> obviously that's up to the f2fs developers to decide whether they want
>> to do that work.
> 
> Indeed. Actually, since one of our goals was to reduce fsync latencies for
> Android, we decided to support posix in a minimum way. In order to avoid

Agreed, in order to maximize performance of fsync regular file, we'd better
not change current sematic of f2fs' fsync so far, moreover, there is no
such requirement from upper layer application.

> complex directory updates recursively, however, we allowed the fsync on
> directories to trigger checkpoint requiring many IO operations.

Some applications use its own database software layer instead generic
sqlite in android, and they use deletion mode in their database, so
whenever they trigger database transaction, we need to do checkpoint. There
is such scenario that, before checkpoint it comes huge number of node &
meta updating, result in that checkpoint becomes very slow.

Maybe we can consider to fsync checkpointed directory with the way fsyncing
regular file rather than triggering checkpoint to improve its performance?

> 
> IMHO, as a quick fix, it seems we need to handle MS_DIRSYNC likewse xfs.
> #define MS_DIRSYNC      128     /* Directory modifications are synchronous */

We have supported that yet in commit b7e1d800031c ("f2fs: implement -o
dirsync"). ;)

Thanks,

> 
> Thanks,
> 
>>
>> Cheers,
>>
>>       	     	       	       	    - Ted
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Dec. 28, 2017, 3:41 a.m. UTC | #6
On Thu, Dec 28, 2017 at 11:17:09AM +0800, Chao Yu wrote:
> > Indeed. Actually, since one of our goals was to reduce fsync latencies for
> > Android, we decided to support posix in a minimum way. In order to avoid
> 
> Agreed, in order to maximize performance of fsync regular file, we'd better
> not change current sematic of f2fs' fsync so far, moreover, there is no
> such requirement from upper layer application.

The cost of forcing an fsync on the parent directory only if the
regular file was just created should really be minimal.  Most of the
performance critical use cases for fsync() is after writing data on an
already-existing file.

Even in the case where a package manager is writing, say, lots of
newly created files followed by an fsync (such as a 21 megabyte docker
binary) the overhead of including a handful of 4k metadata blocks for
the containing directory to make sure /usr/bin/docker exists as part
of the fsync() is in the noise compared to all of the other data
blocks being written by the package manager.

So I suspect the concerns of "oh, we can't provide the same guarantees
as ext4 and f2fs" are a bit overblown here.  But ultimately, that's up
to the f2fs developers...

> > IMHO, as a quick fix, it seems we need to handle MS_DIRSYNC likewse xfs.
> > #define MS_DIRSYNC      128     /* Directory modifications are synchronous */
> 
> We have supported that yet in commit b7e1d800031c ("f2fs: implement -o
> dirsync"). ;)

To be clear, we're not talking about dirsync mode where *all*
directory modifications are synchronous.  What we're talking about
here is the case where if you fsync() a regular file, and the regular
file is newly created --- AND ONLY THAT CASE --- to make sure the
parent directory is included in the set of metadata blocks updated by
the fsync() against the regular file.

Best regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Dec. 28, 2017, 9:09 a.m. UTC | #7
On Wed, Dec 27, 2017 at 11:11:30AM -0800, Jaegeuk Kim wrote:
> On 12/25, Theodore Ts'o wrote:
> > On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
> > > Filesystems are free to do /more/ than the minimum required by posix -
> > > see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
> > > mounts:
> > > 
> > >          * If this is a synchronous mount, make sure that the rename transaction
> > >          * goes to disk before returning to the user.
> > >          */
> > >         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
> > >                 xfs_trans_set_sync(tp);
> > > 
> > > so behavior can be fs-dependent, or mount option dependent, etc.
> > > 
> > > But to be portable, if an app wants directory changes to be persistent
> > > before proceeding, it must fsync the directory after making changes.
> > > 
> > > I don't know about f2fs's design intent, whether it guarantees more
> > > than posix requires, but to guarantee that this test works on any posix
> > > fs, I think that directory fsync is needed.
> > 
> > Agreed that this is a test bug, and we should add the fsync to the
> > parent directory.
> 
> Agreed too. Or, how about using "-o dirsync"?
> 
> > 
> > It might also be a good idea for f2fs to do more, given that fsync is
> > a slow enough operation that so long as you can make sure the fsync of
> > the parent directory happens within the same atomic update as the
> > child inode, you might as well give the more expansive guarantee.  But
> > obviously that's up to the f2fs developers to decide whether they want
> > to do that work.
> 
> Indeed. Actually, since one of our goals was to reduce fsync latencies for
> Android, we decided to support posix in a minimum way. In order to avoid
> complex directory updates recursively, however, we allowed the fsync on
> directories to trigger checkpoint requiring many IO operations.

So what you are really saying is that f2fs is not strictly ordered
w.r.t metadata crash consistency after fsync()? Wasn't that
considered a bug in btrfs that had to be fixed (and did get fixed)?

Oh, yeah, it's right there in the test commit history:

commit f02fe949113f35ae221ec1ab5c9959912f594bf4
Author: Filipe Manana <fdmanana@suse.com>
Date:   Tue Apr 5 11:47:55 2016 +1000

    generic: add test for fsync after renaming file
    
    Test that if we rename a file, create a new file that has the old name
    of the other file and is a child of the same parent directory, fsync the
    new inode, power fail and mount the filesystem, we do not lose the first
    file and that file has the name it was renamed to.
    
    This test is motivated by an issue found in btrfs which is fixed by the
    following patch for the linux kernel:
    
      "Btrfs: fix file loss caused by fsync after rename and new inode"
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Eryu Guan <eguan@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

There's a whole lot more detail in the kernel commit 2be63d5ce929
("Btrfs: fix file loss on log replay after renaming a file and
fsync") but my point is that we considered this a btrfs filesystem
bug and so changing the test defeats it's purpose as a regression
test for the btrfs bug.

So IMO the test should not be changed. And I think we should be
consistent and consider this f2fs failure as a f2fs bug that needs
fixing to bring it's behaviour in line with xfs, ext4, and btrfs.

Remember this when quoting POSIX about fsync behaviour: Posix is a
terrible standard when it comes to data integrity. We go way, way
beyond what POSIX specifies as a valid fsync implementation (i.e.
posix allows "do nothing and return success" as a conformant
implementation). Ext4, XFS and btrfs all have strictly ordered
metadata crash recovery semantics and all of the crash recovery
tests expect this behaviour from the filesytem being tested. The
underlying intention is that by encoding it into these tests, all
widely used and future linux filesystems meet or exceed this crash
integrity requirement.

IOWs, changing the test is the wrong thing to do on many levels....

Cheers,

Dave.
Jaegeuk Kim Dec. 28, 2017, 4:29 p.m. UTC | #8
On 12/28, Dave Chinner wrote:
> On Wed, Dec 27, 2017 at 11:11:30AM -0800, Jaegeuk Kim wrote:
> > On 12/25, Theodore Ts'o wrote:
> > > On Sun, Dec 24, 2017 at 11:47:20PM -0800, Eric Sandeen wrote:
> > > > Filesystems are free to do /more/ than the minimum required by posix -
> > > > see ext4_sync_parent for example.  Or xfs_finish_rename, for synchronous
> > > > mounts:
> > > > 
> > > >          * If this is a synchronous mount, make sure that the rename transaction
> > > >          * goes to disk before returning to the user.
> > > >          */
> > > >         if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
> > > >                 xfs_trans_set_sync(tp);
> > > > 
> > > > so behavior can be fs-dependent, or mount option dependent, etc.
> > > > 
> > > > But to be portable, if an app wants directory changes to be persistent
> > > > before proceeding, it must fsync the directory after making changes.
> > > > 
> > > > I don't know about f2fs's design intent, whether it guarantees more
> > > > than posix requires, but to guarantee that this test works on any posix
> > > > fs, I think that directory fsync is needed.
> > > 
> > > Agreed that this is a test bug, and we should add the fsync to the
> > > parent directory.
> > 
> > Agreed too. Or, how about using "-o dirsync"?
> > 
> > > 
> > > It might also be a good idea for f2fs to do more, given that fsync is
> > > a slow enough operation that so long as you can make sure the fsync of
> > > the parent directory happens within the same atomic update as the
> > > child inode, you might as well give the more expansive guarantee.  But
> > > obviously that's up to the f2fs developers to decide whether they want
> > > to do that work.
> > 
> > Indeed. Actually, since one of our goals was to reduce fsync latencies for
> > Android, we decided to support posix in a minimum way. In order to avoid
> > complex directory updates recursively, however, we allowed the fsync on
> > directories to trigger checkpoint requiring many IO operations.
> 
> So what you are really saying is that f2fs is not strictly ordered
> w.r.t metadata crash consistency after fsync()? Wasn't that
> considered a bug in btrfs that had to be fixed (and did get fixed)?
> 
> Oh, yeah, it's right there in the test commit history:
> 
> commit f02fe949113f35ae221ec1ab5c9959912f594bf4
> Author: Filipe Manana <fdmanana@suse.com>
> Date:   Tue Apr 5 11:47:55 2016 +1000
> 
>     generic: add test for fsync after renaming file
>     
>     Test that if we rename a file, create a new file that has the old name
>     of the other file and is a child of the same parent directory, fsync the
>     new inode, power fail and mount the filesystem, we do not lose the first
>     file and that file has the name it was renamed to.
>     
>     This test is motivated by an issue found in btrfs which is fixed by the
>     following patch for the linux kernel:
>     
>       "Btrfs: fix file loss caused by fsync after rename and new inode"
>     
>     Signed-off-by: Filipe Manana <fdmanana@suse.com>
>     Reviewed-by: Eryu Guan <eguan@redhat.com>
>     Signed-off-by: Dave Chinner <david@fromorbit.com>
> 
> There's a whole lot more detail in the kernel commit 2be63d5ce929
> ("Btrfs: fix file loss on log replay after renaming a file and
> fsync") but my point is that we considered this a btrfs filesystem
> bug and so changing the test defeats it's purpose as a regression
> test for the btrfs bug.
> 
> So IMO the test should not be changed. And I think we should be
> consistent and consider this f2fs failure as a f2fs bug that needs
> fixing to bring it's behaviour in line with xfs, ext4, and btrfs.
> 
> Remember this when quoting POSIX about fsync behaviour: Posix is a
> terrible standard when it comes to data integrity. We go way, way
> beyond what POSIX specifies as a valid fsync implementation (i.e.
> posix allows "do nothing and return success" as a conformant
> implementation). Ext4, XFS and btrfs all have strictly ordered
> metadata crash recovery semantics and all of the crash recovery
> tests expect this behaviour from the filesytem being tested. The
> underlying intention is that by encoding it into these tests, all
> widely used and future linux filesystems meet or exceed this crash
> integrity requirement.

I just noticed that this test is not applied for generic filesystems,
but for ones that support metadata_journaling. So, IMHO, we don't have
to rely on POSIX too much here. Back to metadata_journalling, we indeed
have to treat this as a bug in f2fs, since this recovery behavior sounds
like defacto across the major filesystems.

I submitted a patch to address this, based on the hint by Ted.
https://patchwork.kernel.org/patch/10135085/

Thanks,

> 
> IOWs, changing the test is the wrong thing to do on many levels....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Dec. 28, 2017, 4:59 p.m. UTC | #9
On 12/28/17 1:09 AM, Dave Chinner wrote:
...

> There's a whole lot more detail in the kernel commit 2be63d5ce929
> ("Btrfs: fix file loss on log replay after renaming a file and
> fsync") but my point is that we considered this a btrfs filesystem
> bug and so changing the test defeats it's purpose as a regression
> test for the btrfs bug.
> 
> So IMO the test should not be changed. And I think we should be
> consistent and consider this f2fs failure as a f2fs bug that needs
> fixing to bring it's behaviour in line with xfs, ext4, and btrfs.
> 
> Remember this when quoting POSIX about fsync behaviour: Posix is a
> terrible standard when it comes to data integrity. We go way, way
> beyond what POSIX specifies as a valid fsync implementation (i.e.
> posix allows "do nothing and return success" as a conformant
> implementation). Ext4, XFS and btrfs all have strictly ordered
> metadata crash recovery semantics and all of the crash recovery
> tests expect this behaviour from the filesytem being tested. The
> underlying intention is that by encoding it into these tests, all
> widely used and future linux filesystems meet or exceed this crash
> integrity requirement.
> 
> IOWs, changing the test is the wrong thing to do on many levels....
> 
> Cheers,
> 
> Dave.
> 

Thanks for digging into the btrfs commit which changed the behavior
tested by this testcase, I had meant to do that. 

Yeah, that's all fair, for sure.  But this shouldn't be implicit in the
testcase; it should explicitly document that it is testing a de-facto
new standard adhered to by several filesystems, what that standard is,
etc, and not leave it to the reader.  ;)  

If a filesystem explicitly chooses not to adhere to that standard they
/could/ opt out of the test.

It's makes a lot of sense for linux filesystems to behave in consistent
ways, but if we're going to start enforcing that through xfstests we
need to be very clear about it, IMHO.


-Eric
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Dec. 28, 2017, 9:21 p.m. UTC | #10
On Thu, Dec 28, 2017 at 08:59:18AM -0800, Eric Sandeen wrote:
> On 12/28/17 1:09 AM, Dave Chinner wrote:
> ...
> 
> > There's a whole lot more detail in the kernel commit 2be63d5ce929
> > ("Btrfs: fix file loss on log replay after renaming a file and
> > fsync") but my point is that we considered this a btrfs filesystem
> > bug and so changing the test defeats it's purpose as a regression
> > test for the btrfs bug.
> > 
> > So IMO the test should not be changed. And I think we should be
> > consistent and consider this f2fs failure as a f2fs bug that needs
> > fixing to bring it's behaviour in line with xfs, ext4, and btrfs.
> > 
> > Remember this when quoting POSIX about fsync behaviour: Posix is a
> > terrible standard when it comes to data integrity. We go way, way
> > beyond what POSIX specifies as a valid fsync implementation (i.e.
> > posix allows "do nothing and return success" as a conformant
> > implementation). Ext4, XFS and btrfs all have strictly ordered
> > metadata crash recovery semantics and all of the crash recovery
> > tests expect this behaviour from the filesytem being tested. The
> > underlying intention is that by encoding it into these tests, all
> > widely used and future linux filesystems meet or exceed this crash
> > integrity requirement.
> > 
> > IOWs, changing the test is the wrong thing to do on many levels....
> > 
> > Cheers,
> > 
> > Dave.
> > 
> 
> Thanks for digging into the btrfs commit which changed the behavior
> tested by this testcase, I had meant to do that. 
> 
> Yeah, that's all fair, for sure.  But this shouldn't be implicit in the
> testcase; it should explicitly document that it is testing a de-facto
> new standard adhered to by several filesystems, what that standard is,
> etc, and not leave it to the reader.  ;)  

Well, the point of the test case is to catch recovery behaviour
that is not properly ordered in this complex corner case. It's not
an obvious thing for developers to test, as we can see by both btrfs
and now f2fs failing this test.

What that failure actually means and implies is completely separate
to the test itself. i.e. the test is just a canary that tells us a
filesystem is not strictly ordered in it's crash recovery behaviour.
If we don't have tests somewhere that exercise the behaviour we want
filesystems to have, we'll never know if a filesystem does or
doesn't behave as we want them to.

The fact we'd like all filesystems to have strictly ordered crash
recovery semantics has nothing to do with the test. It's based on
the fact that XFS and ext3/4 have had this behaviour since forever,
and apps written on these filesystems are quite likely to skip
the directory fsync() because it's not necessary. That's seen by
btrfs users reporting lost files where other filesystems are fine.

We want all linux filesystems to behave the same way in these
circumstances, but we can't enforce it because....

> If a filesystem explicitly chooses not to adhere to that standard they
> /could/ opt out of the test.

... that is a option the filesystem developers can take.

> It's makes a lot of sense for linux filesystems to behave in consistent
> ways, but if we're going to start enforcing that through xfstests we
> need to be very clear about it, IMHO.

xfstests is not the way we enforce it - it's just tells us something
is inconsistent. What the fs developers do from there (fix it or
skip the test) is up to them.

Cheers,

Dave.
diff mbox

Patch

diff --git a/tests/generic/342 b/tests/generic/342
index ed64e05..6a9e5bd 100755
--- a/tests/generic/342
+++ b/tests/generic/342
@@ -68,6 +68,7 @@  sync
 mv $SCRATCH_MNT/a/foo $SCRATCH_MNT/a/bar
 $XFS_IO_PROG -f -c "pwrite -S 0xba 0 16K" $SCRATCH_MNT/a/foo | _filter_xfs_io
 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a/foo
+$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
 
 echo "File digests before log replay:"
 md5sum $SCRATCH_MNT/a/foo | _filter_scratch