
generic/311: Disable dmesg check

Message ID 1437109003-2357-1-git-send-email-chandan@linux.vnet.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Chandan Rajendra July 17, 2015, 4:56 a.m. UTC
When running generic/311 on Btrfs' subpagesize-blocksize patchset (on ppc64
with 4k sectorsize and 16k node/leaf size) I noticed the following call trace,

BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
BTRFS: Transaction aborted (error -5)

WARNING: at /root/repos/linux/fs/btrfs/super.c:260
Modules linked in:
CPU: 3 PID: 30769 Comm: umount Tainted: G        W    L 4.0.0-rc5-11671-g8b82e73e #63
task: c000000079aaddb0 ti: c000000079a48000 task.ti: c000000079a48000
NIP: c000000000499aa0 LR: c000000000499a9c CTR: c000000000779630
REGS: c000000079a4b480 TRAP: 0700   Tainted: G        W   L   (4.0.0-rc5-11671-g8b82e73e)
MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 28008828  XER: 20000000
CFAR: c000000000a23914 SOFTE: 1
GPR00: c000000000499a9c c000000079a4b700 c00000000103bdf8 0000000000000025
GPR04: 0000000000000001 0000000000000502 c00000000107e918 0000000000000cda
GPR08: 0000000000000007 0000000000000007 0000000000000001 c0000000010f5044
GPR12: 0000000028008822 c00000000fdc0d80 0000000020000000 0000000010152e00
GPR16: 0000010002979380 0000000010140724 0000000000000000 0000000000000000
GPR20: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
GPR24: c0000000151f61a8 0000000000000000 c000000055e5e800 c000000000aac270
GPR28: 00000000000004a4 fffffffffffffffb c000000055e5e800 c0000000679204d0
NIP [c000000000499aa0] .__btrfs_abort_transaction+0x180/0x190
LR [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190
Call Trace:
[c000000079a4b700] [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190 (unreliable)
[c000000079a4b7a0] [c000000000541678] .__btrfs_run_delayed_items+0xe8/0x220
[c000000079a4b850] [c0000000004d5b3c] .btrfs_commit_transaction+0x37c/0xca0
[c000000079a4b960] [c00000000049824c] .btrfs_sync_fs+0x6c/0x1a0
[c000000079a4ba00] [c000000000255270] .sync_filesystem+0xd0/0x100
[c000000079a4ba80] [c000000000218070] .generic_shutdown_super+0x40/0x170
[c000000079a4bb10] [c000000000218598] .kill_anon_super+0x18/0x30
[c000000079a4bb90] [c000000000498418] .btrfs_kill_super+0x18/0xc0
[c000000079a4bc10] [c000000000218ac8] .deactivate_locked_super+0x98/0xe0
[c000000079a4bc90] [c00000000023e744] .cleanup_mnt+0x54/0xa0
[c000000079a4bd10] [c0000000000b7d14] .task_work_run+0x114/0x150
[c000000079a4bdb0] [c000000000015f84] .do_notify_resume+0x74/0x80
[c000000079a4be30] [c000000000009838] .ret_from_except_lite+0x64/0x68
Instruction dump:
ebc1fff0 ebe1fff8 4bfffb28 60000000 3ce2ffcd 38e7e818 4bffffbc 3c62ffd2
7fa4eb78 3863b808 48589e1d 60000000 <0fe00000> 4bfffedc 60000000 60000000
BTRFS: error (device dm-0) in __btrfs_run_delayed_items:1188: errno=-5 IO failure


The call trace is seen when executing _run_test() for the 8th time.
The above trace is actually a false-positive failure, as the sequence below shows:
 fsync-tester
   fsync(fd)
   Write delayed inode item to fs tree
     (assume transid to be 160)
     (assume tree block to start at logical address 29720576)
 md5sum $testfile
   This causes a delayed inode to be added
 Load flakey table
   i.e. drop writes that are initiated from now onwards
 Unmount filesystem
   btrfs_sync_fs is invoked
     Write 29720576 metadata block to disk
     free_extent_buffer(29720576)
       release_extent_buffer(29720576)
   Start writing delayed inode
     Traverse the fs tree
       (assume the parent tree block of 29720576 is still in memory)
       When reading 29720576 from disk, parent's blkptr will have generation
       set to 160. But the on-disk tree block will have an older
       generation (say, 158). Transid verification fails and hence the
       transaction gets aborted
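
The check that fires in the last step can be illustrated with a toy shell sketch (illustration only, not btrfs code; the generation numbers are the ones assumed above):

```shell
# Toy illustration of the transid verification that aborts the transaction:
# the in-memory parent block pointer records generation 160, but the dropped
# write left generation 158 on disk, so the re-read fails verification.
parent_blkptr_gen=160   # generation recorded in the parent's blkptr
ondisk_gen=158          # generation actually found in the on-disk block

if [ "$parent_blkptr_gen" -ne "$ondisk_gen" ]; then
    echo "parent transid verify failed: wanted $parent_blkptr_gen found $ondisk_gen"
    # btrfs treats this mismatch as EIO (-5) and aborts the transaction
fi
```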

The test only cares about the FS instance before the unmount
operation (i.e. the synced FS). Hence to get the test to pass, ignore the
false-positive trace that could be generated.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 tests/generic/311 | 2 ++
 1 file changed, 2 insertions(+)

Comments

Brian Foster July 17, 2015, 10:16 a.m. UTC | #1
On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> When running generic/311 on Btrfs' subpagesize-blocksize patchset (on ppc64
> with 4k sectorsize and 16k node/leaf size) I noticed the following call trace,
> 
> [...]
> diff --git a/tests/generic/311 b/tests/generic/311
> index d21b6eb..cd6391d 100755
> --- a/tests/generic/311
> +++ b/tests/generic/311
> @@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
>  
>  [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
>  
> +_disable_dmesg_check
> +

Hmm, I don't think this is something we'd want to do unconditionally.
E.g., if something hits the logs for xfs or ext4, we probably want to
hear about it.

Perhaps check that the fs is btrfs and possibly the fs params are such
that the known warning occurs..? I'd defer to the btrfs folks on how
best to check that, so long as it doesn't affect other fs'.

Brian
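
One way to realize this suggestion would be to gate the call on the filesystem type, e.g. (a sketch only: $FSTYP is the fstests variable naming the fs under test, and the stub below merely stands in for the real _disable_dmesg_check helper from common/rc):

```shell
# Sketch of a btrfs-only dmesg-check opt-out. In a real fstests run,
# _disable_dmesg_check comes from common/rc; this stub is for illustration.
_disable_dmesg_check() { echo "dmesg check disabled for $FSTYP"; }

FSTYP=${FSTYP:-btrfs}   # fstests sets this from the configured filesystem

# Only relax the check where the known false-positive warning can fire,
# so xfs/ext4 runs still report anything that hits the logs.
if [ "$FSTYP" = "btrfs" ]; then
    _disable_dmesg_check
fi
```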

>  rm -f $seqres.full
>  SEED=1
>  testfile=$SCRATCH_MNT/$seq.fsync
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chandan Rajendra July 17, 2015, 11:40 a.m. UTC | #2
On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> > [...]
> > diff --git a/tests/generic/311 b/tests/generic/311
> > index d21b6eb..cd6391d 100755
> > --- a/tests/generic/311
> > +++ b/tests/generic/311
> > @@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
> > 
> >  [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
> > 
> > +_disable_dmesg_check
> > +
> 
> Hmm, I don't think this is something we'd want to do unconditionally.
> E.g., if something hits the logs for xfs or ext4, we probably want to
> hear about it.
Sorry, I forgot that I was dealing with generic tests. I agree with what you
say.

> 
> Perhaps check that the fs is btrfs and possibly the fs params are such
> that the known warning occurs..? I'd defer to the btrfs folks on how
> best to check that, so long as it doesn't affect other fs'.
>
The above trace is independent of any Btrfs parameters. It's just that it was
easily reproducible on ppc64 with the subpagesize-blocksize patchset
applied. So IMHO disabling the call trace check when testing a Btrfs
filesystem is the way to go. But again, as you said, we can wait for comments
from other Btrfs devs.

> Brian
Dave Chinner July 19, 2015, 11:54 p.m. UTC | #3
On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
> On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> > On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> > > [...]
> > > diff --git a/tests/generic/311 b/tests/generic/311
> > > index d21b6eb..cd6391d 100755
> > > --- a/tests/generic/311
> > > +++ b/tests/generic/311
> > > @@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
> > > 
> > >  [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
> > > 
> > > +_disable_dmesg_check
> > > +
> > 
> > Hmm, I don't think this is something we'd want to do unconditionally.
> > E.g., if something hits the logs for xfs or ext4, we probably want to
> > hear about it.
> Sorry, I forgot that I was dealing with generic tests. I agree with what you
> say.
> 
> > 
> > Perhaps check that the fs is btrfs and possibly the fs params are such
> > that the known warning occurs..? I'd defer to the btrfs folks on how
> > best to check that, so long as it doesn't affect other fs'.
> >
> The above trace is independent of any Btrfs parameters. Its just that it was
> easily reproducible on ppc64 with the subpagesize-blocksize patchset
> applied. So IMHO disabling call trace check when testing Btrfs filesystem is
> the way to go. But again, as you said, we can wait for comments from other
> Btrfs devs.

As a general principle, we don't disable warnings/errors in tests
just to make tests pass.  The bug in the code being tested should be
fixed so the warning/error is not generated. If you do not want
to see the error, then expunge the test from your
subpagesize-blocksize test runs.

Cheers,

Dave.
Josef Bacik July 20, 2015, 12:55 p.m. UTC | #4
On 07/19/2015 07:54 PM, Dave Chinner wrote:
> On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
>> On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
>>> On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
>>>> [...]
>>>> diff --git a/tests/generic/311 b/tests/generic/311
>>>> index d21b6eb..cd6391d 100755
>>>> --- a/tests/generic/311
>>>> +++ b/tests/generic/311
>>>> @@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
>>>>
>>>>   [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
>>>>
>>>> +_disable_dmesg_check
>>>> +
>>>
>>> Hmm, I don't think this is something we'd want to do unconditionally.
>>> E.g., if something hits the logs for xfs or ext4, we probably want to
>>> hear about it.
>> Sorry, I forgot that I was dealing with generic tests. I agree with what you
>> say.
>>
>>>
>>> Perhaps check that the fs is btrfs and possibly the fs params are such
>>> that the known warning occurs..? I'd defer to the btrfs folks on how
>>> best to check that, so long as it doesn't affect other fs'.
>>>
>> The above trace is independent of any Btrfs parameters. Its just that it was
>> easily reproducible on ppc64 with the subpagesize-blocksize patchset
>> applied. So IMHO disabling call trace check when testing Btrfs filesystem is
>> the way to go. But again, as you said, we can wait for comments from other
>> Btrfs devs.
>
> As a general principle, we don't disable warnings/errors in tests
> just to make tests pass.  The bug in the code being tested should be
> fixed so the warning/error is not generated. If you do not want
> to see the error, then expung the test from your
> subpagesize-blocksize test runs.
>

The same problem would happen if you did echo 3 > /proc/sys/vm/drop_caches
right after dropping writes, before doing the unmount.  The sub-page
blocksize stuff is just exposing the race and thus causing this warning
to happen.  The warning has nothing to do with the test; it happens after
we drop writes, so I think it's fine to ignore it.  Maybe a different
mechanism could be put into place to say "hey, ignore dmesg for this
part" or something like that.

That all being said, it is a little weird that the sub-page blocksize
stuff hits this so consistently; I wonder why things are getting evicted
that quickly.  Thanks,

Josef
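
The drop_caches scenario Josef describes, as a command sequence (a sketch, guarded so it only acts with root and a scratch mount; $SCRATCH_MNT is the fstests scratch mount point):

```shell
# Sketch of the cache-eviction reproduction: once flakey is dropping writes,
# evicting the clean extent buffers forces the next tree walk to re-read
# stale metadata from disk and trip the parent transid verification.
if [ "$(id -u)" -eq 0 ] && [ -n "$SCRATCH_MNT" ]; then
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache + slab (extent buffers)
    umount "$SCRATCH_MNT"               # delayed-inode commit now re-reads stale blocks
else
    echo "skipping: needs root and a scratch mount"
fi
```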

Dave Chinner July 20, 2015, 10:12 p.m. UTC | #5
On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
> On 07/19/2015 07:54 PM, Dave Chinner wrote:
> >On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
> >>On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> >>>On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> >>>> [...]
> >>>>--- a/tests/generic/311
> >>>>+++ b/tests/generic/311
> >>>>@@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
> >>>>
> >>>>  [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
> >>>>
> >>>>+_disable_dmesg_check
> >>>>+
> >>>
> >>>Hmm, I don't think this is something we'd want to do unconditionally.
> >>>E.g., if something hits the logs for xfs or ext4, we probably want to
> >>>hear about it.
> >>Sorry, I forgot that I was dealing with generic tests. I agree with what you
> >>say.
> >>
> >>>
> >>>Perhaps check that the fs is btrfs and possibly the fs params are such
> >>>that the known warning occurs..? I'd defer to the btrfs folks on how
> >>>best to check that, so long as it doesn't affect other fs'.
> >>>
> >>The above trace is independent of any Btrfs parameters. It's just that it
> >>was easily reproducible on ppc64 with the subpagesize-blocksize patchset
> >>applied. So IMHO disabling the call trace check when testing a Btrfs
> >>filesystem is the way to go. But again, as you said, we can wait for
> >>comments from other Btrfs devs.
> >
> >As a general principle, we don't disable warnings/errors in tests
> >just to make tests pass.  The bug in the code being tested should be
> >fixed so the warning/error is not generated. If you do not want
> >to see the error, then expunge the test from your
> >subpagesize-blocksize test runs.
> >
> 
> The same problem would happen if you did echo 3 >
> /proc/sys/vm/drop_caches right after dropping writes before doing
> the unmount.

Nice, you have multiple ways of reproducing the problem....

> The sub page blocksize stuff is just exposing the race
> and thus causing this warning to happen.  The warning has nothing to
> do with the test, it happens after drop writes so I think it's fine
> to ignore it.  Maybe a different mechanism could be put into place
> to say "hey ignore dmesg for this part" or something like that.
> 
> That all being said, it is a little weird that the sub page
> blocksize stuff hits this so consistently, I wonder why things are
> getting evicted that quickly.  Thanks,

Josef, you've just explained exactly why silencing/ignoring dmesg is
the wrong thing to do. The test is exposing some problem (even a
known problem) via a behaviour that is not understood. And when I
hear "there's a race condition, and this test hits it easily" I see
a test that can be used to verify that the problem has been fixed.

i.e. we don't silence warnings just because a test triggers it
easily; having a test fail is a clear reminder that either:

	a) there is a problem that needs fixing; or
	b) there is a warning being emitted under normal operating
	   situations that users are going to notice and report. And
	   if users can see it, then see a).

Cheers,

Dave.
Chandan Rajendra July 21, 2015, 10:33 a.m. UTC | #6
On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
> On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
> > On 07/19/2015 07:54 PM, Dave Chinner wrote:
> > >On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
> > >>On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> > >>>On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> > >>>>[ call trace and patch snipped; quoted in full at the top of the thread ]
> > >>>
> > >>>Hmm, I don't think this is something we'd want to do unconditionally.
> > >>>E.g., if something hits the logs for xfs or ext4, we probably want to
> > >>>hear about it.
> > >>
> > >>Sorry, I forgot that I was dealing with generic tests. I agree with what
> > >>you say.
> > >>
> > >>>Perhaps check that the fs is btrfs and possibly the fs params are such
> > >>>that the known warning occurs..? I'd defer to the btrfs folks on how
> > >>>best to check that, so long as it doesn't affect other fs'.
> > >>
> > >>The above trace is independent of any Btrfs parameters. It's just that it
> > >>was easily reproducible on ppc64 with the subpagesize-blocksize
> > >>patchset applied. So IMHO disabling the call trace check when testing a
> > >>Btrfs filesystem is the way to go. But again, as you said, we can wait for
> > >>comments from other Btrfs devs.
> > >
> > >As a general principle, we don't disable warnings/errors in tests
> > >just to make tests pass.  The bug in the code being tested should be
> > >fixed so the warning/error is not generated. If you do not want
> > >to see the error, then expunge the test from your
> > >subpagesize-blocksize test runs.
> > 
> > The same problem would happen if you did echo 3 >
> > /proc/sys/vm/drop_caches right after dropping writes before doing
> > the unmount.
> 
> Nice, you have multiple ways of reproducing the problem....

Dave, apologies if I am sounding repetitive. The call trace that shows up
when executing the test doesn't indicate a failure in Btrfs' code. It's the
result of:

1. Perform fsync(fd).
2. Load dm-flakey.
   i.e. drop writes from now onwards.
3. Write a dirty metadata btree block to disk.
   The data however doesn't reach the disk. The corresponding block on disk
   still has old data.
4. Evict the btree block from memory.
5. Read the btree block from disk.
   Verification of the btree block fails since the transaction id found in
   the block differs from the one recorded in the parent btree block. This
   causes the transaction to be aborted and hence a call trace gets printed.

From the above we see that there is no issue in Btrfs. The code in fact is
doing the right thing by aborting the transaction. Unfortunately the test
notices the call trace in dmesg and gets marked as a failure.
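
The sequence above can be sketched as a toy model (pure illustration; the
addresses, generations, class and helper names are made up, not Btrfs code):
a write dropped by dm-flakey leaves the old generation on disk, so a cold
re-read after eviction trips the parent transid check.

```python
# Toy model of the false positive described above. FlakeyDisk, verify_transid
# and the addresses/generations are all illustrative, not real Btrfs code.

class FlakeyDisk:
    """A disk that can silently drop writes, like dm-flakey's drop_writes."""
    def __init__(self):
        self.blocks = {}          # logical address -> generation on disk
        self.drop_writes = False

    def write(self, addr, generation):
        if not self.drop_writes:  # dropped writes never reach the platter
            self.blocks[addr] = generation

    def read(self, addr):
        return self.blocks[addr]

def verify_transid(disk, cache, addr, parent_expected_gen):
    """Return the block's generation, checking it against the parent blkptr."""
    found = cache[addr] if addr in cache else disk.read(addr)
    if found != parent_expected_gen:
        raise IOError("parent transid verify failed on %d wanted %d found %d"
                      % (addr, parent_expected_gen, found))
    return found

disk, cache = FlakeyDisk(), {}
disk.write(29720576, 158)   # older generation already on disk
cache[29720576] = 160       # fsync dirtied the block in memory (gen 160)
disk.drop_writes = True     # load the flakey table: drop writes from now on
disk.write(29720576, 160)   # unmount-time writeback is silently dropped
del cache[29720576]         # free_extent_buffer() -> block evicted
try:
    verify_transid(disk, cache, 29720576, 160)
except IOError as e:
    print(e)  # prints: parent transid verify failed on 29720576 wanted 160 found 158
```

If the eviction step (the `del`) is skipped, the check passes, which is why
the trace only appears when the block is evicted and re-read cold.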

> 
> > The sub page blocksize stuff is just exposing the race
> > and thus causing this warning to happen.  The warning has nothing to
> > do with the test, it happens after drop writes so I think it's fine
> > to ignore it.  Maybe a different mechanism could be put into place
> > to say "hey ignore dmesg for this part" or something like that.
> > 
> > That all being said, it is a little weird that the sub page
> > blocksize stuff hits this so consistently, I wonder why things are
> > getting evicted that quickly.  Thanks,
> 
> Josef, you've just explained exactly why silencing/ignoring dmesg is
> the wrong thing to do. The test is exposing some problem (even a
> known problem) via a behaviour that is not understood. And when I
> hear "there's a race condition, and this test hits it easily" I see
> a test that can be used to verify that the problem has been fixed.
> 
> i.e. we don't silence warnings just because a test triggers it
> easily; having a test fail is a clear reminder that either:
> 
> 	a) there is a problem that needs fixing; or
> 	b) there is a warning being emitted under normal operating
> 	   situations that users are going to notice and report. And
> 	   if users can see it, then see a).

IMHO the problem lies in the test, i.e. the test assumes that a btree block
that was written to the disk after dm-flakey was loaded (with write operations
being dropped) would never be evicted from memory and re-read.

> 
> Cheers,
> 
> Dave.
Dave Chinner July 22, 2015, 12:01 a.m. UTC | #7
On Tue, Jul 21, 2015 at 04:03:17PM +0530, Chandan Rajendra wrote:
> On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
> > On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
> > > On 07/19/2015 07:54 PM, Dave Chinner wrote:
> > > >On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
> > > >>On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> > > >>>On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> > > >>>>[ call trace and patch snipped; quoted in full at the top of the thread ]
> > > >>>
> > > >>>Hmm, I don't think this is something we'd want to do unconditionally.
> > > >>>E.g., if something hits the logs for xfs or ext4, we probably want to
> > > >>>hear about it.
> > > >>
> > > >>Sorry, I forgot that I was dealing with generic tests. I agree with what
> > > >>you say.
> > > >>
> > > >>>Perhaps check that the fs is btrfs and possibly the fs params are such
> > > >>>that the known warning occurs..? I'd defer to the btrfs folks on how
> > > >>>best to check that, so long as it doesn't affect other fs'.
> > > >>
> > > >>The above trace is independent of any Btrfs parameters. It's just that it
> > > >>was easily reproducible on ppc64 with the subpagesize-blocksize
> > > >>patchset applied. So IMHO disabling the call trace check when testing a
> > > >>Btrfs filesystem is the way to go. But again, as you said, we can wait
> > > >>for comments from other Btrfs devs.
> > > >
> > > >As a general principle, we don't disable warnings/errors in tests
> > > >just to make tests pass.  The bug in the code being tested should be
> > > >fixed so the warning/error is not generated. If you do not want
> > > >to see the error, then expunge the test from your
> > > >subpagesize-blocksize test runs.
> > > 
> > > The same problem would happen if you did echo 3 >
> > > /proc/sys/vm/drop_caches right after dropping writes before doing
> > > the unmount.
> > 
> > Nice, you have multiple ways of reproducing the problem....
> 
> Dave, apologies if I am sounding repetitive. The call trace that shows up
> when executing the test doesn't indicate a failure in Btrfs' code. It's the
> result of:
> 
> 1. Perform fsync(fd).
> 2. Load dm-flakey.
>    i.e. drop writes from now onwards.
> 3. Write a dirty metadata btree block to disk.
>    The data however doesn't reach the disk. The corresponding block on disk
>    still has old data.

What needs to be written? The fsync should have written everything
that was needed to give a consistent filesystem on disk after a
unmount/mount cycle.

> 4. Evict the btree block from memory.

... then removing the dirty objects from memory (!!!!!)...

> 5. Read the btree block from disk.
>    Verification of the btree block fails since the transaction id found in
>    the block differs from the one recorded in the parent btree block. This
>    causes the transaction to be aborted and hence a call trace gets printed.

.... and failing later because the on-disk metadata is stale due to
the earlier write failure and subsequent dirty memory object
eviction..

> From the above we see that there is no issue in Btrfs. The code in fact is
> doing the right thing by aborting the transaction. Unfortunately the test
> notices the call trace in dmesg and gets marked as a failure.

[...]

> IMHO the problem lies in the test, i.e. the test assumes that a btree block
> that was written to the disk after dm-flakey was loaded (with write operations
> being dropped) would never be evicted from memory and re-read.

The test is making a pretty valid assumption - a fundamental
principle in filesystem design: never free dirty objects cached in
memory until they are safe on disk and marked clean by IO
completion.  i.e. freeing dirty in-memory metadata objects that have
not been written back will *always* result in a corrupt filesystem
structure on disk that will be tripped over on the next cold read of
the metadata.
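
The invariant can be stated in a few lines of Python (a sketch only; the
MetadataCache class and its method names are invented for illustration):
eviction of a dirty object is simply refused until IO completion has marked
it clean.

```python
# Sketch of the "never free dirty cached objects" invariant. MetadataCache
# and its method names are invented for illustration only.

class MetadataCache:
    def __init__(self):
        self.objects = {}   # object id -> {"data": ..., "dirty": bool}

    def mark_dirty(self, oid, data):
        self.objects[oid] = {"data": data, "dirty": True}

    def io_complete(self, oid):
        self.objects[oid]["dirty"] = False   # now safe on disk

    def evict(self, oid):
        obj = self.objects[oid]
        if obj["dirty"]:                     # the invariant described above
            raise RuntimeError("refusing to free dirty object %s" % oid)
        del self.objects[oid]

cache = MetadataCache()
cache.mark_dirty("eb-29720576", b"btree node, gen 160")
try:
    cache.evict("eb-29720576")   # write never completed: eviction must fail
except RuntimeError as e:
    print(e)
cache.io_complete("eb-29720576") # IO completion marks the buffer clean
cache.evict("eb-29720576")       # only now is freeing it legal
```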

The elephant in the room: btrfs is a COW filesystem. How does it end
up with a stale metadata pointer in the btree when the last write of
the COW update is to change the superblock to point at the new,
completely written, consistent-on-disk btree?
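
The ordering being appealed to here can be sketched as follows (invented
names and layout, not Btrfs internals): in a COW commit the new tree blocks
are written first and the superblock flip comes last, so if dm-flakey drops
every write, the on-disk superblock still points at the old, fully
consistent tree.

```python
# Minimal sketch of COW commit ordering; disk layout and names are invented.

def cow_commit(disk, new_blocks, new_root, drop_writes=False):
    def write(addr, data):
        if not drop_writes:       # dm-flakey: writes silently disappear
            disk[addr] = data
    for addr, data in new_blocks.items():
        write(addr, data)         # 1. write the new copies elsewhere first
    write("sb_root", new_root)    # 2. the superblock flip is the last write

disk = {"sb_root": 100, 100: "tree, generation 158"}
cow_commit(disk, {200: "tree, generation 160"}, 200, drop_writes=True)
assert disk["sb_root"] == 100 and 200 not in disk   # old tree still intact
```

Under this ordering a cold reader starting from the on-disk superblock never
follows a stale pointer; a stale blkptr can exist only in parents still
cached in memory, which is exactly the situation being questioned.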

And if btrfs is allowing inconsistent btrees to be seen on disk
(which should never happen with COW), why is that a problem of the
test (ie. the messenger) rather than the filesystem?

Josef, Chris, is this really how btrfs handles metadata write
failures? This handling of errors seems like a design flaw rather
than a desirable behaviour to me....

Cheers,

Dave.
Josef Bacik July 22, 2015, 12:37 a.m. UTC | #8
On 07/21/2015 08:01 PM, Dave Chinner wrote:
> On Tue, Jul 21, 2015 at 04:03:17PM +0530, Chandan Rajendra wrote:
>> On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
>>> On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
>>>> On 07/19/2015 07:54 PM, Dave Chinner wrote:
>>>>> On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
>>>>>> On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
>>>>>>> On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
>>>>>>>> [ call trace and patch snipped; quoted in full at the top of the thread ]
>>>>>>>
>>>>>>> Hmm, I don't think this is something we'd want to do unconditionally.
>>>>>>> E.g., if something hits the logs for xfs or ext4, we probably want to
>>>>>>> hear about it.
>>>>>>
>>>>>> Sorry, I forgot that I was dealing with generic tests. I agree with what
>>>>>> you say.
>>>>>>
>>>>>>> Perhaps check that the fs is btrfs and possibly the fs params are such
>>>>>>> that the known warning occurs..? I'd defer to the btrfs folks on how
>>>>>>> best to check that, so long as it doesn't affect other fs'.
>>>>>>
>>>>>> The above trace is independent of any Btrfs parameters. It's just that it
>>>>>> was easily reproducible on ppc64 with the subpagesize-blocksize
>>>>>> patchset applied. So IMHO disabling the call trace check when testing a
>>>>>> Btrfs filesystem is the way to go. But again, as you said, we can wait
>>>>>> for comments from other Btrfs devs.
>>>>>
>>>>> As a general principle, we don't disable warnings/errors in tests
>>>>> just to make tests pass.  The bug in the code being tested should be
>>>>> fixed so the warning/error is not generated. If you do not want
>>>>> to see the error, then expunge the test from your
>>>>> subpagesize-blocksize test runs.
>>>>
>>>> The same problem would happen if you did echo 3 >
>>>> /proc/sys/vm/drop_caches right after dropping writes before doing
>>>> the unmount.
>>>
>>> Nice, you have multiple ways of reproducing the problem....
>>
>> Dave, Apologies if I am sounding repetitive. The call trace that is showing up
>> when executing the test isn't indicating a failure in Btrfs' code. It's the
>> result of:
>>
>> 1. Perform fsync(fd).
>> 2. Load dm-flakey.
>>     i.e. drop writes from now onwards.
>> 3. Write a dirty metadata btree block to disk.
>>     The data however doesn't reach the disk. The corresponding block on disk
>>     still has old data.
>
> What needs to be written? The fsync should have written everything
> that was needed to give a consistent filesystem on disk after an
> unmount/mount cycle.
>
>> 4. Evict the btree block from memory.
>
> ... then removing the dirty objects from memory (!!!!!)...
>
>> 5. Read the btree block from disk.
>>     Verification of the btree block read from the disk fails since the
>>     transaction id expected from the btree block is different from the one
>>     stored in the parent btree block. This causes the transaction to be aborted
>>     and hence a call trace gets printed.
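[Editor's note: the generation mismatch in step 5 can be sketched as a toy model. This is purely illustrative — the names `verify_child` and `TransidMismatch` are made up and are not btrfs code; the real check lives in the kernel's tree-block read path.]

```python
# Toy model of btrfs parent-transid verification (illustrative only).
# A parent node's block pointer records the generation it expects the
# child block to carry. If the write of the child was silently dropped,
# the block on disk still has the old generation and the check fails.

class TransidMismatch(Exception):
    pass

def verify_child(parent_blkptr_gen, ondisk_child_gen):
    """Fail if the on-disk child block's generation does not match
    the generation recorded in the in-memory parent's block pointer."""
    if parent_blkptr_gen != ondisk_child_gen:
        raise TransidMismatch(
            f"parent transid verify failed: wanted {parent_blkptr_gen} "
            f"found {ondisk_child_gen}")

# In-memory parent was updated for transaction 160, but the dropped
# write means the block on disk still carries generation 158.
try:
    verify_child(parent_blkptr_gen=160, ondisk_child_gen=158)
except TransidMismatch as e:
    print(e)
```

This mirrors the "wanted 160 found 158" messages in the trace above: the parent in memory is ahead of what actually reached the disk.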
>
> .... and failing later because the on-disk metadata is stale due to
> the earlier write failure and subsequent dirty memory object
> eviction..
>
>>  From the above we see that there is no issue in Btrfs. The code in fact is
>> doing the right thing by aborting the transaction. Unfortunately the test
>> notices the call trace in dmesg and gets marked as a failure.
>
> [...]
>
>> IMHO the problem lies in the test i.e. the test assumes that a btree block
>> that was written to the disk after dm-flakey was loaded (with write operations
>> being dropped) would never be evicted from memory and re-read again.
>
> The test is making a pretty valid assumption - a fundamental
> principle in filesystem design: never free dirty objects cached in
> memory until they are safe on disk and marked clean by IO
> completion.  i.e. freeing dirty in-memory metadata objects that have
> not been written back will *always* result in a corrupt filesystem
> structure on disk that will be tripped over on the next cold read of
> the metadata.
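[Editor's note: the invariant Dave states — never free a dirty cached object before writeback marks it clean — can be sketched as follows. The class names are invented for illustration; no filesystem implements it literally this way.]

```python
# Sketch of the dirty-object invariant: a metadata cache must refuse
# to evict (free) a block until writeback has completed and the block
# has been marked clean. Evicting while dirty means the next cold read
# sees stale on-disk contents.

class CachedBlock:
    def __init__(self, data):
        self.data = data
        self.dirty = True          # newly modified, not yet on disk

    def writeback_done(self):
        self.dirty = False         # IO completion marks it clean

class MetadataCache:
    def __init__(self):
        self.blocks = {}

    def insert(self, addr, block):
        self.blocks[addr] = block

    def evict(self, addr):
        block = self.blocks[addr]
        if block.dirty:
            # Freeing now would leave stale metadata on disk, to be
            # tripped over on the next cold read.
            raise RuntimeError("refusing to evict dirty block")
        del self.blocks[addr]

cache = MetadataCache()
cache.insert(29720576, CachedBlock(b"btree node"))
try:
    cache.evict(29720576)          # still dirty: refused
except RuntimeError as e:
    print(e)
cache.blocks[29720576].writeback_done()
cache.evict(29720576)              # clean now: allowed
```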
>
> The elephant in the room: btrfs is a COW filesystem. How does it end
> up with a stale metadata pointer in the btree when the last write of
> the COW update is to change the superblock to point at the new,
> completely written, consistent-on-disk btree?
>
> And if btrfs is allowing inconsistent btrees to be seen on disk
> (which should never happen with COW), why is that a problem of the
> test (ie. the messenger) rather than the filesystem?
>
> Josef, Chris, is this really how btrfs handles metadata write
> failures? This handling of errors seems like a design flaw rather
> than a desirable behaviour to me....
>

Jeeze you are cranky Dave, let's think about what this test is doing.

write shit to fs
fsync said shit
drop all writes from this point onwards
unmount file system

This test is to verify that "said shit" is on disk after an fsync.  The 
"drop all writes from this point onwards" does this silently.  We assume 
when the device told us that any writes that we did completed 
successfully that they actually completed successfully.

This is where transid's come in, they are there to catch the case where 
the device told us it completed the write when it's a horrible horrible 
lying sack of shit.

So we do our fsync() which writes to our special log, and we do our FUA 
to our super.  Then we drop writes.

Meanwhile the helper threads finish and update the real actual tree we 
care about and this dirties metadata.  Then we unmount the file system, 
and all of this is lost.  But we don't know that, because WE AREN'T 
GETTING AN ERROR.

Maybe this is where you are getting confused.  Fsync() for btrfs isn't 
like fsync() for ext4/xfs.  You don't get all the dirtied things on 
disk, you get only the dirty things you care about, and specifically you 
only get the changed dirty things.  So we go add all the new extent 
entries to a special log tree and write _that_ out and point to that in 
our super block.  Then the actual real metadata is dirtied at some point 
afterwards and that is written out during the normal transaction commit 
time at some point in the future, at which point we drop the log because 
we know our stuff is in the real place.
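[Editor's note: the two-stage persistence Josef describes — log tree written at fsync time, real tree written later at commit — can be modeled with a toy disk that silently drops writes, as dm-flakey's drop_writes does. All names here are invented for the sketch.]

```python
# Toy model of btrfs's fsync path under silently dropped writes:
# fsync writes a small log tree and updates the super; the real fs
# tree is written later at transaction commit time. If the device
# starts dropping writes (reporting success) after the fsync, the
# fsync'd data is still proven safe by the log, but the later commit
# never reaches the disk.

class Disk:
    def __init__(self):
        self.blocks = {}
        self.drop_writes = False

    def write(self, addr, data):
        if self.drop_writes:
            return                  # "completes successfully", silently lost
        self.blocks[addr] = data

disk = Disk()
disk.write("log_tree", {"file": "fsynced data"})  # fsync(): write log tree
disk.write("super", {"log": "log_tree"})          # FUA super points at log
disk.drop_writes = True                           # load dm-flakey
disk.write("fs_tree", {"file": "fsynced data"})   # commit of real tree: lost

print("log intact:", "log_tree" in disk.blocks)
print("fs tree written:", "fs_tree" in disk.blocks)
```

The in-memory state believes the commit succeeded (no error was ever returned), which is exactly why a later cold read of the "committed" tree block trips the transid check.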

In Chandan's case his metadata is being evicted from memory so it has to 
be read back during this point _after_ the drop all writes stuff.  So we 
think the IO had been completed, and so we're understandably pissed when 
we go to read back that data and it doesn't match what it should.  This 
is the flaw in the test itself, not btrfs.

So what exactly would you like us to do?  Keep all records of old trees 
forever and ever?  Yes, if the disk silently drops our writes and lies 
to us eventually we could possibly end up with inconsistent trees on 
disk because we will have free'd and re-allocated the older version of 
the tree block.  We'd run out of space if we operated on the idea that 
the drive could lie to us like that.

The same would happen in a journalling file system.  Eventually you have 
to wrap around in the journal and start writing over old journal 
entries.  If the disk lied and said it completed the writes when it 
never did you could end up in an equally shitty situation.  This isn't 
just a COW problem, this is an everybody problem.  We all spin the giant 
wheel of trade offs and decide that if the drive is going to lie to us 
this badly we hope you have backups.

This is part of the reason for dm-log-writes, to solve this kind of 
shitty situation that dm-flakey introduces.  We no longer have to worry 
about these weird races, we just replay the log and verify that we have 
a consistent file system at any arbitrary point.  So we can either

1) Drop/change the dmesg check for this test.  I'd really rather just 
say "ok we dropped writes, from here on we don't care about errors" 
until we turn writes back on, because honestly anything can happen. Or...

2) Replace this test with a dm-log-writes variant so we don't have to 
worry about the problem at all.
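[Editor's note: for reference, a dm-log-writes setup along the lines of option 2 might look like the sketch below. Device paths and the mark name are illustrative; the table syntax follows the kernel's dm-log-writes documentation, and replay-log is built from xfstests' src/log-writes.]

```shell
# Sketch only: /dev/sdb (test device) and /dev/sdc (log device) are
# placeholders. Stack a log-writes target over the test device so every
# write, flush and FUA is recorded in order on the log device.
SECTORS=$(blockdev --getsz /dev/sdb)
dmsetup create log --table "0 $SECTORS log-writes /dev/sdb /dev/sdc"

mkfs.btrfs /dev/mapper/log
mount /dev/mapper/log /mnt
# ... run the workload, then drop a named mark right after the fsync ...
dmsetup message log 0 mark fsync-done
umount /mnt
dmsetup remove log

# Replay the recorded writes onto the test device up to the mark, then
# verify the filesystem is consistent at exactly that point.
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync-done
btrfs check /dev/sdb
```

Because the log captures every write with its ordering, the filesystem can be checked at any arbitrary point without the "did this block get evicted and re-read" race that dm-flakey introduces.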

There is no glaring architectural problem with btrfs, we're just 
different because we're COW and do our fsync() different than everybody 
else.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner July 22, 2015, 5:27 a.m. UTC | #9
On Tue, Jul 21, 2015 at 08:37:39PM -0400, Josef Bacik wrote:
> On 07/21/2015 08:01 PM, Dave Chinner wrote:
> >On Tue, Jul 21, 2015 at 04:03:17PM +0530, Chandan Rajendra wrote:
> >>On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
> >>>On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
> >>>>On 07/19/2015 07:54 PM, Dave Chinner wrote:
> >>>>>On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
> >>>>>>On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
> >>>>>>>On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> >>>>>>>> [...]
> 
> Jeeze you are cranky Dave, lets think about what this test is doing.

Do you really see anything above that indicates I'm cranky? I'm just
following the logic as I read it, reaching a conclusion, and then
saying "this doesn't sound right - can an expert please confirm I've
got this right?".  Last thing I expected was to be shouted at and
treated like an idiot.

BTW, you'll know when I'm cranky - swearing is a real good
indication that I'm pissed off at something or someone.

> write shit to fs
> fsync said shit
> drop all writes from this point onwards
> unmount file system
> 
> This test is to verify that "said shit" is on disk after an fsync.
> The "drop all writes from this point onwards" does this silently.
> We assume when the device told us that any writes that we did
> completed successfully that they actually completed successfully.

Sorry, I forgot it dm-flakey didn't return errors. My mistake - I
don't look at dm-flakey very often as the tests don't fail on XFS
very often.

BTW, doesn't such an obviously ludicrous mistake (in hindsight)
indicate to you that I simply forgot how dm-flakey behaves? I make
mistakes all the time; don't you forget how things work after not
looking at them for months at a time?

i.e.  asking "have you forgotten that dm-flakey completes writes
without errors?" would have been a much better reply than ranting,
shouting, swearing and treating people like they are idiots.

> In Chandan's case his metadata is being evicted from memory so it
> has to be read back during this point _after_ the drop all writes
> stuff.

At last - a clear explanation of the root cause.  The patch
description makes no mention of read after memory reclaim, and you
only described the problem caused by dropping caches as a "race"
without actually explaining what the race was.

The above two lines is such a simple, obvious description of the
problem that is clear to most fs developers. If the patch had the
above description, then I don't think anyone would have
misunderstood it. However, it was completely obfuscated by
lots of nitty, gritty details about confusing btrfs symptoms and
behaviour and so left us needing to try to understand what btrfs
is doing or not doing.

A better description of the problem would have saved a lot of time
(and the rest of us from an unnecessary shouty rant).

> This is part of the reason for dm-log-writes, to solve this kind of
> shitty situation that dm-flakey introduces.

Which you've also never explained in any of the dm-log-write patches
you've posted in the past.  IOWs, get the dm-log-writes support code
reviewed and fix the dm-flakey tests to use it so this problem goes
away.

Cheers,

Dave.
Josef Bacik July 22, 2015, 11:28 a.m. UTC | #10
On 07/22/2015 01:27 AM, Dave Chinner wrote:
> On Tue, Jul 21, 2015 at 08:37:39PM -0400, Josef Bacik wrote:
>> On 07/21/2015 08:01 PM, Dave Chinner wrote:
>>> On Tue, Jul 21, 2015 at 04:03:17PM +0530, Chandan Rajendra wrote:
>>>> On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
>>>>> On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
>>>>>> On 07/19/2015 07:54 PM, Dave Chinner wrote:
>>>>>>> On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
>>>>>>>> On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
>>>>>>>>> On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
> >>>>>>>>>> [...]
>>
>> Jeeze you are cranky Dave, lets think about what this test is doing.
>
> Do you really see anything above that indicates I'm cranky? I'm just
> following the logic as I read it, reaching a conclusion, and then
> saying "this doesn't sound right - can an expert please confirm I've
> got this right?".  Last thing I expected was to be shouted at and
> treated like an idiot.
>
> BTW, you'll know when I'm cranky - swearing is a real good
> indication that I'm pissed off at something or someone.
>

Sorry, I didn't mean to come across angry or condescending.  Obviously I 
appreciate your expertise and questions, I wasn't pissed or anything, 
this is just how I normally talk, you know I use swearing as 
punctuation.  I'm not going to keep arguing about the rest of this 
because it's pointless now.  I'm sorry it came across as angry and 
condescending, that was the furthest thing from my mind when I wrote it.

>> write shit to fs
>> fsync said shit
>> drop all writes from this point onwards
>> unmount file system
>>
>> This test is to verify that "said shit" is on disk after an fsync.
>> The "drop all writes from this point onwards" does this silently.
>> We assume when the device told us that any writes that we did
>> completed successfully that they actually completed successfully.
>
> Sorry, I forgot it dm-flakey didn't return errors. My mistake - I
> don't look at dm-flakey very often as the tests don't fail on XFS
> very often.
>
> BTW, doesn't such an obviously ludicrous mistake (in hindsight)
> indicate to you that I simply forgot how dm-flakey behaves? I make
> mistakes all the time; don't you forget how things work after not
> looking at them for months at a time?
>
> i.e.  asking "have you forgotten that dm-flakey completes writes
> without errors?" would have been a much better reply than ranting,
> shouting, swearing and treating people like they are idiots.
>
>> In Chandan's case his metadata is being evicted from memory so it
>> has to be read back during this point _after_ the drop all writes
>> stuff.
>
> At last - a clear explanation of the root cause.  The patch
> description makes no mention of read after memory reclaim, and you
> only described the problem caused by dropping caches as a "race"
> without actually explaining what the race was.
>
> The above two lines are such a simple, obvious description of the
> problem that is clear to most fs developers. If the patch had the
> above description, then I don't think anyone would have
> misunderstood it. However, it was completely obfuscated by
> lots of nitty-gritty details about confusing btrfs symptoms and
> behaviour and so left us needing to try to understand what btrfs
> is doing or not doing.
>
> A better description of the problem would have saved a lot of time
> (and the rest of us from an unnecessary shouty rant).
>
>> This is part of the reason for dm-log-writes, to solve this kind of
>> shitty situation that dm-flakey introduces.
>
> Which you've also never explained in any of the dm-log-writes patches
> you've posted in the past.  IOWs, get the dm-log-writes support code
> reviewed and fix the dm-flakey tests to use it so this problem goes
> away.
>

The dm-log-writes target is already upstream, I sent the xfstests stuff 
months ago, not sure if they ever made it in.  I'll check and get them 
posted again if not.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik July 22, 2015, 12:47 p.m. UTC | #11
On 07/22/2015 01:27 AM, Dave Chinner wrote:
> On Tue, Jul 21, 2015 at 08:37:39PM -0400, Josef Bacik wrote:
>> On 07/21/2015 08:01 PM, Dave Chinner wrote:
>>> On Tue, Jul 21, 2015 at 04:03:17PM +0530, Chandan Rajendra wrote:
>>>> On Tuesday 21 Jul 2015 08:12:20 Dave Chinner wrote:
>>>>> On Mon, Jul 20, 2015 at 08:55:32AM -0400, Josef Bacik wrote:
>>>>>> On 07/19/2015 07:54 PM, Dave Chinner wrote:
>>>>>>> On Fri, Jul 17, 2015 at 05:10:50PM +0530, Chandan Rajendra wrote:
>>>>>>>> On Friday 17 Jul 2015 06:16:02 Brian Foster wrote:
>>>>>>>>> On Fri, Jul 17, 2015 at 12:56:43AM -0400, Chandan Rajendra wrote:
>>>>>>>>>> When running generic/311 on Btrfs' subpagesize-blocksize patchset (on
>>>>>>>>>> ppc64
>>>>>>>>>> with 4k sectorsize and 16k node/leaf size) I noticed the following
>>>>>>>>>> call
>>>>>>>>>> trace,
>>>>>>>>>>
>>>>>>>>>> BTRFS (device dm-0): parent transid verify failed on 29720576 wanted
>>>>>>>>>> 160
>>>>>>>>>> found 158 BTRFS (device dm-0): parent transid verify failed on
>>>>>>>>>> 29720576
>>>>>>>>>> wanted 160 found 158 BTRFS: Transaction aborted (error -5)
>>>>>>>>>>
>>>>>>>>>> WARNING: at /root/repos/linux/fs/btrfs/super.c:260
>>>>>>>>>> Modules linked in:
>>>>>>>>>> CPU: 3 PID: 30769 Comm: umount Tainted: G        W    L
>>>>>>>>>> 4.0.0-rc5-11671-g8b82e73e #63 task: c000000079aaddb0 ti:
>>>>>>>>>> c000000079a48000
>>>>>>>>>> task.ti: c000000079a48000 NIP: c000000000499aa0 LR: c000000000499a9c
>>>>>>>>>> CTR:
>>>>>>>>>> c000000000779630
>>>>>>>>>> REGS: c000000079a4b480 TRAP: 0700   Tainted: G        W   L
>>>>>>>>>> (4.0.0-rc5-11671-g8b82e73e) MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>
>>>>>>>>>> CR: 28008828  XER: 20000000 CFAR: c000000000a23914 SOFTE: 1
>>>>>>>>>> GPR00: c000000000499a9c c000000079a4b700 c00000000103bdf8
>>>>>>>>>> 0000000000000025
>>>>>>>>>> GPR04: 0000000000000001 0000000000000502 c00000000107e918
>>>>>>>>>> 0000000000000cda
>>>>>>>>>> GPR08: 0000000000000007 0000000000000007 0000000000000001
>>>>>>>>>> c0000000010f5044
>>>>>>>>>> GPR12: 0000000028008822 c00000000fdc0d80 0000000020000000
>>>>>>>>>> 0000000010152e00
>>>>>>>>>> GPR16: 0000010002979380 0000000010140724 0000000000000000
>>>>>>>>>> 0000000000000000
>>>>>>>>>> GPR20: ffffffffffffffff 0000000000000000 0000000000000000
>>>>>>>>>> 0000000000000000
>>>>>>>>>> GPR24: c0000000151f61a8 0000000000000000 c000000055e5e800
>>>>>>>>>> c000000000aac270
>>>>>>>>>> GPR28: 00000000000004a4 fffffffffffffffb c000000055e5e800
>>>>>>>>>> c0000000679204d0
>>>>>>>>>> NIP [c000000000499aa0] .__btrfs_abort_transaction+0x180/0x190
>>>>>>>>>> LR [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190
>>>>>>>>>> Call Trace:
>>>>>>>>>> [c000000079a4b700] [c000000000499a9c]
>>>>>>>>>> .__btrfs_abort_transaction+0x17c/0x190 (unreliable) [c000000079a4b7a0]
>>>>>>>>>> [c000000000541678] .__btrfs_run_delayed_items+0xe8/0x220
>>>>>>>>>> [c000000079a4b850] [c0000000004d5b3c]
>>>>>>>>>> .btrfs_commit_transaction+0x37c/0xca0 [c000000079a4b960]
>>>>>>>>>> [c00000000049824c] .btrfs_sync_fs+0x6c/0x1a0
>>>>>>>>>> [c000000079a4ba00] [c000000000255270] .sync_filesystem+0xd0/0x100
>>>>>>>>>> [c000000079a4ba80] [c000000000218070]
>>>>>>>>>> .generic_shutdown_super+0x40/0x170
>>>>>>>>>> [c000000079a4bb10] [c000000000218598] .kill_anon_super+0x18/0x30
>>>>>>>>>> [c000000079a4bb90] [c000000000498418] .btrfs_kill_super+0x18/0xc0
>>>>>>>>>> [c000000079a4bc10] [c000000000218ac8]
>>>>>>>>>> .deactivate_locked_super+0x98/0xe0
>>>>>>>>>> [c000000079a4bc90] [c00000000023e744] .cleanup_mnt+0x54/0xa0
>>>>>>>>>> [c000000079a4bd10] [c0000000000b7d14] .task_work_run+0x114/0x150
>>>>>>>>>> [c000000079a4bdb0] [c000000000015f84] .do_notify_resume+0x74/0x80
>>>>>>>>>> [c000000079a4be30] [c000000000009838] .ret_from_except_lite+0x64/0x68
>>>>>>>>>> Instruction dump:
>>>>>>>>>> ebc1fff0 ebe1fff8 4bfffb28 60000000 3ce2ffcd 38e7e818 4bffffbc
>>>>>>>>>> 3c62ffd2
>>>>>>>>>> 7fa4eb78 3863b808 48589e1d 60000000 <0fe00000> 4bfffedc 60000000
>>>>>>>>>> 60000000
>>>>>>>>>> BTRFS: error (device dm-0) in __btrfs_run_delayed_items:1188: errno=-5
>>>>>>>>>> IO
>>>>>>>>>> failure
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The call trace is seen when executing _run_test() for the 8th time.
>>>>>>>>>> The above trace is actually a false-positive failure as indicated
>>>>>>>>>> below,
>>>>>>>>>>
>>>>>>>>>>   fsync-tester
>>>>>>>>>>
>>>>>>>>>>     fsync(fd)
>>>>>>>>>>     Write delayed inode item to fs tree
>>>>>>>>>>
>>>>>>>>>>       (assume transid to be 160)
>>>>>>>>>>       (assume tree block to start at logical address 29720576)
>>>>>>>>>>
>>>>>>>>>>   md5sum $testfile
>>>>>>>>>>
>>>>>>>>>>     This causes a delayed inode to be added
>>>>>>>>>>
>>>>>>>>>>   Load flakey table
>>>>>>>>>>
>>>>>>>>>>     i.e. drop writes that are initiated from now onwards
>>>>>>>>>>
>>>>>>>>>>   Unmount filesystem
>>>>>>>>>>
>>>>>>>>>>     btrfs_sync_fs is invoked
>>>>>>>>>>
>>>>>>>>>>       Write 29720576 metadata block to disk
>>>>>>>>>>       free_extent_buffer(29720576)
>>>>>>>>>>
>>>>>>>>>>         release_extent_buffer(29720576)
>>>>>>>>>>
>>>>>>>>>>     Start writing delayed inode
>>>>>>>>>>
>>>>>>>>>>       Traverse the fs tree
>>>>>>>>>>
>>>>>>>>>>         (assume the parent tree block of 29720576 is still in memory)
>>>>>>>>>>         When reading 29720576 from disk, parent's blkptr will have
>>>>>>>>>>         generation
>>>>>>>>>>         set to 160. But the on-disk tree block will have an older
>>>>>>>>>>         generation (say, 158). Transid verification fails and hence
>>>>>>>>>>         the
>>>>>>>>>>         transaction gets aborted
>>>>>>>>>>
>>>>>>>>>> The test only cares about the FS instance before the unmount
>>>>>>>>>> operation (i.e. the synced FS). Hence to get the test to pass, ignore
>>>>>>>>>> the
>>>>>>>>>> false-positive trace that could be generated.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>>   tests/generic/311 | 2 ++
>>>>>>>>>>   1 file changed, 2 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tests/generic/311 b/tests/generic/311
>>>>>>>>>> index d21b6eb..cd6391d 100755
>>>>>>>>>> --- a/tests/generic/311
>>>>>>>>>> +++ b/tests/generic/311
>>>>>>>>>> @@ -64,6 +64,8 @@ _require_xfs_io_command "falloc"
>>>>>>>>>>
>>>>>>>>>>   [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
>>>>>>>>>>
>>>>>>>>>> +_disable_dmesg_check
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Hmm, I don't think this is something we'd want to do unconditionally.
>>>>>>>>> E.g., if something hits the logs for xfs or ext4, we probably want to
>>>>>>>>> hear about it.
>>>>>>>>
>>>>>>>> Sorry, I forgot that I was dealing with generic tests. I agree with what
>>>>>>>> you say.
>>>>>>>>
>>>>>>>>> Perhaps check that the fs is btrfs and possibly the fs params are such
>>>>>>>>> that the known warning occurs..? I'd defer to the btrfs folks on how
>>>>>>>>> best to check that, so long as it doesn't affect other fs'.
>>>>>>>>
>>>>>>>> The above trace is independent of any Btrfs parameters. It's just that it
>>>>>>>> was easily reproducible on ppc64 with the subpagesize-blocksize
>>>>>>>> patchset applied. So IMHO disabling call trace check when testing Btrfs
>>>>>>>> filesystem is the way to go. But again, as you said, we can wait for
>>>>>>>> comments from other Btrfs devs.
>>>>>>>
>>>>>>> As a general principle, we don't disable warnings/errors in tests
>>>>>>> just to make tests pass.  The bug in the code being tested should be
>>>>>>> fixed so the warning/error is not generated. If you do not want
>>>>>>> to see the error, then expunge the test from your
>>>>>>> subpagesize-blocksize test runs.
>>>>>>
>>>>>> The same problem would happen if you did echo 3>
>>>>>> /proc/sys/vm/drop_caches right after dropping writes before doing
>>>>>> the unmount.
>>>>>
>>>>> Nice, you have multiple ways of reproducing the problem....
>>>>
>>>> Dave, apologies if I am sounding repetitive. The call trace showing up
>>>> when executing the test doesn't indicate a failure in Btrfs' code. It's a
>>>> result of:
>>>>
>>>> 1. Perform fsync(fd).
>>>> 2. Load dm-flakey.
>>>>     i.e. drop writes from now onwards.
>>>> 3. Write a dirty metadata btree block to disk.
>>>>     The data however doesn't reach the disk. The corresponding block on disk
>>>>     still has old data.
>>>
>>> What needs to be written? The fsync should have written everything
>>> that was needed to give a consistent filesystem on disk after a
>>> unmount/mount cycle.
>>>
>>>> 4. Evict the btree block from memory.
>>>
>>> ... then removing the dirty objects from memory (!!!!!)...
>>>
>>>> 5. Read the btree block from disk.
>>>>     Verification of the btree block read from the disk fails since the
>>>>     transaction id expected from the btree block is different from the one
>>>>     stored in the parent btree block. This causes the transaction to be aborted
>>>>     and hence a call trace gets printed.
>>>
>>> .... and failing later because the on-disk metadata is stale due to
>>> the earlier write failure and subsequent dirty memory object
>>> eviction..
>>>
>>>>  From the above we see that there is no issue in Btrfs. The code in fact is
>>>> doing the right thing by aborting the transaction. Unfortunately the test
>>>> notices the call trace in dmesg and gets marked as a failure.
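The sequence Chandan outlines (fsync, silently drop writes, evict clean metadata, re-read) can be sketched as a dry-run script. This only prints the commands, since actually running it needs root and a scratch device; $DEV, $MNT, and $SECTORS are placeholders, and the table lines follow the format xfstests' common/dmflakey constructs:

```shell
#!/bin/sh
# Dry-run sketch of the failure sequence: prints the commands instead of
# executing them. $DEV, $MNT, and $SECTORS are placeholders.
DEV=/dev/sdb1
MNT=/mnt/scratch
SECTORS=2097152   # size of $DEV in 512-byte sectors

# dm-flakey tables: <dev> <offset> <up interval> <down interval> [features]
PASS_TABLE="0 $SECTORS flakey $DEV 0 180 0"                # pass everything
DROP_TABLE="0 $SECTORS flakey $DEV 0 0 180 1 drop_writes"  # drop writes

run() { echo "$@"; }   # change to "$@" to actually execute

run dmsetup create flakey-test --table "$PASS_TABLE"
run mount /dev/mapper/flakey-test "$MNT"
run xfs_io -f -c "pwrite 0 64k" -c fsync "$MNT/testfile"  # write + fsync
# From here on, writes are silently dropped (no EIO is returned):
run dmsetup suspend flakey-test
run dmsetup load flakey-test --table "$DROP_TABLE"
run dmsetup resume flakey-test
run sh -c 'echo 3 > /proc/sys/vm/drop_caches'  # evict clean btree blocks
run umount "$MNT"   # unmount re-reads the evicted blocks and trips the
                    # parent transid check against the stale on-disk copy
```

The drop_caches step stands in for the memory-pressure eviction seen on the ppc64 subpagesize-blocksize setup.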
>>>
>>> [...]
>>>
>>>> IMHO the problem lies in the test i.e. the test assumes that a btree block
>>>> that was written to the disk after dm-flakey was loaded (with write operations
>>>> being dropped) would never be evicted from memory and re-read again.
>>>
>>> The test is making a pretty valid assumption - a fundamental
>>> principle in filesystem design: never free dirty objects cached in
>>> memory until they are safe on disk and marked clean by IO
>>> completion.  i.e. freeing dirty in-memory metadata objects that have
>>> not been written back will *always* result in a corrupt filesystem
>>> structure on disk that will be tripped over on the next cold read of
>>> the metadata.
>>>
>>> The elephant in the room: btrfs is a COW filesystem. How does it end
>>> up with a stale metadata pointer in the btree when the last write of
>>> the COW update is to change the superblock to point at the new,
>>> completely written, consistent-on-disk btree?
>>>
>>> And if btrfs is allowing inconsistent btrees to be seen on disk
>>> (which should never happen with COW), why is that a problem of the
>>> test (ie. the messenger) rather than the filesystem?
>>>
>>> Josef, Chris, is this really how btrfs handles metadata write
>>> failures? This handling of errors seems like a design flaw rather
>>> than a desirable behaviour to me....
>>>
>>
>> Jeeze you are cranky Dave, let's think about what this test is doing.
>
> Do you really see anything above that indicates I'm cranky? I'm just
> following the logic as I read it, reaching a conclusion and then
> saying "this doesn't sound right - can an expert please confirm I've
> got this right?".  Last thing I expected was to be shouted at and
> treated like an idiot.
>
> BTW, you'll know when I'm cranky - swearing is a real good
> indication that I'm pissed off at something or someone.
>
>> write shit to fs
>> fsync said shit
>> drop all writes from this point onwards
>> unmount file system
>>
>> This test is to verify that "said shit" is on disk after an fsync.
>> The "drop all writes from this point onwards" does this silently.
>> We assume when the device told us that any writes that we did
>> completed successfully that they actually completed successfully.
>
> Sorry, I forgot that dm-flakey didn't return errors. My mistake - I
> don't look at dm-flakey very often as the tests don't fail on XFS
> very often.
>
> BTW, doesn't such an obviously ludicrous mistake (in hindsight)
> indicate to you that I simply forgot how dm-flakey behaves? I make
> mistakes all the time; don't you forget how things work after not
> looking at them for months at a time?
>

Ok one more thing I swear because honestly I feel like this right here 
is the root of everybody's frustrations.  You are not a btrfs expert, so 
I expect to have to explain btrfs behavior to you; that's what the bulk 
of our emails were about.  You are Dave Chinner: when you speak, people 
listen because you have been here forever and done everything.  You are 
the centerpiece of most LSF conversations, you are CC'ed on anything 
remotely complicated in fs/ or mm related to fs stuff.  When you start 
throwing around things like

"Josef, Chris, is this really how btrfs handles metadata write
failures? This handling of errors seems like a design flaw rather
than a desirable behaviour to me..."

it carries weight.  I know you are asking an honest question, but the 
headlines of Phoronix will read "Btrfs: broken by design says long time 
fs guru Dave Chinner", and I'll spend the next year answering stupid 
assertions in email/reddit/person about how btrfs doesn't handle write 
errors properly.  Remember that email from Edward Shishkin like 7 years 
ago about how btrfs's btree was broken by design?  I _still_ have to 
explain to people that he just found a bug and it really wasn't a design 
problem, even though the bug was fixed in the same email thread.

Your opinion matters greatly to me and the community at large; when you 
start questioning btrfs's design, I and everybody else take that very 
seriously.  The tone I took was a mistake: it came across as ranty and 
angry, when I wanted it to be fun, light, and conversational.  I'm extremely 
sorry for that and will ship you booze in payment.

So yes it is completely legitimate that you didn't remember how 
dm-flakey works, but we mere mortals sometimes forget that you are 
mortal too ;).  Thanks,

Josef
Dave Chinner July 22, 2015, 9:10 p.m. UTC | #12
On Wed, Jul 22, 2015 at 07:28:55AM -0400, Josef Bacik wrote:
> On 07/22/2015 01:27 AM, Dave Chinner wrote:
> >>>Josef, Chris, is this really how btrfs handles metadata write
> >>>failures? This handling of errors seems like a design flaw rather
> >>>than a desirable behaviour to me....
> >>>
> >>
> >>Jeeze you are cranky Dave, let's think about what this test is doing.
> >
> >Do you really see anything above that indicates I'm cranky? I'm just
> >following the logic as I read it, reaching a conclusion and then
> >saying "this doesn't sound right - can an expert please confirm I've
> >got this right?".  Last thing I expected was to be shouted at and
> >treated like an idiot.
> >
> >BTW, you'll know when I'm cranky - swearing is a real good
> >indication that I'm pissed off at something or someone.
> 
> Sorry, I didn't mean to come across angry or condescending.
> Obviously I appreciate your expertise and questions, I wasn't pissed
> or anything, this is just how I normally talk, you know I use
> swearing as punctuation.

Yes, I know that, and that doesn't worry me in the least.

You don't normally shout, though.

> >>This is part of the reason for dm-log-writes, to solve this kind of
> >>shitty situation that dm-flakey introduces.
> >
> >Which you've also never explained in any of the dm-log-writes patches
> >you've posted in the past.  IOWs, get the dm-log-writes support code
> >reviewed and fix the dm-flakey tests to use it so this problem goes
> >away.
> 
> The dm-log-writes target is already upstream, I sent the xfstests
> stuff months ago, not sure if they ever made it in.  I'll check and
> get them posted again if not.  Thanks,

They didn't, because it wasn't clear when the kernel code was going
to be committed. I did notice yesterday that it was upstream.

Cheers,

Dave.
Dave Chinner July 22, 2015, 11:02 p.m. UTC | #13
On Wed, Jul 22, 2015 at 08:47:54AM -0400, Josef Bacik wrote:
> On 07/22/2015 01:27 AM, Dave Chinner wrote:
> >On Tue, Jul 21, 2015 at 08:37:39PM -0400, Josef Bacik wrote:
> >>This test is to verify that "said shit" is on disk after an fsync.
> >>The "drop all writes from this point onwards" does this silently.
> >>We assume when the device told us that any writes that we did
> >>completed successfully that they actually completed successfully.
> >
> >Sorry, I forgot that dm-flakey didn't return errors. My mistake - I
> >don't look at dm-flakey very often as the tests don't fail on XFS
> >very often.
> >
> >BTW, doesn't such an obviously ludicrous mistake (in hindsight)
> >indicate to you that I simply forgot how dm-flakey behaves? I make
> >mistakes all the time; don't you forget how things work after not
> >looking at them for months at a time?
> 
> Ok one more thing I swear because honestly I feel like this right
> here is the root of everybody's frustrations.  You are not a btrfs
> expert, so I expect to explain btrfs behavior to you, that's what
> the bulk of our emails were about.  You are Dave Chinner, when you
> speak people listen because you have been here forever and done
> everything.  You are the centerpiece for most LSF conversations,
> you are CC'ed on anything remotely complicated in fs/ or mm related
> to fs stuff.  When you start throwing around things like
> 
> "Josef, Chris, is this really how btrfs handles metadata write
> failures? This handling of errors seems like a design flaw rather
> than a desireable behaviour to me..."
> 
> it carries weight.

Yes, that's a big problem. First of all, I'd suggest that everyone
read this:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1118019/

It was written 15 years ago - and I've been thinking about it ever
since.  I came across it a couple of months ago here:

http://pipeline.corante.com/archives/2015/05/26/the_curse_of_expertise.php

What you've just said encapsulates the problem of being considered
an expert - people switch off their brain when the expert says
something, regardless of whether they are right or wrong, simply
because they are "the expert".

I don't want the "Expert" status that I'm given, or the weight being
a Maintainer conveys. I didn't choose to be either, they are just
baggage I have accumulated along the way. And I still make just as
many mistakes as I did years ago - probably more because I'm getting
old and senile - and I rely on peers to catch the mistakes I
make. If you guys are really switching your brains off rather
than thinking properly about what an "expert" says, then we've got a
wider perception problem we need to solve.

From that perspective, I can fully understand the premise in the
above article that "if what you say is being given "expert"
weighting, then it's time to move on".

But, as Derek commented on the article:

"We need more people for whom that could possibly become a problem,
and more people who would notice that it had become one."

As such, I'm not yet convinced that "move on" is the only solution
to the problem - making people aware of their unconscious biases must 
help to some extent, as does making audiences acutely aware that the 
"experts" are just as fallible as any other developer in the 
project.....

> I know you are asking an honest question, but
> the headlines of Phoronix will read "Btrfs: broken by design says
> long time fs guru Dave Chinner",

s/Btrfs: broken by design/Phoronix Considered Harmful/

But, somehow, I don't think they'd publish such an article.... ;)

> and I'll spend the next year
> answering stupid assertions in email/reddit/person about how btrfs
> doesn't handle write errors properly.  Remember that email from
> Edward Shishkin like 7 years ago about how btrfs's btree was broken
> by design?  I _still_ have to explain to people that he just found a
> bug and it really wasn't a design problem, even though the bug was
> fixed in the same email thread.

That's pretty normal for any long term project. It doesn't even have
to be an Expert Opinion - google will elevate random blogs 

> Your opinion matters greatly to me and the community at large, when
> you start questioning btrfs's design I and everybody else takes that
> very seriously.

Which is exactly why I wanted you to clarify the situation - it did
not seem right to me....

> The tone I took was a mistake, it came across as
> ranty and angry, when I wanted more fun, light, and conversational.
> I'm extremely sorry for that and will ship you booze in payment.

I don't need more encouragement to drink. :/

> So yes it is completely legitimate that you didn't remember how
> dm-flakey works, but we mere mortals sometimes forget that you are
> mortal too ;).  Thanks,

Heh. The resident btrfs expert considers himself a "mere mortal". ;)

Cheers,

Dave.
Anand Jain Feb. 20, 2017, 3:03 p.m. UTC | #14
Hi Chandan,

On 07/17/15 12:56, Chandan Rajendra wrote:
> When running generic/311 on Btrfs' subpagesize-blocksize patchset (on ppc64
> with 4k sectorsize and 16k node/leaf size) I noticed the following call trace,
>
> BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> BTRFS: Transaction aborted (error -5)
>
> WARNING: at /root/repos/linux/fs/btrfs/super.c:260
> Modules linked in:
> CPU: 3 PID: 30769 Comm: umount Tainted: G        W    L 4.0.0-rc5-11671-g8b82e73e #63
> task: c000000079aaddb0 ti: c000000079a48000 task.ti: c000000079a48000
> NIP: c000000000499aa0 LR: c000000000499a9c CTR: c000000000779630
> REGS: c000000079a4b480 TRAP: 0700   Tainted: G        W   L   (4.0.0-rc5-11671-g8b82e73e)
> MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 28008828  XER: 20000000
> CFAR: c000000000a23914 SOFTE: 1
> GPR00: c000000000499a9c c000000079a4b700 c00000000103bdf8 0000000000000025
> GPR04: 0000000000000001 0000000000000502 c00000000107e918 0000000000000cda
> GPR08: 0000000000000007 0000000000000007 0000000000000001 c0000000010f5044
> GPR12: 0000000028008822 c00000000fdc0d80 0000000020000000 0000000010152e00
> GPR16: 0000010002979380 0000000010140724 0000000000000000 0000000000000000
> GPR20: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
> GPR24: c0000000151f61a8 0000000000000000 c000000055e5e800 c000000000aac270
> GPR28: 00000000000004a4 fffffffffffffffb c000000055e5e800 c0000000679204d0
> NIP [c000000000499aa0] .__btrfs_abort_transaction+0x180/0x190
> LR [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190
> Call Trace:
> [c000000079a4b700] [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190 (unreliable)
> [c000000079a4b7a0] [c000000000541678] .__btrfs_run_delayed_items+0xe8/0x220
> [c000000079a4b850] [c0000000004d5b3c] .btrfs_commit_transaction+0x37c/0xca0
> [c000000079a4b960] [c00000000049824c] .btrfs_sync_fs+0x6c/0x1a0
> [c000000079a4ba00] [c000000000255270] .sync_filesystem+0xd0/0x100
> [c000000079a4ba80] [c000000000218070] .generic_shutdown_super+0x40/0x170
> [c000000079a4bb10] [c000000000218598] .kill_anon_super+0x18/0x30
> [c000000079a4bb90] [c000000000498418] .btrfs_kill_super+0x18/0xc0
> [c000000079a4bc10] [c000000000218ac8] .deactivate_locked_super+0x98/0xe0
> [c000000079a4bc90] [c00000000023e744] .cleanup_mnt+0x54/0xa0
> [c000000079a4bd10] [c0000000000b7d14] .task_work_run+0x114/0x150
> [c000000079a4bdb0] [c000000000015f84] .do_notify_resume+0x74/0x80
> [c000000079a4be30] [c000000000009838] .ret_from_except_lite+0x64/0x68
> Instruction dump:
> ebc1fff0 ebe1fff8 4bfffb28 60000000 3ce2ffcd 38e7e818 4bffffbc 3c62ffd2
> 7fa4eb78 3863b808 48589e1d 60000000 <0fe00000> 4bfffedc 60000000 60000000
> BTRFS: error (device dm-0) in __btrfs_run_delayed_items:1188: errno=-5 IO failure
>
>
> The call trace is seen when executing _run_test() for the 8th time.
> The above trace is actually a false-positive failure as indicated below,
>  fsync-tester
>    fsync(fd)
>    Write delayed inode item to fs tree
>      (assume transid to be 160)
>      (assume tree block to start at logical address 29720576)
>  md5sum $testfile
>    This causes a delayed inode to be added
>  Load flakey table
>    i.e. drop writes that are initiated from now onwards
>  Unmount filesystem
>    btrfs_sync_fs is invoked
>      Write 29720576 metadata block to disk
>      free_extent_buffer(29720576)
>        release_extent_buffer(29720576)
>    Start writing delayed inode
>      Traverse the fs tree
>        (assume the parent tree block of 29720576 is still in memory)
>        When reading 29720576 from disk, parent's blkptr will have generation
>        set to 160. But the on-disk tree block will have an older
>        generation (say, 158). Transid verification fails and hence the
>        transaction gets aborted
>
> The test only cares about the FS instance before the unmount
> operation (i.e. the synced FS). Hence to get the test to pass, ignore the
> false-positive trace that could be generated.

  Looks like this patch didn't make it. Is there any kernel patch
  that fixed this bug? Or any hints on how to reproduce it?

Thanks, Anand
Chandan Rajendra Feb. 22, 2017, 9:32 a.m. UTC | #15
On Monday, February 20, 2017 11:03:11 PM Anand Jain wrote:
> 
> Hi Chandan,
> 
> On 07/17/15 12:56, Chandan Rajendra wrote:
> > When running generic/311 on Btrfs' subpagesize-blocksize patchset (on ppc64
> > with 4k sectorsize and 16k node/leaf size) I noticed the following call trace,
> >
> > BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> > BTRFS (device dm-0): parent transid verify failed on 29720576 wanted 160 found 158
> > BTRFS: Transaction aborted (error -5)
> >
> > WARNING: at /root/repos/linux/fs/btrfs/super.c:260
> > Modules linked in:
> > CPU: 3 PID: 30769 Comm: umount Tainted: G        W    L 4.0.0-rc5-11671-g8b82e73e #63
> > task: c000000079aaddb0 ti: c000000079a48000 task.ti: c000000079a48000
> > NIP: c000000000499aa0 LR: c000000000499a9c CTR: c000000000779630
> > REGS: c000000079a4b480 TRAP: 0700   Tainted: G        W   L   (4.0.0-rc5-11671-g8b82e73e)
> > MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 28008828  XER: 20000000
> > CFAR: c000000000a23914 SOFTE: 1
> > GPR00: c000000000499a9c c000000079a4b700 c00000000103bdf8 0000000000000025
> > GPR04: 0000000000000001 0000000000000502 c00000000107e918 0000000000000cda
> > GPR08: 0000000000000007 0000000000000007 0000000000000001 c0000000010f5044
> > GPR12: 0000000028008822 c00000000fdc0d80 0000000020000000 0000000010152e00
> > GPR16: 0000010002979380 0000000010140724 0000000000000000 0000000000000000
> > GPR20: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
> > GPR24: c0000000151f61a8 0000000000000000 c000000055e5e800 c000000000aac270
> > GPR28: 00000000000004a4 fffffffffffffffb c000000055e5e800 c0000000679204d0
> > NIP [c000000000499aa0] .__btrfs_abort_transaction+0x180/0x190
> > LR [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190
> > Call Trace:
> > [c000000079a4b700] [c000000000499a9c] .__btrfs_abort_transaction+0x17c/0x190 (unreliable)
> > [c000000079a4b7a0] [c000000000541678] .__btrfs_run_delayed_items+0xe8/0x220
> > [c000000079a4b850] [c0000000004d5b3c] .btrfs_commit_transaction+0x37c/0xca0
> > [c000000079a4b960] [c00000000049824c] .btrfs_sync_fs+0x6c/0x1a0
> > [c000000079a4ba00] [c000000000255270] .sync_filesystem+0xd0/0x100
> > [c000000079a4ba80] [c000000000218070] .generic_shutdown_super+0x40/0x170
> > [c000000079a4bb10] [c000000000218598] .kill_anon_super+0x18/0x30
> > [c000000079a4bb90] [c000000000498418] .btrfs_kill_super+0x18/0xc0
> > [c000000079a4bc10] [c000000000218ac8] .deactivate_locked_super+0x98/0xe0
> > [c000000079a4bc90] [c00000000023e744] .cleanup_mnt+0x54/0xa0
> > [c000000079a4bd10] [c0000000000b7d14] .task_work_run+0x114/0x150
> > [c000000079a4bdb0] [c000000000015f84] .do_notify_resume+0x74/0x80
> > [c000000079a4be30] [c000000000009838] .ret_from_except_lite+0x64/0x68
> > Instruction dump:
> > ebc1fff0 ebe1fff8 4bfffb28 60000000 3ce2ffcd 38e7e818 4bffffbc 3c62ffd2
> > 7fa4eb78 3863b808 48589e1d 60000000 <0fe00000> 4bfffedc 60000000 60000000
> > BTRFS: error (device dm-0) in __btrfs_run_delayed_items:1188: errno=-5 IO failure
> >
> >
> > The call trace is seen when executing _run_test() for the 8th time.
> > The above trace is actually a false-positive failure as indicated below,
> >  fsync-tester
> >    fsync(fd)
> >    Write delayed inode item to fs tree
> >      (assume transid to be 160)
> >      (assume tree block to start at logical address 29720576)
> >  md5sum $testfile
> >    This causes a delayed inode to be added
> >  Load flakey table
> >    i.e. drop writes that are initiated from now onwards
> >  Unmount filesystem
> >    btrfs_sync_fs is invoked
> >      Write 29720576 metadata block to disk
> >      free_extent_buffer(29720576)
> >        release_extent_buffer(29720576)
> >    Start writing delayed inode
> >      Traverse the fs tree
> >        (assume the parent tree block of 29720576 is still in memory)
> >        When reading 29720576 from disk, parent's blkptr will have generation
> >        set to 160. But the on-disk tree block will have an older
> >        generation (say, 158). Transid verification fails and hence the
> >        transaction gets aborted
> >
> > The test only cares about the FS instance before the unmount
> > operation (i.e. the synced FS). Hence to get the test to pass, ignore the
> > false-positive trace that could be generated.
> 
>   Looks like this patch didn't make it. Is there any kernel patch
>   that fixed this bug? Or any hints on how to reproduce it?
> 

Hi Anand,

This bug is easily recreated when executing the test on Btrfs with the
subpage-blocksize patchset applied. I haven't been able to test the recently
rebased subpage-blocksize patchset yet.

Coming back to the issue ... The problem exists because the test code uses
dm-flakey. Josef had suggested that using dm-log-writes instead of dm-flakey
should fix the problem. I will work on this and post a patch soon.
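For reference, a dm-log-writes based flow might look roughly like the sketch below. It is a dry run (it prints the commands; real use needs root, a device under test, and a separate log device), the device paths are placeholders, and it assumes the replay-log tool that ships in xfstests' src/log-writes:

```shell
#!/bin/sh
# Dry-run sketch: record writes with dm-log-writes, then replay only up
# to a named mark instead of dropping writes in flight.
DEV=/dev/sdb1        # device under test (placeholder)
LOGDEV=/dev/sdc1     # device that records the write log (placeholder)
SECTORS=2097152      # size of $DEV in 512-byte sectors

# dm-log-writes table: <start> <len> log-writes <dev_path> <log_dev_path>
LOG_TABLE="0 $SECTORS log-writes $DEV $LOGDEV"

run() { echo "$@"; }   # change to "$@" to actually execute

run dmsetup create log-test --table "$LOG_TABLE"
run mount /dev/mapper/log-test /mnt/scratch
run xfs_io -f -c "pwrite 0 64k" -c fsync /mnt/scratch/testfile
run dmsetup message log-test 0 mark fsync-done   # named mark in the log
run umount /mnt/scratch
run dmsetup remove log-test
# Replay exactly the writes that completed up to the mark, then verify
# that the fsynced data survived:
run replay-log --log "$LOGDEV" --replay "$DEV" --end-mark fsync-done
run mount "$DEV" /mnt/scratch
run md5sum /mnt/scratch/testfile
```

Because the replayed device only ever contains writes the filesystem actually completed, the stale-transid re-read that trips generic/311 under dm-flakey cannot occur.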
Anand Jain Feb. 23, 2017, 6:40 a.m. UTC | #16
Hi Chandan,

> This bug is easily recreated when executing the test on Btrfs with
> subpage-blocksize patchset applied. I haven't been able to test the recently
> rebased subpage-blocksize patchset yet.
>
> Coming back to the issue ... The problem exists because the test code uses
> dm-flakey. Josef had suggested that using dm-log-writes instead of dm-flakey
> should fix the problem. I will work on this and post a patch soon.

  generic/311 uses the drop_writes feature, so it appears EIO shouldn't be
  reported:
	_load_flakey_table $FLAKEY_DROP_WRITES $lockfs

  Looks like there is a regression in the dm-flakey which was fixed by
  ---
  dm flakey: fix reads to be issued if drop_writes configured

  v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
  down_interval") overlooked the 'drop_writes' feature, which is meant to
  allow reads to be issued rather than errored, during the down_interval.

  Fixes: 99f3c90d0d ("dm flakey: error READ bios during the
  down_interval")
  ---

  That also explains why I couldn't reproduce this at mainline.
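The two down-interval behaviors at issue can be seen side by side in the flakey tables themselves; a sketch with placeholder device path and size:

```shell
# dm-flakey tables: <dev> <offset> <up interval> <down interval> [features]
DEV=/dev/sdb1     # placeholder device
SECTORS=2097152   # placeholder size in 512-byte sectors

# No feature args: during the down interval ALL I/O, reads included,
# fails with EIO (the behavior commit 99f3c90d0d extended to reads).
ERROR_ALL="0 $SECTORS flakey $DEV 0 0 180"

# With drop_writes: writes are silently dropped but reads still succeed,
# which is what generic/311 relies on (restored by the quoted fix).
DROP_WRITES="0 $SECTORS flakey $DEV 0 0 180 1 drop_writes"

echo "$ERROR_ALL"
echo "$DROP_WRITES"
```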


Thanks, Anand
diff mbox

Patch

diff --git a/tests/generic/311 b/tests/generic/311
index d21b6eb..cd6391d 100755
--- a/tests/generic/311
+++ b/tests/generic/311
@@ -64,6 +64,8 @@  _require_xfs_io_command "falloc"
 
 [ -x $here/src/fsync-tester ] || _notrun "fsync-tester not built"
 
+_disable_dmesg_check
+
 rm -f $seqres.full
 SEED=1
 testfile=$SCRATCH_MNT/$seq.fsync