mbox series

[0/2] Fixup and optimization for write time tree checker

Message ID 20190404034708.3399-1-wqu@suse.com (mailing list archive)
Headers show
Series Fixup and optimization for write time tree checker | expand

Message

Qu Wenruo April 4, 2019, 3:47 a.m. UTC
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/tree_checker_testing

Which is based on misc-next, with the following commit as HEAD:
commit 56d46f96de92ec69963acb7b1d9aed83d2a56a7b (david/misc-next-with-write-checks, david/misc-next)
Author: Nikolay Borisov <nborisov@suse.com>
Date:   Wed Mar 27 14:24:18 2019 +0200

    btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit

These two patches changes the behavior of write time tree-checker, where
the initial patches didn't check the content of the leaf, leaving memory
bit flipping possible to sneak in.

This patchset also introduces a new optimization, where original empty
leaf owner check can be pretty expensive and cause false alerts for
write time tree checker.
With this new optimization, write time tree checker can reuse the
existing btrfs_check_leaf_full().

Qu Wenruo (2):
  btrfs: tree-checker: Remove comprehensive root owner check
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c      | 13 +++++++++++++
 fs/btrfs/tree-checker.c | 24 ------------------------
 2 files changed, 13 insertions(+), 24 deletions(-)

Comments

David Sterba April 5, 2019, 3:49 p.m. UTC | #1
On Thu, Apr 04, 2019 at 11:47:06AM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/tree_checker_testing
> 
> Which is based on misc-next, with the following commit as HEAD:
> commit 56d46f96de92ec69963acb7b1d9aed83d2a56a7b (david/misc-next-with-write-checks, david/misc-next)
> Author: Nikolay Borisov <nborisov@suse.com>
> Date:   Wed Mar 27 14:24:18 2019 +0200
> 
>     btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
> 
> These two patches changes the behavior of write time tree-checker, where
> the initial patches didn't check the content of the leaf, leaving memory
> bit flipping possible to sneak in.
> 
> This patchset also introduces a new optimization, where original empty
> leaf owner check can be pretty expensive and cause false alerts for
> write time tree checker.
> With this new optimization, write time tree checker can reuse the
> existing btrfs_check_leaf_full().

This on top of misc-next throws 6 'write time' errors:

btrfs/101               [14:00:24][ 3963.736352] run fstests btrfs/101 at 2019-04-05 14:00:24
[ 3970.417649] BTRFS error (device vdc): block=2446344192 write time tree block corruption detected

btrfs/151               [14:06:52][ 4351.133395] run fstests btrfs/151 at 2019-04-05 14:06:52
[ 4351.659771] BTRFS error (device vdb): block=570572800 write time tree block corruption detected

btrfs/164               [14:07:19][ 4378.349990] run fstests btrfs/164 at 2019-04-05 14:07:19
[ 4380.645918] BTRFS error (device vdb): block=2479882240 write time tree block corruption detected

btrfs/176               [14:07:39][ 4398.493736] run fstests btrfs/176 at 2019-04-05 14:07:39
[ 4399.186203] BTRFS error (device vdb): block=22036480 write time tree block corruption detected
[ 4399.302700] BTRFS error (device vdb): block=1372585984 write time tree block corruption detected

btrfs/184               [14:10:13][ 4552.496289] run fstests btrfs/184 at 2019-04-05 14:10:13
[ 4553.292780] BTRFS error (device vdb): block=4602216448 write time tree block corruption detected

Have you seen the errors during your testing?
Qu Wenruo April 6, 2019, 12:22 a.m. UTC | #2
On 2019/4/5 下午11:49, David Sterba wrote:
> On Thu, Apr 04, 2019 at 11:47:06AM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/tree_checker_testing
>>
>> Which is based on misc-next, with the following commit as HEAD:
>> commit 56d46f96de92ec69963acb7b1d9aed83d2a56a7b (david/misc-next-with-write-checks, david/misc-next)
>> Author: Nikolay Borisov <nborisov@suse.com>
>> Date:   Wed Mar 27 14:24:18 2019 +0200
>>
>>     btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
>>
>> These two patches changes the behavior of write time tree-checker, where
>> the initial patches didn't check the content of the leaf, leaving memory
>> bit flipping possible to sneak in.
>>
>> This patchset also introduces a new optimization, where original empty
>> leaf owner check can be pretty expensive and cause false alerts for
>> write time tree checker.
>> With this new optimization, write time tree checker can reuse the
>> existing btrfs_check_leaf_full().
> 
> This on top of misc-next throws 6 'write time' errors:
> 
> btrfs/101               [14:00:24][ 3963.736352] run fstests btrfs/101 at 2019-04-05 14:00:24
> [ 3970.417649] BTRFS error (device vdc): block=2446344192 write time tree block corruption detected

In my case, these tests just pass, so I haven't checked the dmesg.

But indeed I can reproduce the same dmesg line.

Before removing the device, btrfs will use btrfs_shrink_device(new_size
= 0) to remove all data from that device.

And that will cause on-disk total_bytes to be 0, triggering the bug.

So indeed the behavior is different between read and write time.

I'll check how to do it correctly.

Thanks,
Qu

> 
> btrfs/151               [14:06:52][ 4351.133395] run fstests btrfs/151 at 2019-04-05 14:06:52
> [ 4351.659771] BTRFS error (device vdb): block=570572800 write time tree block corruption detected
> 
> btrfs/164               [14:07:19][ 4378.349990] run fstests btrfs/164 at 2019-04-05 14:07:19
> [ 4380.645918] BTRFS error (device vdb): block=2479882240 write time tree block corruption detected
> 
> btrfs/176               [14:07:39][ 4398.493736] run fstests btrfs/176 at 2019-04-05 14:07:39
> [ 4399.186203] BTRFS error (device vdb): block=22036480 write time tree block corruption detected
> [ 4399.302700] BTRFS error (device vdb): block=1372585984 write time tree block corruption detected
> 
> btrfs/184               [14:10:13][ 4552.496289] run fstests btrfs/184 at 2019-04-05 14:10:13
> [ 4553.292780] BTRFS error (device vdb): block=4602216448 write time tree block corruption detected
> 
> Have you seen the errors during your testing?
>
Qu Wenruo April 6, 2019, 1:57 a.m. UTC | #3
On 2019/4/6 上午8:22, Qu Wenruo wrote:
> 
> 
> On 2019/4/5 下午11:49, David Sterba wrote:
>> On Thu, Apr 04, 2019 at 11:47:06AM +0800, Qu Wenruo wrote:
>>> This patchset can be fetched from github:
>>> https://github.com/adam900710/linux/tree/tree_checker_testing
>>>
>>> Which is based on misc-next, with the following commit as HEAD:
>>> commit 56d46f96de92ec69963acb7b1d9aed83d2a56a7b (david/misc-next-with-write-checks, david/misc-next)
>>> Author: Nikolay Borisov <nborisov@suse.com>
>>> Date:   Wed Mar 27 14:24:18 2019 +0200
>>>
>>>     btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
>>>
>>> These two patches changes the behavior of write time tree-checker, where
>>> the initial patches didn't check the content of the leaf, leaving memory
>>> bit flipping possible to sneak in.
>>>
>>> This patchset also introduces a new optimization, where original empty
>>> leaf owner check can be pretty expensive and cause false alerts for
>>> write time tree checker.
>>> With this new optimization, write time tree checker can reuse the
>>> existing btrfs_check_leaf_full().
>>
>> This on top of misc-next throws 6 'write time' errors:
>>
>> btrfs/101               [14:00:24][ 3963.736352] run fstests btrfs/101 at 2019-04-05 14:00:24
>> [ 3970.417649] BTRFS error (device vdc): block=2446344192 write time tree block corruption detected
> 
> In my case, these tests just pass, so I haven't checked the dmesg.

Oh, those tests passes because of a bug in mainline.

6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over
multi-page bvec") prevents us from breaking bio_for_each_segment_all()
loop, thus not triggering transaction abort.

A nice finding digging into the report.

Thanks,
Qu

> 
> But indeed I can reproduce the same dmesg line.
> 
> Before removing the device, btrfs will use btrfs_shrink_device(new_size
> = 0) to remove all data from that device.
> 
> And that will cause on-disk total_bytes to be 0, triggering the bug.
> 
> So indeed the behavior is different between read and write time.
> 
> I'll check how to do it correctly.
> 
> Thanks,
> Qu
> 
>>
>> btrfs/151               [14:06:52][ 4351.133395] run fstests btrfs/151 at 2019-04-05 14:06:52
>> [ 4351.659771] BTRFS error (device vdb): block=570572800 write time tree block corruption detected
>>
>> btrfs/164               [14:07:19][ 4378.349990] run fstests btrfs/164 at 2019-04-05 14:07:19
>> [ 4380.645918] BTRFS error (device vdb): block=2479882240 write time tree block corruption detected
>>
>> btrfs/176               [14:07:39][ 4398.493736] run fstests btrfs/176 at 2019-04-05 14:07:39
>> [ 4399.186203] BTRFS error (device vdb): block=22036480 write time tree block corruption detected
>> [ 4399.302700] BTRFS error (device vdb): block=1372585984 write time tree block corruption detected
>>
>> btrfs/184               [14:10:13][ 4552.496289] run fstests btrfs/184 at 2019-04-05 14:10:13
>> [ 4553.292780] BTRFS error (device vdb): block=4602216448 write time tree block corruption detected
>>
>> Have you seen the errors during your testing?
>>
>
David Sterba April 8, 2019, 10:18 p.m. UTC | #4
On Sat, Apr 06, 2019 at 09:57:35AM +0800, Qu Wenruo wrote:
> >>>
> >>> This patchset also introduces a new optimization, where original empty
> >>> leaf owner check can be pretty expensive and cause false alerts for
> >>> write time tree checker.
> >>> With this new optimization, write time tree checker can reuse the
> >>> existing btrfs_check_leaf_full().
> >>
> >> This on top of misc-next throws 6 'write time' errors:
> >>
> >> btrfs/101               [14:00:24][ 3963.736352] run fstests btrfs/101 at 2019-04-05 14:00:24
> >> [ 3970.417649] BTRFS error (device vdc): block=2446344192 write time tree block corruption detected
> > 
> > In my case, these tests just pass, so I haven't checked the dmesg.

I have a filter for log (it's mix of terminal and kernel log)

egrep -a '(run fstests|WARNING:|BUG:|RIP:|UBSAN:|^\s+[+-]|CPU:.*Comm:|write time|read time|blocked for more than|Linux version)'

so the unexpected output is not lost among other messages.

> Oh, those tests passes because of a bug in mainline.
> 
> 6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over
> multi-page bvec") prevents us from breaking bio_for_each_segment_all()
> loop, thus not triggering transaction abort.
> 
> A nice finding digging into the report.

Indeed, thanks. The hidden for-in-for in the macro would bite us later.