diff mbox series

[2/8] btrfs: do not check header generation in btrfs_clean_tree_block

Message ID ef73c4c67028f9e7d770dca236367f1ea0b56b55.1670451918.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series extent buffer dirty cleanups | expand

Commit Message

Josef Bacik Dec. 7, 2022, 10:28 p.m. UTC
This check is from an era where we didn't have a per-extent buffer dirty
flag, we just messed with the page bits.  All the places we call this
function are when we have a transaction open, so just skip this check
and rely on the dirty flag.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

Comments

Qu Wenruo Dec. 16, 2022, 5:32 a.m. UTC | #1
On 2022/12/8 06:28, Josef Bacik wrote:
> This check is from an era where we didn't have a per-extent buffer dirty
> flag, we just messed with the page bits.  All the places we call this
> function are when we have a transaction open, so just skip this check
> and rely on the dirty flag.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

This patch is failing a lot of test cases, mostly scrub related 
(btrfs/072, btrfs/074).

Now we will report all kinds of errors during scrub.
Which seems weird, as scrub doesn't use the regular extent buffer 
helpers at all...

Maybe it's related to the generation we got during the extent tree search?
As all the failures points to generation mismatch during scrub.

Thanks,
Qu

> ---
>   fs/btrfs/disk-io.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d0ed52cab304..267163e546a5 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -968,16 +968,13 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>   void btrfs_clean_tree_block(struct extent_buffer *buf)
>   {
>   	struct btrfs_fs_info *fs_info = buf->fs_info;
> -	if (btrfs_header_generation(buf) ==
> -	    fs_info->running_transaction->transid) {
> -		btrfs_assert_tree_write_locked(buf);
> +	btrfs_assert_tree_write_locked(buf);
>   
> -		if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
> -			percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> -						 -buf->len,
> -						 fs_info->dirty_metadata_batch);
> -			clear_extent_buffer_dirty(buf);
> -		}
> +	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
> +		percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
> +					 -buf->len,
> +					 fs_info->dirty_metadata_batch);
> +		clear_extent_buffer_dirty(buf);
>   	}
>   }
>
Qu Wenruo Dec. 17, 2022, 2:10 a.m. UTC | #2
On 2022/12/16 13:32, Qu Wenruo wrote:
> 
> 
> On 2022/12/8 06:28, Josef Bacik wrote:
>> This check is from an era where we didn't have a per-extent buffer dirty
>> flag, we just messed with the page bits.  All the places we call this
>> function are when we have a transaction open, so just skip this check
>> and rely on the dirty flag.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> This patch is failing a lot of test cases, mostly scrub related 
> (btrfs/072, btrfs/074).
> 
> Now we will report all kinds of errors during scrub.
> Which seems weird, as scrub doesn't use the regular extent buffer 
> helpers at all...
> 
> Maybe it's related to the generation we got during the extent tree search?
> As all the failures points to generation mismatch during scrub.

I got some extra digging, and it turns out that, if we unconditionally 
clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks 
writeback for commit root.

Here is some trace for one extent buffer:

     btrfs_init_new_buffer: eb 7110656 set generation 13
     btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13
     __btrfs_cow_block: eb 7110656 set generation 13

Above 3 lines are where the eb 7110656 got created as a cowed tree block.

     update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13

The eb 7110656 is cowed from 6930432, and that's created at transid 13.

     update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14

But that eb 7110656 got CoWed again in transaction 14. Which means, eb 
7110656 is no longer referred in transid 14, but is still referred in 
transid 13.

     btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14

Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656.

This has a consequence that, the tree block 7110656 will not be written 
back, even it's still referred in transid 13.

This is where the problem is, previously we will continue writing back 
eb 7110656, as its transid is not the running transid, meaning some 
commit root still needs it.

Especially note that, I have added trace output for any tree block write 
back (in btree_csum_one_bio()).
But there is no such trace, meaning the tree block is never properly 
written back.

This also exposed another problem, if we didn't properly writeback tree 
blocks in commit root, we break COW, thus no proper transactional 
protection for our metadata.

     scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13
                          running trans 14
     scrub_checksum_tree_block: tree generation mismatch, tree 7110656
                                mirror 1 running trans 14, has 15 want 13
     scrub_checksum_tree_block: tree generation mismatch, tree 7110656
                                mirror 0 running trans 14, has 15 want 13

The above lines just shows the scrub failure for it.
As the tree block is not properly written back, we just read out some 
garbage.

And unfortunately our scrub code only checks bytenr, then generation, 
not even checking the fsid, thus we got the generation mismatch error.

     ...
     btree_csum_one_bio: eb 7110656 gen 26

There is an example to prove that, I have added tree block writeback 
trace event.

Thus I'd prefer to have at least a comment explaining why we can not 
just clean the dirty bit for a dirty eb which is not dirtied during the 
current running transaction.

Thanks,
Qu


> 
> Thanks,
> Qu
> 
>> ---
>>   fs/btrfs/disk-io.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d0ed52cab304..267163e546a5 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -968,16 +968,13 @@ struct extent_buffer *read_tree_block(struct 
>> btrfs_fs_info *fs_info, u64 bytenr,
>>   void btrfs_clean_tree_block(struct extent_buffer *buf)
>>   {
>>       struct btrfs_fs_info *fs_info = buf->fs_info;
>> -    if (btrfs_header_generation(buf) ==
>> -        fs_info->running_transaction->transid) {
>> -        btrfs_assert_tree_write_locked(buf);
>> +    btrfs_assert_tree_write_locked(buf);
>> -        if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
>> -            percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
>> -                         -buf->len,
>> -                         fs_info->dirty_metadata_batch);
>> -            clear_extent_buffer_dirty(buf);
>> -        }
>> +    if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
>> +        percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
>> +                     -buf->len,
>> +                     fs_info->dirty_metadata_batch);
>> +        clear_extent_buffer_dirty(buf);
>>       }
>>   }
David Sterba Jan. 10, 2023, 3:33 p.m. UTC | #3
On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote:
> On 2022/12/16 13:32, Qu Wenruo wrote:
> > On 2022/12/8 06:28, Josef Bacik wrote:
> >> This check is from an era where we didn't have a per-extent buffer dirty
> >> flag, we just messed with the page bits.  All the places we call this
> >> function are when we have a transaction open, so just skip this check
> >> and rely on the dirty flag.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > 
> > This patch is failing a lot of test cases, mostly scrub related 
> > (btrfs/072, btrfs/074).
> > 
> > Now we will report all kinds of errors during scrub.
> > Which seems weird, as scrub doesn't use the regular extent buffer 
> > helpers at all...
> > 
> > Maybe it's related to the generation we got during the extent tree search?
> > As all the failures points to generation mismatch during scrub.
> 
> I got some extra digging, and it turns out that, if we unconditionally 
> clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks 
> writeback for commit root.
> 
> Here is some trace for one extent buffer:
> 
>      btrfs_init_new_buffer: eb 7110656 set generation 13
>      btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13
>      __btrfs_cow_block: eb 7110656 set generation 13
> 
> Above 3 lines are where the eb 7110656 got created as a cowed tree block.
> 
>      update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13
> 
> The eb 7110656 is cowed from 6930432, and that's created at transid 13.
> 
>      update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14
> 
> But that eb 7110656 got CoWed again in transaction 14. Which means, eb 
> 7110656 is no longer referred in transid 14, but is still referred in 
> transid 13.
> 
>      btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14
> 
> Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656.
> 
> This has a consequence that, the tree block 7110656 will not be written 
> back, even it's still referred in transid 13.
> 
> This is where the problem is, previously we will continue writing back 
> eb 7110656, as its transid is not the running transid, meaning some 
> commit root still needs it.
> 
> Especially note that, I have added trace output for any tree block write 
> back (in btree_csum_one_bio()).
> But there is no such trace, meaning the tree block is never properly 
> written back.
> 
> This also exposed another problem, if we didn't properly writeback tree 
> blocks in commit root, we break COW, thus no proper transactional 
> protection for our metadata.
> 
>      scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13
>                           running trans 14
>      scrub_checksum_tree_block: tree generation mismatch, tree 7110656
>                                 mirror 1 running trans 14, has 15 want 13
>      scrub_checksum_tree_block: tree generation mismatch, tree 7110656
>                                 mirror 0 running trans 14, has 15 want 13
> 
> The above lines just shows the scrub failure for it.
> As the tree block is not properly written back, we just read out some 
> garbage.
> 
> And unfortunately our scrub code only checks bytenr, then generation, 
> not even checking the fsid, thus we got the generation mismatch error.
> 
>      ...
>      btree_csum_one_bio: eb 7110656 gen 26
> 
> There is an example to prove that, I have added tree block writeback 
> trace event.
> 
> Thus I'd prefer to have at least a comment explaining why we can not 
> just clean the dirty bit for a dirty eb which is not dirtied during the 
> current running transaction.

I've temporarily removed the patchset from for-next so we don't have
test failures over the holidays, now it's time to add it back, but based
on the above there are changes needed, right? If yes, I'll wait for the
update.
Qu Wenruo Jan. 10, 2023, 11:03 p.m. UTC | #4
On 2023/1/10 23:33, David Sterba wrote:
> On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote:
>> On 2022/12/16 13:32, Qu Wenruo wrote:
>>> On 2022/12/8 06:28, Josef Bacik wrote:
>>>> This check is from an era where we didn't have a per-extent buffer dirty
>>>> flag, we just messed with the page bits.  All the places we call this
>>>> function are when we have a transaction open, so just skip this check
>>>> and rely on the dirty flag.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>
>>> This patch is failing a lot of test cases, mostly scrub related
>>> (btrfs/072, btrfs/074).
>>>
>>> Now we will report all kinds of errors during scrub.
>>> Which seems weird, as scrub doesn't use the regular extent buffer
>>> helpers at all...
>>>
>>> Maybe it's related to the generation we got during the extent tree search?
>>> As all the failures points to generation mismatch during scrub.
>>
>> I got some extra digging, and it turns out that, if we unconditionally
>> clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks
>> writeback for commit root.
>>
>> Here is some trace for one extent buffer:
>>
>>       btrfs_init_new_buffer: eb 7110656 set generation 13
>>       btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13
>>       __btrfs_cow_block: eb 7110656 set generation 13
>>
>> Above 3 lines are where the eb 7110656 got created as a cowed tree block.
>>
>>       update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13
>>
>> The eb 7110656 is cowed from 6930432, and that's created at transid 13.
>>
>>       update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14
>>
>> But that eb 7110656 got CoWed again in transaction 14. Which means, eb
>> 7110656 is no longer referred in transid 14, but is still referred in
>> transid 13.
>>
>>       btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14
>>
>> Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656.
>>
>> This has a consequence that, the tree block 7110656 will not be written
>> back, even it's still referred in transid 13.
>>
>> This is where the problem is, previously we will continue writing back
>> eb 7110656, as its transid is not the running transid, meaning some
>> commit root still needs it.
>>
>> Especially note that, I have added trace output for any tree block write
>> back (in btree_csum_one_bio()).
>> But there is no such trace, meaning the tree block is never properly
>> written back.
>>
>> This also exposed another problem, if we didn't properly writeback tree
>> blocks in commit root, we break COW, thus no proper transactional
>> protection for our metadata.
>>
>>       scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13
>>                            running trans 14
>>       scrub_checksum_tree_block: tree generation mismatch, tree 7110656
>>                                  mirror 1 running trans 14, has 15 want 13
>>       scrub_checksum_tree_block: tree generation mismatch, tree 7110656
>>                                  mirror 0 running trans 14, has 15 want 13
>>
>> The above lines just shows the scrub failure for it.
>> As the tree block is not properly written back, we just read out some
>> garbage.
>>
>> And unfortunately our scrub code only checks bytenr, then generation,
>> not even checking the fsid, thus we got the generation mismatch error.
>>
>>       ...
>>       btree_csum_one_bio: eb 7110656 gen 26
>>
>> There is an example to prove that, I have added tree block writeback
>> trace event.
>>
>> Thus I'd prefer to have at least a comment explaining why we can not
>> just clean the dirty bit for a dirty eb which is not dirtied during the
>> current running transaction.
> 
> I've temporarily removed the patchset from for-next so we don't have
> test failures over the holidays, now it's time to add it back, but based
> on the above there are changes needed, right? If yes, I'll wait for the
> update.

I believe we should drop this patch from the series.

But since it's at the very beginning of the series, and a lot of later 
patches are depending on it, it needs some work to resolve it.

Thanks,
Qu
Josef Bacik Jan. 11, 2023, 7:56 p.m. UTC | #5
On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/12/16 13:32, Qu Wenruo wrote:
> > 
> > 
> > On 2022/12/8 06:28, Josef Bacik wrote:
> > > This check is from an era where we didn't have a per-extent buffer dirty
> > > flag, we just messed with the page bits.  All the places we call this
> > > function are when we have a transaction open, so just skip this check
> > > and rely on the dirty flag.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > 
> > This patch is failing a lot of test cases, mostly scrub related
> > (btrfs/072, btrfs/074).
> > 
> > Now we will report all kinds of errors during scrub.
> > Which seems weird, as scrub doesn't use the regular extent buffer
> > helpers at all...
> > 
> > Maybe it's related to the generation we got during the extent tree search?
> > As all the failures points to generation mismatch during scrub.
> 
> I got some extra digging, and it turns out that, if we unconditionally clear
> the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks writeback for
> commit root.
> 
> Here is some trace for one extent buffer:
> 
>     btrfs_init_new_buffer: eb 7110656 set generation 13
>     btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13
>     __btrfs_cow_block: eb 7110656 set generation 13
> 
> Above 3 lines are where the eb 7110656 got created as a cowed tree block.
> 
>     update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13
> 
> The eb 7110656 is cowed from 6930432, and that's created at transid 13.
> 
>     update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14
> 
> But that eb 7110656 got CoWed again in transaction 14. Which means, eb
> 7110656 is no longer referred in transid 14, but is still referred in
> transid 13.
> 
>     btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14
> 
> Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656.
> 

Ooooh well that's not good, we shouldn't be calling btrfs_clean_tree_block in
this case, that's the real bug.  I'll fix that and update the series.  Nice
catch, thanks!

Josef
David Sterba Jan. 11, 2023, 8:44 p.m. UTC | #6
On Wed, Jan 11, 2023 at 07:03:06AM +0800, Qu Wenruo wrote:
> > I've temporarily removed the patchset from for-next so we don't have
> > test failures over the holidays, now it's time to add it back, but based
> > on the above there are changes needed, right? If yes, I'll wait for the
> > update.
> 
> I believe we should drop this patch from the series.
> 
> But since it's at the very beginning of the series, and a lot of later 
> patches are depending on it, it needs some work to resolve it.

Ok, so it's for Josef, please have a look.
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d0ed52cab304..267163e546a5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -968,16 +968,13 @@  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 void btrfs_clean_tree_block(struct extent_buffer *buf)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
-	if (btrfs_header_generation(buf) ==
-	    fs_info->running_transaction->transid) {
-		btrfs_assert_tree_write_locked(buf);
+	btrfs_assert_tree_write_locked(buf);
 
-		if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
-			percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
-						 -buf->len,
-						 fs_info->dirty_metadata_batch);
-			clear_extent_buffer_dirty(buf);
-		}
+	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) {
+		percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
+					 -buf->len,
+					 fs_info->dirty_metadata_batch);
+		clear_extent_buffer_dirty(buf);
 	}
 }