diff mbox series

btrfs: don't clear EXTENT_BUFFER_DIRTY bit if the tree block is not dirtied in our transaction

Message ID ed62dbe7094fb5e80cb5b6df8590ff621b218ae5.1671252987.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: don't clear EXTENT_BUFFER_DIRTY bit if the tree block is not dirtied in our transaction | expand

Commit Message

Qu Wenruo Dec. 17, 2022, 4:57 a.m. UTC
[BUG]
Since patch "btrfs: do not check header generation in
btrfs_clean_tree_block", a lot of scrub tests will report errors like
btrfs/072:

  QA output created by 072
  Silence is golden
  Scrub find errors in "-m single -d single" test

With newer scrub metadata error reports, we can see the bad tree blocks
are having older fsid from previous runs:

 BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
 BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
 BTRFS error (device dm-2): bdev /dev/mapper/test-scratch2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

This means, some tree blocks of commit roots are not properly written
back to disks.

[CAUSE]
Patch "btrfs: do not check header generation in btrfs_clean_tree_block"
will unconditionally clear the DIRTY flag, but there is a race that we
can clear the DIRTY flag for extent buffers which is under writeback:

            Transaction A             |   Transaction A + 1
--------------------------------------+----------------------------------
 btrfs_cow_block()                    |
 |- __btrfs_cow_block()               |
    |- btrfs_alloc_tree_block()       |
    |  Tree block X get allocated     |
    |  Generation is A, and marked    |
    |  dirty.                         |
                                      |
 btrfs_commit_transaction()           |
 | Tree block X should be written     |
 | back. As it's dirtied in           |
 | transaction A.                     |
 |                                    |
 |- cur_trans->state =                |
 |  TRANS_STATE_UNBLOCKED             |
 |  Now new transaction can be        |
 |  started.                          |
 |                                    | Transaction A + 1 started
 |                                    |
 |                                    | btrfs_cow_block()
 |                                    | |- __btrfs_cow_block()
 |                                    |    |- update_ref_for_cow()
 |                                    |    |- btrfs_clear_buffer_dirty()
 |                                    |    |  Tree block X get freed, thus
 |                                    |       its DIRTY flag get cleared.
 |                                    |       And its pages are no
 |                                    |       longer dirty.
 |                                    |
 |- btrfs_write_and_wait_transaction()|
    |- btrfs_write_marked_extents()   |
       Here we will only writeback    |
       dirty pages, and tree block X  |
       no longer has its pages dirty, |
       it will no be written back.    |

Thus those tree blocks don't get written back and cause above fsid
mismatch during scrub.

[FIX]
Just bring back the check of the old code, with added commt on why we
should not clear dirty flag unconditionally.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
This patch should just be used for testing.

The proper fix would be an update of the series "extent buffer dirty
cleanups", which should remove the patch "btrfs: do not check header
generation in btrfs_clean_tree_block" completely.

But I really believe we need a new comment on the original check anyway.
---
 fs/btrfs/extent_io.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

David Sterba Dec. 20, 2022, 6:57 p.m. UTC | #1
On Sat, Dec 17, 2022 at 12:57:10PM +0800, Qu Wenruo wrote:
> [BUG]
> Since patch "btrfs: do not check header generation in
> btrfs_clean_tree_block", a lot of scrub tests will report errors like
> btrfs/072:
> 
>   QA output created by 072
>   Silence is golden
>   Scrub find errors in "-m single -d single" test
> 
> With newer scrub metadata error reports, we can see the bad tree blocks
> are having older fsid from previous runs:
> 
>  BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
>  BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
>  BTRFS error (device dm-2): bdev /dev/mapper/test-scratch2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> 
> This means, some tree blocks of commit roots are not properly written
> back to disks.
> 
> [CAUSE]
> Patch "btrfs: do not check header generation in btrfs_clean_tree_block"
> will unconditionally clear the DIRTY flag, but there is a race that we
> can clear the DIRTY flag for extent buffers which is under writeback:
> 
>             Transaction A             |   Transaction A + 1
> --------------------------------------+----------------------------------
>  btrfs_cow_block()                    |
>  |- __btrfs_cow_block()               |
>     |- btrfs_alloc_tree_block()       |
>     |  Tree block X get allocated     |
>     |  Generation is A, and marked    |
>     |  dirty.                         |
>                                       |
>  btrfs_commit_transaction()           |
>  | Tree block X should be written     |
>  | back. As it's dirtied in           |
>  | transaction A.                     |
>  |                                    |
>  |- cur_trans->state =                |
>  |  TRANS_STATE_UNBLOCKED             |
>  |  Now new transaction can be        |
>  |  started.                          |
>  |                                    | Transaction A + 1 started
>  |                                    |
>  |                                    | btrfs_cow_block()
>  |                                    | |- __btrfs_cow_block()
>  |                                    |    |- update_ref_for_cow()
>  |                                    |    |- btrfs_clear_buffer_dirty()
>  |                                    |    |  Tree block X get freed, thus
>  |                                    |       its DIRTY flag get cleared.
>  |                                    |       And its pages are no
>  |                                    |       longer dirty.
>  |                                    |
>  |- btrfs_write_and_wait_transaction()|
>     |- btrfs_write_marked_extents()   |
>        Here we will only writeback    |
>        dirty pages, and tree block X  |
>        no longer has its pages dirty, |
>        it will no be written back.    |
> 
> Thus those tree blocks don't get written back and cause above fsid
> mismatch during scrub.
> 
> [FIX]
> Just bring back the check of the old code, with added commt on why we
> should not clear dirty flag unconditionally.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> This patch should just be used for testing.
> 
> The proper fix would be an update of the series "extent buffer dirty
> cleanups", which should remove the patch "btrfs: do not check header
> generation in btrfs_clean_tree_block" completely.

I've removed the sereis from misc-next and will add it to for-next later
so we have fewer errors in the testing runs until the end of the year.
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4f96f9603ec2..3e793d92d386 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -36,6 +36,7 @@ 
 #include "file.h"
 #include "dev-replace.h"
 #include "super.h"
+#include "transaction.h"
 
 static struct kmem_cache *extent_buffer_cache;
 
@@ -4704,6 +4705,19 @@  void btrfs_clear_buffer_dirty(struct extent_buffer *eb)
 
 	btrfs_assert_tree_write_locked(eb);
 
+	/*
+	 * Do not clear the dirty flag for an extent buffer which is not
+	 * modified in the current running transaction.
+	 *
+	 * The tree block may belong to previous transaction which is still
+	 * waiting for writeback.
+	 * Clearing the dirity bit unconditionally will make the previous
+	 * transaction to miss its tree blocks, screwing up anything relying
+	 * on commit root, from scrub to powerloss protection.
+	 */
+	if (btrfs_header_generation(eb) != fs_info->running_transaction->transid)
+		return;
+
 	if (!test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
 		return;