Message ID | ef73c4c67028f9e7d770dca236367f1ea0b56b55.1670451918.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | extent buffer dirty cleanups | expand |
On 2022/12/8 06:28, Josef Bacik wrote: > This check is from an era where we didn't have a per-extent buffer dirty > flag, we just messed with the page bits. All the places we call this > function are when we have a transaction open, so just skip this check > and rely on the dirty flag. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> This patch is failing a lot of test cases, mostly scrub related (btrfs/072, btrfs/074). Now we will report all kinds of errors during scrub. Which seems weird, as scrub doesn't use the regular extent buffer helpers at all... Maybe it's related to the generation we got during the extent tree search? As all the failures points to generation mismatch during scrub. Thanks, Qu > --- > fs/btrfs/disk-io.c | 15 ++++++--------- > 1 file changed, 6 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index d0ed52cab304..267163e546a5 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -968,16 +968,13 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, > void btrfs_clean_tree_block(struct extent_buffer *buf) > { > struct btrfs_fs_info *fs_info = buf->fs_info; > - if (btrfs_header_generation(buf) == > - fs_info->running_transaction->transid) { > - btrfs_assert_tree_write_locked(buf); > + btrfs_assert_tree_write_locked(buf); > > - if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { > - percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, > - -buf->len, > - fs_info->dirty_metadata_batch); > - clear_extent_buffer_dirty(buf); > - } > + if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { > + percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, > + -buf->len, > + fs_info->dirty_metadata_batch); > + clear_extent_buffer_dirty(buf); > } > } >
On 2022/12/16 13:32, Qu Wenruo wrote: > > > On 2022/12/8 06:28, Josef Bacik wrote: >> This check is from an era where we didn't have a per-extent buffer dirty >> flag, we just messed with the page bits. All the places we call this >> function are when we have a transaction open, so just skip this check >> and rely on the dirty flag. >> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > This patch is failing a lot of test cases, mostly scrub related > (btrfs/072, btrfs/074). > > Now we will report all kinds of errors during scrub. > Which seems weird, as scrub doesn't use the regular extent buffer > helpers at all... > > Maybe it's related to the generation we got during the extent tree search? > As all the failures points to generation mismatch during scrub. I got some extra digging, and it turns out that, if we unconditionally clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks writeback for commit root. Here is some trace for one extent buffer: btrfs_init_new_buffer: eb 7110656 set generation 13 btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13 __btrfs_cow_block: eb 7110656 set generation 13 Above 3 lines are where the eb 7110656 got created as a cowed tree block. update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13 The eb 7110656 is cowed from 6930432, and that's created at transid 13. update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14 But that eb 7110656 got CoWed again in transaction 14. Which means, eb 7110656 is no longer referred in transid 14, but is still referred in transid 13. btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14 Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656. This has a consequence that, the tree block 7110656 will not be written back, even it's still referred in transid 13. This is where the problem is, previously we will continue writing back eb 7110656, as its transid is not the running transid, meaning some commit root still needs it. Especially note that, I have added trace output for any tree block write back (in btree_csum_one_bio()). But there is no such trace, meaning the tree block is never properly written back. This also exposed another problem, if we didn't properly writeback tree blocks in commit root, we break COW, thus no proper transactional protection for our metadata. scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13 running trans 14 scrub_checksum_tree_block: tree generation mismatch, tree 7110656 mirror 1 running trans 14, has 15 want 13 scrub_checksum_tree_block: tree generation mismatch, tree 7110656 mirror 0 running trans 14, has 15 want 13 The above lines just shows the scrub failure for it. As the tree block is not properly written back, we just read out some garbage. And unfortunately our scrub code only checks bytenr, then generation, not even checking the fsid, thus we got the generation mismatch error. ... btree_csum_one_bio: eb 7110656 gen 26 There is an example to prove that, I have added tree block writeback trace event. Thus I'd prefer to have at least a comment explaining why we can not just clean the dirty bit for a dirty eb which is not dirtied during the current running transaction. Thanks, Qu > > Thanks, > Qu > >> --- >> fs/btrfs/disk-io.c | 15 ++++++--------- >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index d0ed52cab304..267163e546a5 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -968,16 +968,13 @@ struct extent_buffer *read_tree_block(struct >> btrfs_fs_info *fs_info, u64 bytenr, >> void btrfs_clean_tree_block(struct extent_buffer *buf) >> { >> struct btrfs_fs_info *fs_info = buf->fs_info; >> - if (btrfs_header_generation(buf) == >> - fs_info->running_transaction->transid) { >> - btrfs_assert_tree_write_locked(buf); >> + btrfs_assert_tree_write_locked(buf); >> - if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { >> - percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, >> - -buf->len, >> - fs_info->dirty_metadata_batch); >> - clear_extent_buffer_dirty(buf); >> - } >> + if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { >> + percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, >> + -buf->len, >> + fs_info->dirty_metadata_batch); >> + clear_extent_buffer_dirty(buf); >> } >> }
On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote: > On 2022/12/16 13:32, Qu Wenruo wrote: > > On 2022/12/8 06:28, Josef Bacik wrote: > >> This check is from an era where we didn't have a per-extent buffer dirty > >> flag, we just messed with the page bits. All the places we call this > >> function are when we have a transaction open, so just skip this check > >> and rely on the dirty flag. > >> > >> Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > > > This patch is failing a lot of test cases, mostly scrub related > > (btrfs/072, btrfs/074). > > > > Now we will report all kinds of errors during scrub. > > Which seems weird, as scrub doesn't use the regular extent buffer > > helpers at all... > > > > Maybe it's related to the generation we got during the extent tree search? > > As all the failures points to generation mismatch during scrub. > > I got some extra digging, and it turns out that, if we unconditionally > clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks > writeback for commit root. > > Here is some trace for one extent buffer: > > btrfs_init_new_buffer: eb 7110656 set generation 13 > btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13 > __btrfs_cow_block: eb 7110656 set generation 13 > > Above 3 lines are where the eb 7110656 got created as a cowed tree block. > > update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13 > > The eb 7110656 is cowed from 6930432, and that's created at transid 13. > > update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14 > > But that eb 7110656 got CoWed again in transaction 14. Which means, eb > 7110656 is no longer referred in transid 14, but is still referred in > transid 13. > > btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14 > > Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656. > > This has a consequence that, the tree block 7110656 will not be written > back, even it's still referred in transid 13. > > This is where the problem is, previously we will continue writing back > eb 7110656, as its transid is not the running transid, meaning some > commit root still needs it. > > Especially note that, I have added trace output for any tree block write > back (in btree_csum_one_bio()). > But there is no such trace, meaning the tree block is never properly > written back. > > This also exposed another problem, if we didn't properly writeback tree > blocks in commit root, we break COW, thus no proper transactional > protection for our metadata. > > scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13 > running trans 14 > scrub_checksum_tree_block: tree generation mismatch, tree 7110656 > mirror 1 running trans 14, has 15 want 13 > scrub_checksum_tree_block: tree generation mismatch, tree 7110656 > mirror 0 running trans 14, has 15 want 13 > > The above lines just shows the scrub failure for it. > As the tree block is not properly written back, we just read out some > garbage. > > And unfortunately our scrub code only checks bytenr, then generation, > not even checking the fsid, thus we got the generation mismatch error. > > ... > btree_csum_one_bio: eb 7110656 gen 26 > > There is an example to prove that, I have added tree block writeback > trace event. > > Thus I'd prefer to have at least a comment explaining why we can not > just clean the dirty bit for a dirty eb which is not dirtied during the > current running transaction. I've temporarily removed the patchset from for-next so we don't have test failures over the holidays, now it's time to add it back, but based on the above there are changes needed, right? If yes, I'll wait for the update.
On 2023/1/10 23:33, David Sterba wrote: > On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote: >> On 2022/12/16 13:32, Qu Wenruo wrote: >>> On 2022/12/8 06:28, Josef Bacik wrote: >>>> This check is from an era where we didn't have a per-extent buffer dirty >>>> flag, we just messed with the page bits. All the places we call this >>>> function are when we have a transaction open, so just skip this check >>>> and rely on the dirty flag. >>>> >>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >>> >>> This patch is failing a lot of test cases, mostly scrub related >>> (btrfs/072, btrfs/074). >>> >>> Now we will report all kinds of errors during scrub. >>> Which seems weird, as scrub doesn't use the regular extent buffer >>> helpers at all... >>> >>> Maybe it's related to the generation we got during the extent tree search? >>> As all the failures points to generation mismatch during scrub. >> >> I got some extra digging, and it turns out that, if we unconditionally >> clear the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks >> writeback for commit root. >> >> Here is some trace for one extent buffer: >> >> btrfs_init_new_buffer: eb 7110656 set generation 13 >> btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13 >> __btrfs_cow_block: eb 7110656 set generation 13 >> >> Above 3 lines are where the eb 7110656 got created as a cowed tree block. >> >> update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13 >> >> The eb 7110656 is cowed from 6930432, and that's created at transid 13. >> >> update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14 >> >> But that eb 7110656 got CoWed again in transaction 14. Which means, eb >> 7110656 is no longer referred in transid 14, but is still referred in >> transid 13. >> >> btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14 >> >> Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656. >> >> This has a consequence that, the tree block 7110656 will not be written >> back, even it's still referred in transid 13. >> >> This is where the problem is, previously we will continue writing back >> eb 7110656, as its transid is not the running transid, meaning some >> commit root still needs it. >> >> Especially note that, I have added trace output for any tree block write >> back (in btree_csum_one_bio()). >> But there is no such trace, meaning the tree block is never properly >> written back. >> >> This also exposed another problem, if we didn't properly writeback tree >> blocks in commit root, we break COW, thus no proper transactional >> protection for our metadata. >> >> scrub_simple_mirror: tree 7110656 mirror 1 wanted generation 13 >> running trans 14 >> scrub_checksum_tree_block: tree generation mismatch, tree 7110656 >> mirror 1 running trans 14, has 15 want 13 >> scrub_checksum_tree_block: tree generation mismatch, tree 7110656 >> mirror 0 running trans 14, has 15 want 13 >> >> The above lines just shows the scrub failure for it. >> As the tree block is not properly written back, we just read out some >> garbage. >> >> And unfortunately our scrub code only checks bytenr, then generation, >> not even checking the fsid, thus we got the generation mismatch error. >> >> ... >> btree_csum_one_bio: eb 7110656 gen 26 >> >> There is an example to prove that, I have added tree block writeback >> trace event. >> >> Thus I'd prefer to have at least a comment explaining why we can not >> just clean the dirty bit for a dirty eb which is not dirtied during the >> current running transaction. > > I've temporarily removed the patchset from for-next so we don't have > test failures over the holidays, now it's time to add it back, but based > on the above there are changes needed, right? If yes, I'll wait for the > update. I believe we should drop this patch from the series. But since it's at the very beginning of the series, and a lot of later patches are depending on it, it needs some work to resolve it. Thanks, Qu
On Sat, Dec 17, 2022 at 10:10:24AM +0800, Qu Wenruo wrote: > > > On 2022/12/16 13:32, Qu Wenruo wrote: > > > > > > On 2022/12/8 06:28, Josef Bacik wrote: > > > This check is from an era where we didn't have a per-extent buffer dirty > > > flag, we just messed with the page bits. All the places we call this > > > function are when we have a transaction open, so just skip this check > > > and rely on the dirty flag. > > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > > > This patch is failing a lot of test cases, mostly scrub related > > (btrfs/072, btrfs/074). > > > > Now we will report all kinds of errors during scrub. > > Which seems weird, as scrub doesn't use the regular extent buffer > > helpers at all... > > > > Maybe it's related to the generation we got during the extent tree search? > > As all the failures points to generation mismatch during scrub. > > I got some extra digging, and it turns out that, if we unconditionally clear > the EXTENT_BUFFER_DIRTY flags, we can miss some tree blocks writeback for > commit root. > > Here is some trace for one extent buffer: > > btrfs_init_new_buffer: eb 7110656 set generation 13 > btrfs_clean_tree_block: eb 7110656 gen 13 dirty 0 running trans 13 > __btrfs_cow_block: eb 7110656 set generation 13 > > Above 3 lines are where the eb 7110656 got created as a cowed tree block. > > update_ref_for_cow: root 1 buf 6930432 gen 12 cow 7110656 gen 13 > > The eb 7110656 is cowed from 6930432, and that's created at transid 13. > > update_ref_for_cow: root 1 buf 7110656 gen 13 cow 7192576 gen 14 > > But that eb 7110656 got CoWed again in transaction 14. Which means, eb > 7110656 is no longer referred in transid 14, but is still referred in > transid 13. > > btrfs_clean_tree_block: eb 7110656 gen 13 dirty 1 running trans 14 > > Here inside update_ref_for_cow(), we clear the dirty flag for eb 7110656. > Ooooh well that's not good, we shouldn't be calling btrfs_clean_tree_block in this case, that's the real bug. I'll fix that and update the series. Nice catch, thanks! Josef
On Wed, Jan 11, 2023 at 07:03:06AM +0800, Qu Wenruo wrote: > > I've temporarily removed the patchset from for-next so we don't have > > test failures over the holidays, now it's time to add it back, but based > > on the above there are changes needed, right? If yes, I'll wait for the > > update. > > I believe we should drop this patch from the series. > > But since it's at the very beginning of the series, and a lot of later > patches are depending on it, it needs some work to resolve it. Ok, so it's for Josef, please have a look.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index d0ed52cab304..267163e546a5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -968,16 +968,13 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, void btrfs_clean_tree_block(struct extent_buffer *buf) { struct btrfs_fs_info *fs_info = buf->fs_info; - if (btrfs_header_generation(buf) == - fs_info->running_transaction->transid) { - btrfs_assert_tree_write_locked(buf); + btrfs_assert_tree_write_locked(buf); - if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { - percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, - -buf->len, - fs_info->dirty_metadata_batch); - clear_extent_buffer_dirty(buf); - } + if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { + percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, + -buf->len, + fs_info->dirty_metadata_batch); + clear_extent_buffer_dirty(buf); } }
This check is from an era where we didn't have a per-extent buffer dirty flag, we just messed with the page bits. All the places we call this function are when we have a transaction open, so just skip this check and rely on the dirty flag. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/disk-io.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)