mbox series

[v2,0/7] btrfs-progs: csum-change: add the initial support for offline csum type change

Message ID cover.1684375729.git.wqu@suse.com (mailing list archive)
Headers show
Series btrfs-progs: csum-change: add the initial support for offline csum type change | expand

Message

Qu Wenruo May 18, 2023, 2:10 a.m. UTC
[CHANGELOG]
v2:
- Skip csum item checks if the fs is under csum change
  Tree-checker can be too sensitive if the csum size doesn not match the
  old csum size, which can lead to false alerts on overlapping csum
  items.

  But we still want the tree checker functionality overall, so just
  disable csum item related checks for csum change.

[DESIGN]
The csum change workflow looks like this:

- Generate new data csums
  New data csums would have a different objectid (CSUM_CHANGE -13),
  other than the existing one (EXTENT_CSUM -10).

  The generation part would also verify the data contents, if any
  mismatch we would error out.

- Delete the old data csums

- Change data csums objectid

- Rewrite metadata csums in-place
  At this stage, we would check the checksum for both old and new algo.
  If the metadata matches the old csum, we rewrite using new csum.
  If the metadata matches the new csum, we skip it.
  If the metadata doesn't match either csum, we error out.

[TESTS]
For now it's only basically tested manually.

So far the result looks good, the result fs can pass "btrfs check
--check-data-csum".

[TODO]
- Support for resume
  Currently we won't resume an interrupted csum conversion.
  Although the design should be able to handle any interruption at data
  csum conversion part, and as long as metadata csum writes are atomic,
  the metadata rewrites should also be fine.

- Support for revert if errors are found
  If we hit data csum mismatch and can not repair from any copy, then
  we should revert back to the old csum.

- Suppot for precaustious metadata check
  We'd better read and very metadata before rewriting them.

- Extra test cases


Qu Wenruo (7):
  btrfs-progs: tune: rework the main idea of csum change
  btrfs-progs: tune: implement the prerequisite checks for csum change
  btrfs-progs: tune: add the ability to read and verify the data before
    generating new checksum
  btrfs-progs: tune: add the ability to generate new data checksums
  btrfs-progs: tune: add the ability to delete old data csums
  btrfs-progs: tune: add the ability to migrate the temporary csum items
    to regular csum items
  btrfs-progs: tune: add the ability to change metadata csums

 check/mode-common.c             |   11 +-
 convert/main.c                  |   12 +-
 kernel-shared/ctree.c           |    3 -
 kernel-shared/ctree.h           |   19 +-
 kernel-shared/disk-io.c         |    8 -
 kernel-shared/file-item.c       |   46 +-
 kernel-shared/file-item.h       |    4 +-
 kernel-shared/print-tree.c      |   11 +-
 kernel-shared/tree-checker.c    |    5 +
 kernel-shared/uapi/btrfs_tree.h |    1 +
 mkfs/rootdir.c                  |   11 +-
 tune/change-csum.c              | 1053 +++++++++++++++++--------------
 tune/main.c                     |    2 +-
 tune/tune.h                     |    3 +-
 14 files changed, 646 insertions(+), 543 deletions(-)

Comments

David Sterba May 22, 2023, 12:14 p.m. UTC | #1
On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Skip csum item checks if the fs is under csum change
>   Tree-checker can be too sensitive if the csum size doesn not match the
>   old csum size, which can lead to false alerts on overlapping csum
>   items.
> 
>   But we still want the tree checker functionality overall, so just
>   disable csum item related checks for csum change.

I still see some errors with v2, the same test that rotates the checksum
types on an increasingly filled filesystem (the one I sent you before):

ERROR: failed to insert csum change item: File exists
ERROR: failed to generate new data csums: File exists
WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
extent buffer leak: start 610811904 len 16384
extent buffer leak: start 5242880 len 16384
WARNING: dirty eb leak (aborted trans): start 5242880 len 16384
David Sterba May 22, 2023, 8:19 p.m. UTC | #2
On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
> [TODO]
> - Support for resume
>   Currently we won't resume an interrupted csum conversion.
>   Although the design should be able to handle any interruption at data
>   csum conversion part, and as long as metadata csum writes are atomic,
>   the metadata rewrites should also be fine.
> 
> - Support for revert if errors are found
>   If we hit data csum mismatch and can not repair from any copy, then
>   we should revert back to the old csum.
> 
> - Suppot for precaustious metadata check
>   We'd better read and very metadata before rewriting them.
> 
> - Extra test cases

As the todo list for that feature is still long and it's behind the
experimental build I'll keep the patches in devel, please send
incremental fixes or further updates. Thanks.
Qu Wenruo May 22, 2023, 11:45 p.m. UTC | #3
On 2023/5/22 20:14, David Sterba wrote:
> On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Skip csum item checks if the fs is under csum change
>>    Tree-checker can be too sensitive if the csum size doesn not match the
>>    old csum size, which can lead to false alerts on overlapping csum
>>    items.
>>
>>    But we still want the tree checker functionality overall, so just
>>    disable csum item related checks for csum change.
>
> I still see some errors with v2, the same test that rotates the checksum
> types on an increasingly filled filesystem (the one I sent you before):
>
> ERROR: failed to insert csum change item: File exists

Oh sh*t, my tests only do one csum type cycle, which is
CRC32->BLAKE2->SHA256->XXHASH, and moved to the next mkfs.

But your incremental tests do multiple cycles (the incremental part is
not that a big deal, as after a full conversion it's no different than a
new fs but filled to that state).

In that case, even my v2 patches forgot to delete the csum change item
in root tree.

And one cyclic run won't fail, because they all have different offset,
but multiple cyclic runs would fail as long as we hit the second time
for the same target csum type.

The only thing saved my backend is the detailed error messages...

Thanks,
Qu

> ERROR: failed to generate new data csums: File exists
> WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
> extent buffer leak: start 610811904 len 16384
> extent buffer leak: start 5242880 len 16384
> WARNING: dirty eb leak (aborted trans): start 5242880 len 16384