Message ID | cover.1667315100.git.fdmanana@suse.com (mailing list archive) |
---|---|
Series | btrfs: make send scale and perform better with shared extents |
On Tue, Nov 01, 2022 at 04:15:36PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> There are two problems with send regarding cloned extents:
>
> 1) Sometimes it ends up not cloning whole extents, but only a section of
>    the extents, resulting in less extent sharing at the receiver and extra
>    IO on the send side (reading data, issuing write commands) and on the
>    receiver side too (writing more data). This is not only suboptimal
>    but also surprises users and often gets reported (such as in the
>    thread referenced in patch 09/18);
>
> 2) When we find that a data extent is directly shared more than 64 times,
>    we don't attempt to clone it, because that requires backref walking to
>    determine which inode and range we should clone from, and for
>    extents with many backreferences that can be too slow, especially if
>    we have many thousands of extents with a huge amount of sharing each.
>
> This patchset solves the first problem completely (patch 09/18), and the
> second issue, while not fully eliminated, is significantly improved.
> In a test scenario with 50 000 files where each file is reflinked 50 times,
> there's a performance improvement of ~70% to ~75% for both full and
> incremental send operations. This test and its results are in the changelog
> of patch 17/18.
>
> After this we can now bump the limit from 64 max references to 1024, which
> is still a conservative value, but the goal is to get rid of the limit in
> the future (some more work required for that, but we're getting there).
>
> There's also a nice and simple performance optimization when processing
> extents that are not shared and we are using only one clone source (the
> send root itself, very common), with gains varying between ~9% and ~18%
> in some small-scale tests where there are no shared extents or the majority
> of the extents are not shared. That's patch 08/18.
>
> The rest is just refactoring and cleanups in preparation for the optimization
> work for send, and a few bug fixes for error paths in the backref walking
> code and qgroup self tests. In particular the error paths for backref walking
> are important because with the latest patches they are triggered not only
> when an error happens but also when the backref walking callbacks tell the
> backref walking code to stop early.
>
> More details in the changelogs of the patches.
>
> I've also left this in a git tree at:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=send_clone_performance_scalability
>
> Filipe Manana (18):
>   btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
>   btrfs: fix inode list leak during backref walking at find_parent_nodes()
>   btrfs: fix ulist leaks in error paths of qgroup self tests
>   btrfs: remove pointless and double ulist frees in error paths of qgroup tests
>   btrfs: send: avoid unnecessary path allocations when finding extent clone
>   btrfs: send: update comment at find_extent_clone()
>   btrfs: send: drop unnecessary backref context field initializations
>   btrfs: send: avoid unnecessary backref lookups when finding clone source
>   btrfs: send: optimize clone detection to increase extent sharing
>   btrfs: use a single argument for extent offset in backref walking functions
>   btrfs: use a structure to pass arguments to backref walking functions
>   btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
>   btrfs: constify ulist parameter of ulist_next()
>   btrfs: send: cache leaf to roots mapping during backref walking
>   btrfs: send: skip unnecessary backref iterations
>   btrfs: send: avoid double extent tree search when finding clone source
>   btrfs: send: skip resolution of our own backref when finding clone source
>   btrfs: send: bump the extent reference count limit for backref walking

Thanks a lot, the improvements look great. Added to misc-next.

From: Filipe Manana <fdmanana@suse.com>

There are two problems with send regarding cloned extents:

1) Sometimes it ends up not cloning whole extents, but only a section of
   the extents, resulting in less extent sharing at the receiver and extra
   IO on the send side (reading data, issuing write commands) and on the
   receiver side too (writing more data). This is not only suboptimal
   but also surprises users and often gets reported (such as in the
   thread referenced in patch 09/18);

2) When we find that a data extent is directly shared more than 64 times,
   we don't attempt to clone it, because that requires backref walking to
   determine which inode and range we should clone from, and for
   extents with many backreferences that can be too slow, especially if
   we have many thousands of extents with a huge amount of sharing each.

This patchset solves the first problem completely (patch 09/18), and the
second issue, while not fully eliminated, is significantly improved.
In a test scenario with 50 000 files where each file is reflinked 50 times,
there's a performance improvement of ~70% to ~75% for both full and
incremental send operations. This test and its results are in the changelog
of patch 17/18.

After this we can now bump the limit from 64 max references to 1024, which
is still a conservative value, but the goal is to get rid of the limit in
the future (some more work required for that, but we're getting there).

There's also a nice and simple performance optimization when processing
extents that are not shared and we are using only one clone source (the
send root itself, very common), with gains varying between ~9% and ~18%
in some small-scale tests where there are no shared extents or the majority
of the extents are not shared. That's patch 08/18.

The rest is just refactoring and cleanups in preparation for the optimization
work for send, and a few bug fixes for error paths in the backref walking
code and qgroup self tests. In particular the error paths for backref walking
are important because with the latest patches they are triggered not only
when an error happens but also when the backref walking callbacks tell the
backref walking code to stop early.

More details in the changelogs of the patches.

I've also left this in a git tree at:

  https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=send_clone_performance_scalability

Filipe Manana (18):
  btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
  btrfs: fix inode list leak during backref walking at find_parent_nodes()
  btrfs: fix ulist leaks in error paths of qgroup self tests
  btrfs: remove pointless and double ulist frees in error paths of qgroup tests
  btrfs: send: avoid unnecessary path allocations when finding extent clone
  btrfs: send: update comment at find_extent_clone()
  btrfs: send: drop unnecessary backref context field initializations
  btrfs: send: avoid unnecessary backref lookups when finding clone source
  btrfs: send: optimize clone detection to increase extent sharing
  btrfs: use a single argument for extent offset in backref walking functions
  btrfs: use a structure to pass arguments to backref walking functions
  btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
  btrfs: constify ulist parameter of ulist_next()
  btrfs: send: cache leaf to roots mapping during backref walking
  btrfs: send: skip unnecessary backref iterations
  btrfs: send: avoid double extent tree search when finding clone source
  btrfs: send: skip resolution of our own backref when finding clone source
  btrfs: send: bump the extent reference count limit for backref walking

 fs/btrfs/backref.c            | 596 ++++++++++++++++++++--------------
 fs/btrfs/backref.h            | 137 +++++++-
 fs/btrfs/qgroup.c             |  38 ++-
 fs/btrfs/relocation.c         |  19 +-
 fs/btrfs/scrub.c              |  18 +-
 fs/btrfs/send.c               | 467 +++++++++++++++++++-------
 fs/btrfs/tests/qgroup-tests.c |  86 +++--
 fs/btrfs/ulist.c              |   2 +-
 fs/btrfs/ulist.h              |   2 +-
 9 files changed, 928 insertions(+), 437 deletions(-)
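
The cover letter describes two decisions. Send gives up on cloning once an
extent is shared more than a fixed number of times (64 before the series,
1024 after), and it skips backref walking for an unshared extent when the send
root is the only clone source. These can be modelled roughly as below. This is
a minimal Python sketch, not the kernel code: pick_clone_action, CloneAction
and the constant names are hypothetical. Treating an unshared extent with a
single clone source as "write the data" is an assumption inferred from the
description of patch 08/18.

```python
from enum import Enum

# Reference count limits quoted in the cover letter (before and after the series).
OLD_MAX_EXTENT_REFS = 64
NEW_MAX_EXTENT_REFS = 1024


class CloneAction(Enum):
    """Hypothetical outcomes; illustrative names, not kernel identifiers."""
    WRITE_DATA = "write data"
    WALK_BACKREFS = "walk backrefs"


def pick_clone_action(extent_refs: int, clone_sources: int, max_refs: int) -> CloneAction:
    """Decide whether looking for a clone source is worth a backref walk.

    extent_refs:   number of direct references to the data extent
    clone_sources: number of clone sources (1 means only the send root itself)
    max_refs:      reference count limit in effect
    """
    # Assumption: the only reference is our own, so there is nothing to clone from.
    if extent_refs == 1 and clone_sources == 1:
        return CloneAction.WRITE_DATA
    # Shared more than the limit: backref walking is considered too slow.
    if extent_refs > max_refs:
        return CloneAction.WRITE_DATA
    return CloneAction.WALK_BACKREFS


if __name__ == "__main__":
    for refs in (1, 64, 65, 1024, 1025):
        old = pick_clone_action(refs, 1, OLD_MAX_EXTENT_REFS)
        new = pick_clone_action(refs, 1, NEW_MAX_EXTENT_REFS)
        print(f"refs={refs}: old limit -> {old.value}, new limit -> {new.value}")
```

In this model, an extent with 65 references is written out under the old limit
but becomes a cloning candidate under the new one, which is the practical
effect of bumping the limit.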