Message ID | 167149471987.336919.3277522603824048839.stgit@magnolia (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | xfs: random fixes for 6.2, part 3 | expand |
On Mon, Dec 19, 2022 at 04:05:19PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Lately I've been stress-testing extreme-sized rmap btrees by using the > (new) xfs_db bmap_inflate command to clone bmbt mappings billions of > times and then using xfs_repair to build new rmap and refcount btrees. > This of course is /much/ faster than actually FICLONEing a file billions > of times. > > Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with > EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not > sufficiently large for the test scenario. For a 1TB filesystem (~67 > million AG blocks, 4 AGs) the btheight command reports: > > $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda > rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node) > level 0: 4400801200 records, 52390491 blocks > level 1: 52390491 records, 1164234 blocks > level 2: 1164234 records, 25872 blocks > level 3: 25872 records, 575 blocks > level 4: 575 records, 13 blocks > level 5: 13 records, 1 block > 6 levels, 53581186 blocks total > > The AG is sufficiently large to build this rmap btree. Unfortunately, > m_rmap_maxlevels is 5. Augmenting the loop in the space->height > function to report height, node blocks, and blocks remaining produces > this: > > ht 1 node_blocks 45 blockleft 67108863 > ht 2 node_blocks 2025 blockleft 67108818 > ht 3 node_blocks 91125 blockleft 67106793 > ht 4 node_blocks 4100625 blockleft 67015668 > final height: 5 > > The goal of this function is to compute the maximum height btree that > can be stored in the given number of ondisk fsblocks. Starting with the > top level of the tree, each iteration through the loop adds the fanout > factor of the next level down until we run out of blocks. IOWs, maximum > height is achieved by using the smallest fanout factor that can apply > to that level. > > However, the loop setup is not correct. Top level btree blocks are > allowed to contain fewer than minrecs items, so the computation is ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Ah, that's the critical piece of information I was looking for. I couldn't work out from the code change below what was wrong with limits[1]. So.... > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c > index 4c16c8c31fcb..8d11d3f5e529 100644 > --- a/fs/xfs/libxfs/xfs_btree.c > +++ b/fs/xfs/libxfs/xfs_btree.c > @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height( > const unsigned int *limits, > unsigned long long leaf_blocks) > { > - unsigned long long node_blocks = limits[1]; > + /* > + * The root btree block can have a fanout between 2 and maxrecs because > + * the tree might not be big enough to fill it. > + */ Can you change this comment to say something like: /* * The root btree block can have less than minrecs pointers * in it because the tree might not be big enough to require * that amount of fanout. Hence it has a minimum size of * 2 pointers, not limits[1]. */ Otherwise it looks good. Reviewed-by: Dave Chinner <dchinner@redhat.com> > + unsigned long long node_blocks = 2; > unsigned long long blocks_left = leaf_blocks - 1; > unsigned int height = 1; For future consideration, we don't use maxrecs in this calculation at all - should we just pass minrecs into the function rather than an array of limits? Cheers, Dave.
On Tue, Dec 20, 2022 at 04:00:01PM +1100, Dave Chinner wrote: > On Mon, Dec 19, 2022 at 04:05:19PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Lately I've been stress-testing extreme-sized rmap btrees by using the > > (new) xfs_db bmap_inflate command to clone bmbt mappings billions of > > times and then using xfs_repair to build new rmap and refcount btrees. > > This of course is /much/ faster than actually FICLONEing a file billions > > of times. > > > > Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with > > EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not > > sufficiently large for the test scenario. For a 1TB filesystem (~67 > > million AG blocks, 4 AGs) the btheight command reports: > > > > $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda > > rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node) > > level 0: 4400801200 records, 52390491 blocks > > level 1: 52390491 records, 1164234 blocks > > level 2: 1164234 records, 25872 blocks > > level 3: 25872 records, 575 blocks > > level 4: 575 records, 13 blocks > > level 5: 13 records, 1 block > > 6 levels, 53581186 blocks total > > > > The AG is sufficiently large to build this rmap btree. Unfortunately, > > m_rmap_maxlevels is 5. Augmenting the loop in the space->height > > function to report height, node blocks, and blocks remaining produces > > this: > > > > ht 1 node_blocks 45 blockleft 67108863 > > ht 2 node_blocks 2025 blockleft 67108818 > > ht 3 node_blocks 91125 blockleft 67106793 > > ht 4 node_blocks 4100625 blockleft 67015668 > > final height: 5 > > > > The goal of this function is to compute the maximum height btree that > > can be stored in the given number of ondisk fsblocks. Starting with the > > top level of the tree, each iteration through the loop adds the fanout > > factor of the next level down until we run out of blocks. IOWs, maximum > > height is achieved by using the smallest fanout factor that can apply > > to that level. > > > > However, the loop setup is not correct. Top level btree blocks are > > allowed to contain fewer than minrecs items, so the computation is > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Ah, that's the critical piece of information I was looking for. I > couldn't work out from the code change below what was wrong with > limits[1]. So.... > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c > > index 4c16c8c31fcb..8d11d3f5e529 100644 > > --- a/fs/xfs/libxfs/xfs_btree.c > > +++ b/fs/xfs/libxfs/xfs_btree.c > > @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height( > > const unsigned int *limits, > > unsigned long long leaf_blocks) > > { > > - unsigned long long node_blocks = limits[1]; > > + /* > > + * The root btree block can have a fanout between 2 and maxrecs because > > + * the tree might not be big enough to fill it. > > + */ > > Can you change this comment to say something like: > > /* > * The root btree block can have less than minrecs pointers > * in it because the tree might not be big enough to require > * that amount of fanout. Hence it has a minimum size of > * 2 pointers, not limits[1]. > */ Done. Thanks for the reviews! :) --D > > Otherwise it looks good. > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > + unsigned long long node_blocks = 2; > > unsigned long long blocks_left = leaf_blocks - 1; > > unsigned int height = 1; > > For future consideration, we don't use maxrecs in this calculation > at all - should we just pass minrecs into the function rather than > an array of limits? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Tue, Dec 20, 2022 at 08:10:08AM -0800, Darrick J. Wong wrote: > On Tue, Dec 20, 2022 at 04:00:01PM +1100, Dave Chinner wrote: > > On Mon, Dec 19, 2022 at 04:05:19PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > Lately I've been stress-testing extreme-sized rmap btrees by using the > > > (new) xfs_db bmap_inflate command to clone bmbt mappings billions of > > > times and then using xfs_repair to build new rmap and refcount btrees. > > > This of course is /much/ faster than actually FICLONEing a file billions > > > of times. > > > > > > Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with > > > EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not > > > sufficiently large for the test scenario. For a 1TB filesystem (~67 > > > million AG blocks, 4 AGs) the btheight command reports: > > > > > > $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda > > > rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node) > > > level 0: 4400801200 records, 52390491 blocks > > > level 1: 52390491 records, 1164234 blocks > > > level 2: 1164234 records, 25872 blocks > > > level 3: 25872 records, 575 blocks > > > level 4: 575 records, 13 blocks > > > level 5: 13 records, 1 block > > > 6 levels, 53581186 blocks total > > > > > > The AG is sufficiently large to build this rmap btree. Unfortunately, > > > m_rmap_maxlevels is 5. Augmenting the loop in the space->height > > > function to report height, node blocks, and blocks remaining produces > > > this: > > > > > > ht 1 node_blocks 45 blockleft 67108863 > > > ht 2 node_blocks 2025 blockleft 67108818 > > > ht 3 node_blocks 91125 blockleft 67106793 > > > ht 4 node_blocks 4100625 blockleft 67015668 > > > final height: 5 > > > > > > The goal of this function is to compute the maximum height btree that > > > can be stored in the given number of ondisk fsblocks. Starting with the > > > top level of the tree, each iteration through the loop adds the fanout > > > factor of the next level down until we run out of blocks. IOWs, maximum > > > height is achieved by using the smallest fanout factor that can apply > > > to that level. > > > > > > However, the loop setup is not correct. Top level btree blocks are > > > allowed to contain fewer than minrecs items, so the computation is > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > Ah, that's the critical piece of information I was looking for. I > > couldn't work out from the code change below what was wrong with > > limits[1]. So.... > > > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c > > > index 4c16c8c31fcb..8d11d3f5e529 100644 > > > --- a/fs/xfs/libxfs/xfs_btree.c > > > +++ b/fs/xfs/libxfs/xfs_btree.c > > > @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height( > > > const unsigned int *limits, > > > unsigned long long leaf_blocks) > > > { > > > - unsigned long long node_blocks = limits[1]; > > > + /* > > > + * The root btree block can have a fanout between 2 and maxrecs because > > > + * the tree might not be big enough to fill it. > > > + */ > > > > Can you change this comment to say something like: > > > > /* > > * The root btree block can have less than minrecs pointers > > * in it because the tree might not be big enough to require > > * that amount of fanout. Hence it has a minimum size of > > * 2 pointers, not limits[1]. > > */ > > Done. Thanks for the reviews! :) > > --D > > > > > Otherwise it looks good. > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > > + unsigned long long node_blocks = 2; > > > unsigned long long blocks_left = leaf_blocks - 1; > > > unsigned int height = 1; > > > > For future consideration, we don't use maxrecs in this calculation > > at all - should we just pass minrecs into the function rather than > > an array of limits? I prefer to replace all those mxr/mnr array constructs with an explicit structure: struct xfs_btblock_geometry { uint16_t leaf_maxrecs; uint16_t leaf_minrecs; uint16_t node_maxrecs; uint16_t node_minrecs; }; and get rid of all the mp->m_rmap_mxr[leaf != 0] stuff that slows me down every time I have to read it. --D > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com
On Tue, Dec 20, 2022 at 08:20:03AM -0800, Darrick J. Wong wrote: > On Tue, Dec 20, 2022 at 08:10:08AM -0800, Darrick J. Wong wrote: > > On Tue, Dec 20, 2022 at 04:00:01PM +1100, Dave Chinner wrote: > > > For future consideration, we don't use maxrecs in this calculation > > > at all - should we just pass minrecs into the function rather than > > > an array of limits? > > I prefer to replace all those mxr/mnr array constructs with an explicit > structure: > > struct xfs_btblock_geometry { > uint16_t leaf_maxrecs; > uint16_t leaf_minrecs; > > uint16_t node_maxrecs; > uint16_t node_minrecs; > }; > > and get rid of all the mp->m_rmap_mxr[leaf != 0] stuff that slows me > down every time I have to read it. Yup, that sounds like a great idea - I have to work it out from first principles every time, too. -Dave.
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 4c16c8c31fcb..8d11d3f5e529 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height( const unsigned int *limits, unsigned long long leaf_blocks) { - unsigned long long node_blocks = limits[1]; + /* + * The root btree block can have a fanout between 2 and maxrecs because + * the tree might not be big enough to fill it. + */ + unsigned long long node_blocks = 2; unsigned long long blocks_left = leaf_blocks - 1; unsigned int height = 1;