Message ID | 232abb666a6901f909aeb21dc6f5998f0250e073.1684967923.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Parameter cleanups in extent state helpers | expand |
On Thu, May 25, 2023 at 01:04:34AM +0200, David Sterba wrote: > The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 > (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata > reservation")), without any explanation why it would be needed. > > Meanwhile we've updated the semantics of set_extent_bit to handle failed > allocations and do unlock, sleep and retry if needed. The use of the > NOFAIL flag is also an outlier, we never want any of the set/clear > extent bit helpers to fail, they're used for many critical changes like > extent locking, besides the extent state bit changes. Given how many of the callers do not check the return value, and that the trees store essential information, I think the right thing here is to always use __GFP_NOFAIL unless GFP_NOWAIT is passed. In practice this won't make a difference as currently small slab allocations never fail, but that's an undocumented assumption.
On 2023/5/25 07:04, David Sterba wrote: > The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 > (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata > reservation")), without any explanation why it would be needed. > > Meanwhile we've updated the semantics of set_extent_bit to handle failed > allocations and do unlock, sleep and retry if needed. The use of the > NOFAIL flag is also an outlier, we never want any of the set/clear > extent bit helpers to fail, they're used for many critical changes like > extent locking, besides the extent state bit changes. As I mentioned in the cover letter, if we really want to set/clear bits to not fail, and can accept the extra memory usage (as high as twice the number of extent states), then we should really consider the following changes: - Introduce hole extent_state For ranges without any bit set, there still needs to be an extent_state. - Make callers to pre-allocate 2 extent_state and pass them as mandatory parameters to set/clear bits - Make set/clear bits to use the preallocated 2 extent states By this, we should be able to completely get rid of the memory allocation inside the extent io tree set/clear calls. Thanks, Qu > > Signed-off-by: David Sterba <dsterba@suse.com> > --- > fs/btrfs/block-group.c | 3 +-- > fs/btrfs/extent-tree.c | 3 +-- > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index ec463f8d83ec..202e2aa949c5 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -3523,8 +3523,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > set_extent_bit(&trans->transaction->pinned_extents, > bytenr, bytenr + num_bytes - 1, > - EXTENT_DIRTY, NULL, > - GFP_NOFS | __GFP_NOFAIL); > + EXTENT_DIRTY, NULL, GFP_NOFS); > } > > spin_lock(&trans->transaction->dirty_bgs_lock); > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 03b2a7c508b9..6e319100e3a3 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2508,8 +2508,7 @@ static int pin_down_extent(struct btrfs_trans_handle *trans, > spin_unlock(&cache->space_info->lock); > > set_extent_bit(&trans->transaction->pinned_extents, bytenr, > - bytenr + num_bytes - 1, EXTENT_DIRTY, NULL, > - GFP_NOFS | __GFP_NOFAIL); > + bytenr + num_bytes - 1, EXTENT_DIRTY, NULL, GFP_NOFS); > return 0; > } >
On Thu, May 25, 2023 at 06:38:51PM +0800, Qu Wenruo wrote: > On 2023/5/25 07:04, David Sterba wrote: > > The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 > > (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata > > reservation")), without any explanation why it would be needed. > > > > Meanwhile we've updated the semantics of set_extent_bit to handle failed > > allocations and do unlock, sleep and retry if needed. The use of the > > NOFAIL flag is also an outlier, we never want any of the set/clear > > extent bit helpers to fail, they're used for many critical changes like > > extent locking, besides the extent state bit changes. > > As I mentioned in the cover letter, if we really want to set/clear bits > to not fail, and can accept the extra memory usage (as high as twice the > number of extent states), then we should really consider the following > changes: > > - Introduce hole extent_state > For ranges without any bit set, there still needs to be an > extent_state. > > - Make callers to pre-allocate 2 extent_state and pass them as mandatory > parameters to set/clear bits > > - Make set/clear bits to use the preallocated 2 extent states > > By this, we should be able to completely get rid of the memory > allocation inside the extent io tree set/clear calls. I did not understand what you mean from the cover letter comment but now I see it. The size of extent_state is 72 bytes on release build, which is not that much, doubling that size is still in acceptable range even if we'd never utilize the preallocated memory at some point. That we dont't see any failures now is 1) GFP_NOFS does not fail 2) we're using a slab cache and the objects get reused within the same allocated space due to frequent state changes. Even if 1) wasn't true we'd still have to hit a hard memory allocation error, i.e. no available pages for a slab extension. If the preallocation prevents any failure due to memory allocations then we can keep the current way of dealing with extent bit change error handling, i.e. none because we rely on the allocator.
On Thu, May 25, 2023 at 02:06:05AM -0700, Christoph Hellwig wrote: > On Thu, May 25, 2023 at 01:04:34AM +0200, David Sterba wrote: > > The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 > > (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata > > reservation")), without any explanation why it would be needed. > > > > Meanwhile we've updated the semantics of set_extent_bit to handle failed > > allocations and do unlock, sleep and retry if needed. The use of the > > NOFAIL flag is also an outlier, we never want any of the set/clear > > extent bit helpers to fail, they're used for many critical changes like > > extent locking, besides the extent state bit changes. > > Given how many of the callers do not check the return value, and that > the trees store essential information, I think the right thing here > is to always use __GFP_NOFAIL unless GFP_NOWAIT is passed. In practice > this won't make a difference as currently small slab allocations never > fail, but that's an undocumented assumption. Yeah, that "GFP_NOFS does not fail for < costly allocations" has been a deal with the allocator but there was some pressure to stop doing that eventually. Discussed at LSF/MM, __GFP_NOFAIL should be replaced by proper error handling if possible and minimized. I'm not sure if adding it for something that frequently done as the state bit changes is a good idea wrt memory management but I agree there's practically no change how things currently work. Qu suggested another way how we could preallocate some memory that would be used for extent holes and then reused for the cases where we need to allocate to do insert/split. I'd like to try this approach first and also check with MM guys that NOFAIL in this case is acceptable so we have more options to choose from.
On 2023/5/26 07:00, David Sterba wrote: > On Thu, May 25, 2023 at 06:38:51PM +0800, Qu Wenruo wrote: >> On 2023/5/25 07:04, David Sterba wrote: >>> The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 >>> (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata >>> reservation")), without any explanation why it would be needed. >>> >>> Meanwhile we've updated the semantics of set_extent_bit to handle failed >>> allocations and do unlock, sleep and retry if needed. The use of the >>> NOFAIL flag is also an outlier, we never want any of the set/clear >>> extent bit helpers to fail, they're used for many critical changes like >>> extent locking, besides the extent state bit changes. >> >> As I mentioned in the cover letter, if we really want to set/clear bits >> to not fail, and can accept the extra memory usage (as high as twice the >> number of extent states), then we should really consider the following >> changes: >> >> - Introduce hole extent_state >> For ranges without any bit set, there still needs to be an >> extent_state. >> >> - Make callers to pre-allocate 2 extent_state and pass them as mandatory >> parameters to set/clear bits >> >> - Make set/clear bits to use the preallocated 2 extent states >> >> By this, we should be able to completely get rid of the memory >> allocation inside the extent io tree set/clear calls. > > I did not understand what you mean from the cover letter comment but now > I see it. The size of extent_state is 72 bytes on release build, which > is not that much, doubling that size is still in acceptable range even > if we'd never utilize the preallocated memory at some point. That we > dont't see any failures now is 1) GFP_NOFS does not fail 2) we're using > a slab cache and the objects get reused within the same allocated space > due to frequent state changes. Even if 1) wasn't true we'd still have to > hit a hard memory allocation error, i.e. no available pages for a slab > extension. Well, I still remember an internal bugzilla that openSUSE on RPI4 triggered failure at the extent_state allocation. > > If the preallocation prevents any failure due to memory allocations then > we can keep the current way of dealing with extent bit change error > handling, i.e. none because we rely on the allocator. The preallocation is not going to help much after the first allocation is being used. After that, all later states are allocated using GFP_ATOMIC, which has a much smaller pool to fulfill from. And considering how many call sites don't check the return value at all, we'd either go forward changing all call sites, or really make the set/clear bits bulletproof (and only fail out of the set/clear call). Thanks, Qu
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index ec463f8d83ec..202e2aa949c5 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -3523,8 +3523,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, set_extent_bit(&trans->transaction->pinned_extents, bytenr, bytenr + num_bytes - 1, - EXTENT_DIRTY, NULL, - GFP_NOFS | __GFP_NOFAIL); + EXTENT_DIRTY, NULL, GFP_NOFS); } spin_lock(&trans->transaction->dirty_bgs_lock); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 03b2a7c508b9..6e319100e3a3 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2508,8 +2508,7 @@ static int pin_down_extent(struct btrfs_trans_handle *trans, spin_unlock(&cache->space_info->lock); set_extent_bit(&trans->transaction->pinned_extents, bytenr, - bytenr + num_bytes - 1, EXTENT_DIRTY, NULL, - GFP_NOFS | __GFP_NOFAIL); + bytenr + num_bytes - 1, EXTENT_DIRTY, NULL, GFP_NOFS); return 0; }
The __GFP_NOFAIL passed to set_extent_bit first appeared in 2010 (commit f0486c68e4bd9a ("Btrfs: Introduce contexts for metadata reservation")), without any explanation why it would be needed. Meanwhile we've updated the semantics of set_extent_bit to handle failed allocations and do unlock, sleep and retry if needed. The use of the NOFAIL flag is also an outlier, we never want any of the set/clear extent bit helpers to fail, they're used for many critical changes like extent locking, besides the extent state bit changes. Signed-off-by: David Sterba <dsterba@suse.com> --- fs/btrfs/block-group.c | 3 +-- fs/btrfs/extent-tree.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-)