Message ID: 05e648ae04ec5b754207032823e9c1de9a54f87a.1685449706.git.ojaswin@linux.ibm.com (mailing list archive)
State:      Handled Elsewhere, archived
Series:     multiblock allocator improvements
Hello,

On 5/30/23 20:33, Ojaswin Mujoo wrote:
> Before this patch, the call stack in ext4_run_li_request is as follows:
>
>   /*
>    * nr = no. of BGs we want to fetch (=s_mb_prefetch)
>    * prefetch_ios = no. of BGs not uptodate after
>    * ext4_read_block_bitmap_nowait()
>    */
>   next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
>   ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
>
> ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
> range [next_group - prefetch_ios, next_group). This is incorrect since
> sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to
> incorrectly ignore some of the BGs that might need initialization. This
> issue is more notable now with the previous patch enabling "fetching" of
> BLOCK_UNINIT BGs which are marked buffer_uptodate by default.
>
> Fix this by passing nr to ext4_mb_prefetch_fini() instead of
> prefetch_ios so that it considers the right range of groups.

Thanks for the series.

> Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
> ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
> groups that would need buddy initialization.

Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
if nr is 0.

https://elixir.bootlin.com/linux/v6.4-rc5/source/fs/ext4/mballoc.c#L2816

Am I missing something?

Thanks,
Guoqing

> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/mballoc.c |  4 ----
>  fs/ext4/super.c   | 11 ++++-------
>  2 files changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 79455c7e645b..6775d73dfc68 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2735,8 +2735,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  		if ((prefetch_grp == group) &&
>  		    (cr > CR1 ||
>  		     prefetch_ios < sbi->s_mb_prefetch_limit)) {
> -			unsigned int curr_ios = prefetch_ios;
> -
>  			nr = sbi->s_mb_prefetch;
>  			if (ext4_has_feature_flex_bg(sb)) {
>  				nr = 1 << sbi->s_log_groups_per_flex;
> @@ -2745,8 +2743,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  			}
>  			prefetch_grp = ext4_mb_prefetch(sb, group,
>  						nr, &prefetch_ios);
> -			if (prefetch_ios == curr_ios)
> -				nr = 0;
>  		}
>
>  		/* This now checks without needing the buddy page */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2da5476fa48b..27c1dabacd43 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3692,16 +3692,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
>  	ext4_group_t group = elr->lr_next_group;
>  	unsigned int prefetch_ios = 0;
>  	int ret = 0;
> +	int nr = EXT4_SB(sb)->s_mb_prefetch;
>  	u64 start_time;
>
>  	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
> -		elr->lr_next_group = ext4_mb_prefetch(sb, group,
> -				EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios);
> -		if (prefetch_ios)
> -			ext4_mb_prefetch_fini(sb, elr->lr_next_group,
> -					prefetch_ios);
> -		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group,
> -				prefetch_ios);
> +		elr->lr_next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios);
> +		ext4_mb_prefetch_fini(sb, elr->lr_next_group, nr);
> +		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, nr);
>  	if (group >= elr->lr_next_group) {
>  		ret = 1;
>  		if (elr->lr_first_not_zeroed != ngroups &&
On Tue, Jun 06, 2023 at 10:00:57PM +0800, Guoqing Jiang wrote:
> Hello,
>
> On 5/30/23 20:33, Ojaswin Mujoo wrote:
> > Before this patch, the call stack in ext4_run_li_request is as follows:
> >
> >   /*
> >    * nr = no. of BGs we want to fetch (=s_mb_prefetch)
> >    * prefetch_ios = no. of BGs not uptodate after
> >    * ext4_read_block_bitmap_nowait()
> >    */
> >   next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
> >   ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
> >
> > ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
> > range [next_group - prefetch_ios, next_group). This is incorrect since
> > sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to
> > incorrectly ignore some of the BGs that might need initialization. This
> > issue is more notable now with the previous patch enabling "fetching" of
> > BLOCK_UNINIT BGs which are marked buffer_uptodate by default.
> >
> > Fix this by passing nr to ext4_mb_prefetch_fini() instead of
> > prefetch_ios so that it considers the right range of groups.
>
> Thanks for the series.
>
> > Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
> > ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
> > groups that would need buddy initialization.
>
> Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
> if nr is 0.

Hi Guoqing,

Sorry I was on vacation so didn't get a chance to reply to this sooner.
Let me explain what I meant by that statement in the commit message.

So basically, the prefetch_ios output argument is incremented whenever
ext4_mb_prefetch() reads a block group with !buffer_uptodate(bh).
However, for BLOCK_UNINIT BGs the buffer is marked uptodate after
initialization and hence prefetch_ios is not incremented when such BGs
are prefetched.

This leads to nr becoming 0 due to the following line (removed in this
patch):

  if (prefetch_ios == curr_ios)
  	nr = 0;

hence ext4_mb_prefetch_fini() would never pre-initialize the corresponding
buddy structures. Instead, these structures would then get initialized
probably at a later point during the slower allocation criteria. The
motivation for making sure the BLOCK_UNINIT BGs' buddies are
pre-initialized is so the faster allocation criteria can utilize the data
to make better decisions.

Regards,
ojaswin

> https://elixir.bootlin.com/linux/v6.4-rc5/source/fs/ext4/mballoc.c#L2816
>
> Am I missing something?
>
> Thanks,
> Guoqing
Hi Ojaswin,

On 6/27/23 14:51, Ojaswin Mujoo wrote:
> On Tue, Jun 06, 2023 at 10:00:57PM +0800, Guoqing Jiang wrote:
>> Hello,
>>
>> On 5/30/23 20:33, Ojaswin Mujoo wrote:
>>> Before this patch, the call stack in ext4_run_li_request is as follows:
>>>
>>>   /*
>>>    * nr = no. of BGs we want to fetch (=s_mb_prefetch)
>>>    * prefetch_ios = no. of BGs not uptodate after
>>>    * ext4_read_block_bitmap_nowait()
>>>    */
>>>   next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
>>>   ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
>>>
>>> ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
>>> range [next_group - prefetch_ios, next_group). This is incorrect since
>>> sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to
>>> incorrectly ignore some of the BGs that might need initialization. This
>>> issue is more notable now with the previous patch enabling "fetching" of
>>> BLOCK_UNINIT BGs which are marked buffer_uptodate by default.
>>>
>>> Fix this by passing nr to ext4_mb_prefetch_fini() instead of
>>> prefetch_ios so that it considers the right range of groups.
>> Thanks for the series.
>>
>>> Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
>>> ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
>>> groups that would need buddy initialization.
>> Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
>> if nr is 0.
> Hi Guoqing,
>
> Sorry I was on vacation so didn't get a chance to reply to this sooner.
> Let me explain what I meant by that statement in the commit message.
>
> So basically, the prefetch_ios output argument is incremented whenever
> ext4_mb_prefetch() reads a block group with !buffer_uptodate(bh).
> However, for BLOCK_UNINIT BGs the buffer is marked uptodate after
> initialization and hence prefetch_ios is not incremented when such BGs
> are prefetched.
>
> This leads to nr becoming 0 due to the following line (removed in this
> patch):
>
>   if (prefetch_ios == curr_ios)
>   	nr = 0;
>
> hence ext4_mb_prefetch_fini() would never pre-initialize the corresponding
> buddy structures. Instead, these structures would then get initialized
> probably at a later point during the slower allocation criteria. The
> motivation for making sure the BLOCK_UNINIT BGs' buddies are
> pre-initialized is so the faster allocation criteria can utilize the data
> to make better decisions.

Got it, appreciate the detailed explanation!

Thanks,
Guoqing
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 79455c7e645b..6775d73dfc68 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2735,8 +2735,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 		if ((prefetch_grp == group) &&
 		    (cr > CR1 ||
 		     prefetch_ios < sbi->s_mb_prefetch_limit)) {
-			unsigned int curr_ios = prefetch_ios;
-
 			nr = sbi->s_mb_prefetch;
 			if (ext4_has_feature_flex_bg(sb)) {
 				nr = 1 << sbi->s_log_groups_per_flex;
@@ -2745,8 +2743,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			}
 			prefetch_grp = ext4_mb_prefetch(sb, group,
 						nr, &prefetch_ios);
-			if (prefetch_ios == curr_ios)
-				nr = 0;
 		}

 		/* This now checks without needing the buddy page */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2da5476fa48b..27c1dabacd43 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3692,16 +3692,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	ext4_group_t group = elr->lr_next_group;
 	unsigned int prefetch_ios = 0;
 	int ret = 0;
+	int nr = EXT4_SB(sb)->s_mb_prefetch;
 	u64 start_time;

 	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
-		elr->lr_next_group = ext4_mb_prefetch(sb, group,
-				EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios);
-		if (prefetch_ios)
-			ext4_mb_prefetch_fini(sb, elr->lr_next_group,
-					prefetch_ios);
-		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group,
-				prefetch_ios);
+		elr->lr_next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios);
+		ext4_mb_prefetch_fini(sb, elr->lr_next_group, nr);
+		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, nr);
 	if (group >= elr->lr_next_group) {
 		ret = 1;
 		if (elr->lr_first_not_zeroed != ngroups &&