diff mbox series

[v2,01/12] Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"

Message ID ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series multiblock allocator improvements | expand

Commit Message

Ojaswin Mujoo May 30, 2023, 12:33 p.m. UTC
This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.

The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.

Example, the default value of the following variables is:

sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10

In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.

Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit 32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.

Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Sedat Dilek May 30, 2023, 4:28 p.m. UTC | #1
On Tue, May 30, 2023 at 3:25 PM Ojaswin Mujoo <ojaswin@linux.ibm.com> wrote:
>
> This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
>
> The reverted commit was intended to remove a dead check however it was observed
> that this check was actually being used to exit early instead of looping
> sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
> the goal extent. Due to this, a my performance tests (fsmark, parallel file
> writes in a highly fragmented FS) were seeing a 2x-3x regression.
>
> Example, the default value of the following variables is:
>
> sbi->s_mb_max_to_scan = 200
> sbi->s_mb_min_to_scan = 10
>
> In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
> early and try again. This loop will go on until we have processed
> sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
> just use whatever we have even if it is smaller than goal extent.
>
> Now, the regression comes when we find an extent bigger than goal. Earlier, in
> this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
> the bigger extent. However with commit 32c08693 that check was removed and hence
> we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
> free extent to satisfy the request. The only time we would exit early would be
> when the free extent is *exactly* the size of our goal, which is pretty uncommon
> occurrence and so we would almost always end up looping 200 times.
>
> Hence, revert the commit by adding the check back to fix the regression. Also
> add a comment to outline this policy.
>

Hi,

I applied this single patch of your series v2 on top of Linux v6.4-rc4.

So, if this is a regression I ask myself if this is material for Linux 6.4?

Can you comment on this, please?

Thanks.

Regards,
-Sedat-


> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d4b6a2c1881d..7ac6d3524f29 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2063,7 +2063,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
>         if (bex->fe_len < gex->fe_len)
>                 return;
>
> -       if (finish_group)
> +       if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
>                 ext4_mb_use_best_found(ac, e4b);
>  }
>
> @@ -2075,6 +2075,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
>   * in the context. Later, the best found extent will be used, if
>   * mballoc can't find good enough extent.
>   *
> + * The algorithm used is roughly as follows:
> + *
> + * * If free extent found is exactly as big as goal, then
> + *   stop the scan and use it immediately
> + *
> + * * If free extent found is smaller than goal, then keep retrying
> + *   upto a max of sbi->s_mb_max_to_scan times (default 200). After
> + *   that stop scanning and use whatever we have.
> + *
> + * * If free extent found is bigger than goal, then keep retrying
> + *   upto a max of sbi->s_mb_min_to_scan times (default 10) before
> + *   stopping the scan and using the extent.
> + *
> + *
>   * FIXME: real allocation policy is to be designed yet!
>   */
>  static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
> --
> 2.31.1
>
Ojaswin Mujoo May 31, 2023, 8:57 a.m. UTC | #2
On Tue, May 30, 2023 at 06:28:22PM +0200, Sedat Dilek wrote:
> On Tue, May 30, 2023 at 3:25 PM Ojaswin Mujoo <ojaswin@linux.ibm.com> wrote:
> >
> > This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
> >
> > The reverted commit was intended to remove a dead check however it was observed
> > that this check was actually being used to exit early instead of looping
> > sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
> > the goal extent. Due to this, a my performance tests (fsmark, parallel file
> > writes in a highly fragmented FS) were seeing a 2x-3x regression.
> >
> > Example, the default value of the following variables is:
> >
> > sbi->s_mb_max_to_scan = 200
> > sbi->s_mb_min_to_scan = 10
> >
> > In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
> > early and try again. This loop will go on until we have processed
> > sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
> > just use whatever we have even if it is smaller than goal extent.
> >
> > Now, the regression comes when we find an extent bigger than goal. Earlier, in
> > this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
> > the bigger extent. However with commit 32c08693 that check was removed and hence
> > we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
> > free extent to satisfy the request. The only time we would exit early would be
> > when the free extent is *exactly* the size of our goal, which is pretty uncommon
> > occurrence and so we would almost always end up looping 200 times.
> >
> > Hence, revert the commit by adding the check back to fix the regression. Also
> > add a comment to outline this policy.
> >
> 
> Hi,
> 
> I applied this single patch of your series v2 on top of Linux v6.4-rc4.
> 
> So, if this is a regression I ask myself if this is material for Linux 6.4?
> 
> Can you comment on this, please?
> 
> Thanks.
> 
> Regards,
> -Sedat-

Hi Sedat,

Since this patch fixes a regression I think it should ideally go in
Linux 6.4

Regards,
ojaswin
> 
> 
> > Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> > Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
> > ---
> >  fs/ext4/mballoc.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index d4b6a2c1881d..7ac6d3524f29 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -2063,7 +2063,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
> >         if (bex->fe_len < gex->fe_len)
> >                 return;
> >
> > -       if (finish_group)
> > +       if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
> >                 ext4_mb_use_best_found(ac, e4b);
> >  }
> >
> > @@ -2075,6 +2075,20 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
> >   * in the context. Later, the best found extent will be used, if
> >   * mballoc can't find good enough extent.
> >   *
> > + * The algorithm used is roughly as follows:
> > + *
> > + * * If free extent found is exactly as big as goal, then
> > + *   stop the scan and use it immediately
> > + *
> > + * * If free extent found is smaller than goal, then keep retrying
> > + *   upto a max of sbi->s_mb_max_to_scan times (default 200). After
> > + *   that stop scanning and use whatever we have.
> > + *
> > + * * If free extent found is bigger than goal, then keep retrying
> > + *   upto a max of sbi->s_mb_min_to_scan times (default 10) before
> > + *   stopping the scan and using the extent.
> > + *
> > + *
> >   * FIXME: real allocation policy is to be designed yet!
> >   */
> >  static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
> > --
> > 2.31.1
> >
Thorsten Leemhuis June 2, 2023, 1:45 p.m. UTC | #3
On 31.05.23 10:57, Ojaswin Mujoo wrote:
> On Tue, May 30, 2023 at 06:28:22PM +0200, Sedat Dilek wrote:
>> On Tue, May 30, 2023 at 3:25 PM Ojaswin Mujoo <ojaswin@linux.ibm.com> wrote:
>>>
>>> This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
>>>
>>> The reverted commit was intended to remove a dead check however it was observed
>>> that this check was actually being used to exit early instead of looping
>>> sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
>>> the goal extent. Due to this, a my performance tests (fsmark, parallel file
>>> writes in a highly fragmented FS) were seeing a 2x-3x regression.
>>>
>>> Example, the default value of the following variables is:
>>>
>>> sbi->s_mb_max_to_scan = 200
>>> sbi->s_mb_min_to_scan = 10
>>>
>>> In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
>>> early and try again. This loop will go on until we have processed
>>> sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
>>> just use whatever we have even if it is smaller than goal extent.
>>>
>>> Now, the regression comes when we find an extent bigger than goal. Earlier, in
>>> this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
>>> the bigger extent. However with commit 32c08693 that check was removed and hence
>>> we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
>>> free extent to satisfy the request. The only time we would exit early would be
>>> when the free extent is *exactly* the size of our goal, which is pretty uncommon
>>> occurrence and so we would almost always end up looping 200 times.
>>>
>>> Hence, revert the commit by adding the check back to fix the regression. Also
>>> add a comment to outline this policy.
>>
>> I applied this single patch of your series v2 on top of Linux v6.4-rc4.
>>
>> So, if this is a regression I ask myself if this is material for Linux 6.4?
>>
>> Can you comment on this, please?
> 
> Since this patch fixes a regression I think it should ideally go in
> Linux 6.4

Ted can speak up for himself, but maybe this might speed things up:

A lot of maintainers in a case like this want fixes (like this)
submitted separately from other changes (like the rest of this series).

/me hopes this will help and not confuse anything

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Theodore Ts'o June 2, 2023, 4:45 p.m. UTC | #4
On Fri, Jun 02, 2023 at 03:45:52PM +0200, Thorsten Leemhuis wrote:
> > 
> > Since this patch fixes a regression I think it should ideally go in
> > Linux 6.4
> 
> Ted can speak up for himself, but maybe this might speed things up:
> 
> A lot of maintainers in a case like this want fixes (like this)
> submitted separately from other changes (like the rest of this series).

While it's nice to do that in the future (since I would have noticed
this earlier, it could have gone into my regression fixes push to
Linus last week), in this particular case I've already noted this
particular issue, and per the discussion in the last weekly ext4 video
conference chat, I will be reordering the pashes so I can send a
secondary regression fix to Linus very shortly.

Thanks,

						- Ted
diff mbox series

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d4b6a2c1881d..7ac6d3524f29 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2063,7 +2063,7 @@  static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
 	if (bex->fe_len < gex->fe_len)
 		return;
 
-	if (finish_group)
+	if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
 		ext4_mb_use_best_found(ac, e4b);
 }
 
@@ -2075,6 +2075,20 @@  static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
  * in the context. Later, the best found extent will be used, if
  * mballoc can't find good enough extent.
  *
+ * The algorithm used is roughly as follows:
+ *
+ * * If free extent found is exactly as big as goal, then
+ *   stop the scan and use it immediately
+ *
+ * * If free extent found is smaller than goal, then keep retrying
+ *   upto a max of sbi->s_mb_max_to_scan times (default 200). After
+ *   that stop scanning and use whatever we have.
+ *
+ * * If free extent found is bigger than goal, then keep retrying
+ *   upto a max of sbi->s_mb_min_to_scan times (default 10) before
+ *   stopping the scan and using the extent.
+ *
+ *
  * FIXME: real allocation policy is to be designed yet!
  */
 static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,