diff mbox series

blk-throttle: Calculate allowed value only when the throttle is enabled

Message ID 20230928015858.1809934-1-linan666@huaweicloud.com (mailing list archive)
State New, archived
Headers show
Series blk-throttle: Calculate allowed value only when the throttle is enabled | expand

Commit Message

Li Nan Sept. 28, 2023, 1:58 a.m. UTC
From: Li Nan <linan122@huawei.com>

When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX,
which is be used in calculate_bytes_allowed(), and divide 0 error will
happen.

To fix it, only calculate allowed value when the throttle of bps/iops is
enabled and the value will be used.

Fixes: e8368b57c006 ("blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()")
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+Vt6idZtxfU9jF=VSbu145Wi-d-WnAZx_hEfOL8yLZgBA@mail.gmail.com
Signed-off-by: Li Nan <linan122@huawei.com>
---
 block/blk-throttle.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

Comments

Tejun Heo Oct. 4, 2023, 7:31 p.m. UTC | #1
Hello,

On Thu, Sep 28, 2023 at 09:58:58AM +0800, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX,
> which is be used in calculate_bytes_allowed(), and divide 0 error will
> happen.

calculate_bytes_allowed() is just

  return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); 

The only division is by HZ. How does divide by 0 happen?

Thanks.
Khazhy Kumykov Oct. 4, 2023, 8:30 p.m. UTC | #2
On Wed, Oct 4, 2023 at 12:32 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, Sep 28, 2023 at 09:58:58AM +0800, linan666@huaweicloud.com wrote:
> > From: Li Nan <linan122@huawei.com>
> >
> > When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX,
> > which is be used in calculate_bytes_allowed(), and divide 0 error will
> > happen.
>
> calculate_bytes_allowed() is just
>
>   return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
>
> The only division is by HZ. How does divide by 0 happen?

We've also noticed this - haven't looked too deeply but I don't think
it's a divide by zero, but an overflow (bps_limit * jiffy_elapsed / HZ
will overflow for jiffies > HZ). mul_u64_u64_div_u64 does say it will
throw DE if the mul overflows

>
> Thanks.
>
> --
> tejun
Khazhy Kumykov Oct. 4, 2023, 8:44 p.m. UTC | #3
On Wed, Sep 27, 2023 at 7:05 PM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX,
> which is be used in calculate_bytes_allowed(), and divide 0 error will
> happen.
>
> To fix it, only calculate allowed value when the throttle of bps/iops is
> enabled and the value will be used.
>
> Fixes: e8368b57c006 ("blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()")
> Reported-by: Changhui Zhong <czhong@redhat.com>
> Closes: https://lore.kernel.org/all/CAGVVp+Vt6idZtxfU9jF=VSbu145Wi-d-WnAZx_hEfOL8yLZgBA@mail.gmail.com
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  block/blk-throttle.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 38a881cf97d0..3c9a74ab9f0e 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -730,8 +730,10 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
>  static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
>  {
>         unsigned long time_elapsed;
> -       long long bytes_trim;
> -       int io_trim;
> +       long long bytes_trim = 0;
> +       int io_trim = 0;
> +       u64 bps_limit;
> +       u32 iops_limit;
>
>         BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw]));
>
> @@ -758,11 +760,14 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
>         if (!time_elapsed)
>                 return;
>
> -       bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw),
> -                                            time_elapsed) +
> -                    tg->carryover_bytes[rw];
> -       io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed) +
> -                 tg->carryover_ios[rw];
> +       bps_limit = tg_bps_limit(tg, rw);
> +       iops_limit = tg_iops_limit(tg, rw);
> +       if (tg->bytes_disp[rw] > 0 && bps_limit != U64_MAX)
I don't think this change is sufficient to prevent kernel crash, as a
"clever" user could still set the bps_limit to U64_MAX - 1 (or another
large value), which probably would still result in the same crash. The
comment in mul_u64_u64_div_u64 suggests there's something we can do to
better handle the overflow case, but I'm not sure what it's referring
to. ("Will generate an #DE when the result doesn't fit u64, could fix
with an __ex_table[] entry when it becomes an issue.") Otherwise, we
probably need to remove the mul_u64_u64_div_u64 and check for
overflow/potential overflow ourselves?

Khazhy
> +               bytes_trim = calculate_bytes_allowed(bps_limit,
> +                            time_elapsed) + tg->carryover_bytes[rw];
> +       if (tg->io_disp[rw] > 0 && iops_limit != UINT_MAX)
> +               io_trim = calculate_io_allowed(iops_limit, time_elapsed) +
> +                         tg->carryover_ios[rw];
>         if (bytes_trim <= 0 && io_trim <= 0)
>                 return;
>
> --
> 2.39.2
>
Oleg Nesterov Oct. 5, 2023, 4:24 p.m. UTC | #4
Hi Li,

On 10/05, Li Nan wrote:
>
> >I don't think this change is sufficient to prevent kernel crash, as a
> >"clever" user could still set the bps_limit to U64_MAX - 1 (or another
> >large value), which probably would still result in the same crash. The
> >comment in mul_u64_u64_div_u64 suggests there's something we can do to
> >better handle the overflow case, but I'm not sure what it's referring
> >to. ("Will generate an #DE when the result doesn't fit u64, could fix
> >with an __ex_table[] entry when it becomes an issue.") Otherwise, we
>
> When (a * mul) overflows, a divide 0 error occurs in
> mul_u64_u64_div_u64(). Commit 3dc167ba5729 ("sched/cputime: Improve
> cputime_adjust()") changed func and said: "Will generate an #DE when the
> result doesn't fit u64, could fix with an __ex_table[] entry when it
> becomes an issue." But we are unsure of how to fix it. Could you please
> explain how to fix this issue.

Not sure I understand the question...

OK, we can change mul_u64_u64_div_u64() to trap the exception, say,

	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
	{
		u64 q;

		asm ("mulq %2; 1: divq %3; 2:\n"
		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_DEFAULT|EX_FLAG_CLEAR_AX)
					: "=a" (q)
					: "a" (a), "rm" (mul), "rm" (div)
					: "rdx");

		return q;
	}

should (iiuc) return 0 if the result doesn't fit u64 or div == 0.

But even if we forget that this is x86-specific, how can this help?
What should calculate_bytes_allowed() do/return in this case?

> >probably need to remove the mul_u64_u64_div_u64 and check for
> >overflow/potential overflow ourselves?

probably yes...

Oleg.
Oleg Nesterov Oct. 5, 2023, 5:04 p.m. UTC | #5
sorry, didn't notice this part before.

I am not a asm expert (to say at least;) but

On 10/05, Li Nan wrote:
>
> When (a * mul) overflows, a divide 0 error occurs in
> mul_u64_u64_div_u64().

Just in case... No, iirc it is divq which triggers #DE when the
result of division doesn't fit u64.

(a * mul) can't overflow, the result is 128-bit rax:rdx number.

Oleg.
Khazhy Kumykov Oct. 5, 2023, 5:15 p.m. UTC | #6
On Thu, Oct 5, 2023 at 10:05 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> sorry, didn't notice this part before.
>
> I am not a asm expert (to say at least;) but
>
> On 10/05, Li Nan wrote:
> >
> > When (a * mul) overflows, a divide 0 error occurs in
> > mul_u64_u64_div_u64().
>
> Just in case... No, iirc it is divq which triggers #DE when the
> result of division doesn't fit u64.
Yeah, sorry for my incorrect wording here - but we're probably seeing
exactly that the final result doesn't fit in u64. (I wasn't familiar
with the intermediary registers here, thanks for explaining)
>
> (a * mul) can't overflow, the result is 128-bit rax:rdx number.
>
> Oleg.
>
Yu Kuai Oct. 7, 2023, 1:23 a.m. UTC | #7
Hi,

在 2023/10/06 0:24, Oleg Nesterov 写道:
> Hi Li,
> 
> On 10/05, Li Nan wrote:
>>
>>> I don't think this change is sufficient to prevent kernel crash, as a
>>> "clever" user could still set the bps_limit to U64_MAX - 1 (or another
>>> large value), which probably would still result in the same crash. The
>>> comment in mul_u64_u64_div_u64 suggests there's something we can do to
>>> better handle the overflow case, but I'm not sure what it's referring
>>> to. ("Will generate an #DE when the result doesn't fit u64, could fix
>>> with an __ex_table[] entry when it becomes an issue.") Otherwise, we
>>
>> When (a * mul) overflows, a divide 0 error occurs in
>> mul_u64_u64_div_u64(). Commit 3dc167ba5729 ("sched/cputime: Improve
>> cputime_adjust()") changed func and said: "Will generate an #DE when the
>> result doesn't fit u64, could fix with an __ex_table[] entry when it
>> becomes an issue." But we are unsure of how to fix it. Could you please
>> explain how to fix this issue.
> 
> Not sure I understand the question...
> 
> OK, we can change mul_u64_u64_div_u64() to trap the exception, say,
> 
> 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> 	{
> 		u64 q;
> 
> 		asm ("mulq %2; 1: divq %3; 2:\n"
> 		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_DEFAULT|EX_FLAG_CLEAR_AX)
> 					: "=a" (q)
> 					: "a" (a), "rm" (mul), "rm" (div)
> 					: "rdx");
> 
> 		return q;
> 	}
> 
> should (iiuc) return 0 if the result doesn't fit u64 or div == 0.
> 
> But even if we forget that this is x86-specific, how can this help?
> What should calculate_bytes_allowed() do/return in this case?

I believe, U64_MAX should be returned if result doesn't fit u64;
> 
>>> probably need to remove the mul_u64_u64_div_u64 and check for
>>> overflow/potential overflow ourselves?
> 
> probably yes...

How about this?

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1101fb6f6cc8..5482c316a103 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32 
iops_limit,

  static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long 
jiffy_elapsed)
  {
+       if (jiffy_elapsed > HZ &&
+           bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ, 
(u64)jiffy_elapsed);
+               return U64_MAX;
+
         return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
  }

> 
> Oleg.
> 
> .
>
Oleg Nesterov Oct. 7, 2023, 3:16 p.m. UTC | #8
On 10/07, Yu Kuai wrote:
>
> >>>probably need to remove the mul_u64_u64_div_u64 and check for
> >>>overflow/potential overflow ourselves?
> >
> >probably yes...
>
> How about this?
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 1101fb6f6cc8..5482c316a103 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32
> iops_limit,
>
>  static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long
> jiffy_elapsed)
>  {
> +       if (jiffy_elapsed > HZ &&
> +           bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ,
> (u64)jiffy_elapsed);
> +               return U64_MAX;
> +

I can't suggest anything better...

but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then
mul_u64_u64_div_u64() above is not safe too.

Oleg.
Yu Kuai Oct. 8, 2023, 1:26 a.m. UTC | #9
Hi,

在 2023/10/07 23:16, Oleg Nesterov 写道:
> On 10/07, Yu Kuai wrote:
>>
>>>>> probably need to remove the mul_u64_u64_div_u64 and check for
>>>>> overflow/potential overflow ourselves?
>>>
>>> probably yes...
>>
>> How about this?
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 1101fb6f6cc8..5482c316a103 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32
>> iops_limit,
>>
>>   static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long
>> jiffy_elapsed)
>>   {
>> +       if (jiffy_elapsed > HZ &&
>> +           bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ,
>> (u64)jiffy_elapsed);
>> +               return U64_MAX;
>> +
> 
> I can't suggest anything better...
> 
> but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then
> mul_u64_u64_div_u64() above is not safe too.

Well, 'jiffy_elapsed > HZ' is judged before mul_u64_u64_div_u64().

Overflow can only happen if the above 2 conditions passed, and U64_MAX
is returned.

Thanks,
Kuai

> 
> Oleg.
> 
> .
>
Oleg Nesterov Oct. 8, 2023, 11:36 a.m. UTC | #10
On 10/08, Yu Kuai wrote:
>
> >>  static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long
> >>jiffy_elapsed)
> >>  {
> >>+       if (jiffy_elapsed > HZ &&
> >>+           bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ,
> >>(u64)jiffy_elapsed);
> >>+               return U64_MAX;
> >>+
> >
> >I can't suggest anything better...
> >
> >but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then
> >mul_u64_u64_div_u64() above is not safe too.
>
> Well, 'jiffy_elapsed > HZ' is judged before mul_u64_u64_div_u64().

Yes, sorry, somehow I didn't notice this check.

Oleg.
Khazhy Kumykov Oct. 13, 2023, 9:51 p.m. UTC | #11
Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle
overflow of the final result either, as far as I can tell. So while on
x86 we get a DE, on non-x86 we just get the wrong result.

(Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while
calculating wait time"), setting a very-high bps_limit would probably
also cause this crash, no?)

Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()",
where if the result doesn't fit in u64, we indicate (and let the
caller choose what to do? Here we should just return U64_MAX)?

Absent that, maybe we can take inspiration from the generic
mul_u64_u64_div_u64? (Forgive the paste)

 static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
 {
+       /* Final result probably won't fit in u64 */
+       if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62)
+               return U64_MAX;
        return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
 }
Yu Kuai Oct. 16, 2023, 1:46 a.m. UTC | #12
Hi,

在 2023/10/14 5:51, Khazhy Kumykov 写道:
> Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle
> overflow of the final result either, as far as I can tell. So while on
> x86 we get a DE, on non-x86 we just get the wrong result.
> 
> (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while
> calculating wait time"), setting a very-high bps_limit would probably
> also cause this crash, no?)
> 
> Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()",
> where if the result doesn't fit in u64, we indicate (and let the
> caller choose what to do? Here we should just return U64_MAX)?
> 
> Absent that, maybe we can take inspiration from the generic
> mul_u64_u64_div_u64? (Forgive the paste)
> 
>   static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
>   {
> +       /* Final result probably won't fit in u64 */
> +       if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62)

I'm not sure, but this condition looks necessary, but doesn't look
sufficient, for example, jiffy_elapsed cound be greater than HZ, while
ilog2(jiffy_elapsed) is equal to ilog2(HZ).

Thanks,
Kuai

> +               return U64_MAX;
>          return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
>   }
> 
> .
>
Khazhy Kumykov Oct. 16, 2023, 8:06 p.m. UTC | #13
On Sun, Oct 15, 2023 at 6:47 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/10/14 5:51, Khazhy Kumykov 写道:
> > Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle
> > overflow of the final result either, as far as I can tell. So while on
> > x86 we get a DE, on non-x86 we just get the wrong result.
> >
> > (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while
> > calculating wait time"), setting a very-high bps_limit would probably
> > also cause this crash, no?)
> >
> > Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()",
> > where if the result doesn't fit in u64, we indicate (and let the
> > caller choose what to do? Here we should just return U64_MAX)?
> >
> > Absent that, maybe we can take inspiration from the generic
> > mul_u64_u64_div_u64? (Forgive the paste)
> >
> >   static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
> >   {
> > +       /* Final result probably won't fit in u64 */
> > +       if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62)
>
> I'm not sure, but this condition looks necessary, but doesn't look
> sufficient, for example, jiffy_elapsed cound be greater than HZ, while
> ilog2(jiffy_elapsed) is equal to ilog2(HZ).
I believe 62 is correct, although admittedly it's less "intuitive"
than the check in mul_u64_u64_div_u64()....

The result overflows if log2(A * B / C) >= 64, so we want to ensure that:
log2(A) + log2(B) - log2(C) < 64

Given that:
ilog2(A) <= log2(A) < ilog2(A) + 1  // truncation defn
It follows that:
-log2(A) <= -ilog2(A)  // Inverse rule
log2(A) - 1 < ilog2(A)

Starting from:
ilog2(A) + ilog2(B) - ilog2(C) <= X

We can show:
(log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < ilog2(A) + ilog2(B) +
(-ilog2(C)) // strict inequality here since the substitutions for A
and B terms are strictly less
(log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < X
log2(A) + log2(B) - log2(C) < X + 2

So for X = 62, log2(A) + log2(B) - log2(C) < 64 must be true, and we
must be safe from overflow.

So... by converse, if ilog2(A) + ilog2(B) - ilog2(C) > 62, we cannot
guarantee that the result will not overflow - thus we bail out.

// end math

It /is/ less exact than your proposal (sufficient, but not necessary),
but saves an extra 128bit mul.

I mostly just want us to pick /something/, since 6.6-rc and the LTSs
with the patch in question are busted currently. :)



>
> Thanks,
> Kuai
>
> > +               return U64_MAX;
> >          return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
> >   }
> >
> > .
> >
>
Khazhy Kumykov Oct. 16, 2023, 9:48 p.m. UTC | #14
On Mon, Oct 16, 2023 at 1:06 PM Khazhy Kumykov <khazhy@chromium.org> wrote:
>
> I mostly just want us to pick /something/, since 6.6-rc and the LTSs
> with the patch in question are busted currently. :)
>
And just to make it explicit: I believe Kaui's proposal is correct.

Khazhy
Yu Kuai Oct. 17, 2023, 2:38 a.m. UTC | #15
Hi,

在 2023/10/17 4:06, Khazhy Kumykov 写道:
> On Sun, Oct 15, 2023 at 6:47 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/10/14 5:51, Khazhy Kumykov 写道:
>>> Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle
>>> overflow of the final result either, as far as I can tell. So while on
>>> x86 we get a DE, on non-x86 we just get the wrong result.
>>>
>>> (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while
>>> calculating wait time"), setting a very-high bps_limit would probably
>>> also cause this crash, no?)
>>>
>>> Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()",
>>> where if the result doesn't fit in u64, we indicate (and let the
>>> caller choose what to do? Here we should just return U64_MAX)?
>>>
>>> Absent that, maybe we can take inspiration from the generic
>>> mul_u64_u64_div_u64? (Forgive the paste)
>>>
>>>    static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
>>>    {
>>> +       /* Final result probably won't fit in u64 */
>>> +       if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62)
>>
>> I'm not sure, but this condition looks necessary, but doesn't look
>> sufficient, for example, jiffy_elapsed cound be greater than HZ, while
>> ilog2(jiffy_elapsed) is equal to ilog2(HZ).
> I believe 62 is correct, although admittedly it's less "intuitive"
> than the check in mul_u64_u64_div_u64()....
> 
> The result overflows if log2(A * B / C) >= 64, so we want to ensure that:
> log2(A) + log2(B) - log2(C) < 64
> 
> Given that:
> ilog2(A) <= log2(A) < ilog2(A) + 1  // truncation defn
> It follows that:
> -log2(A) <= -ilog2(A)  // Inverse rule
> log2(A) - 1 < ilog2(A)
> 
> Starting from:
> ilog2(A) + ilog2(B) - ilog2(C) <= X
> 
> We can show:
> (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < ilog2(A) + ilog2(B) +
> (-ilog2(C)) // strict inequality here since the substitutions for A
> and B terms are strictly less
> (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < X
> log2(A) + log2(B) - log2(C) < X + 2
> 
> So for X = 62, log2(A) + log2(B) - log2(C) < 64 must be true, and we
> must be safe from overflow.
> 
> So... by converse, if ilog2(A) + ilog2(B) - ilog2(C) > 62, we cannot
> guarantee that the result will not overflow - thus we bail out.
> 
> // end math

Thanks for the explanation, I understand that, so the problem is that
if the above condition(>62) match, the result may not overflow, but
U64_MAX is returned here, this is not correct but I guess this doesn't
matter in real word, it's impossible that user will issue more than
1<<63 bytes IO in an extended slice.

I'm good with this fix with some comments.

Thanks,
Kuai

> 
> It /is/ less exact than your proposal (sufficient, but not necessary),
> but saves an extra 128bit mul.
> 
> I mostly just want us to pick /something/, since 6.6-rc and the LTSs
> with the patch in question are busted currently. :)
> 
> 
> 
>>
>> Thanks,
>> Kuai
>>
>>> +               return U64_MAX;
>>>           return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
>>>    }
>>>
>>> .
>>>
>>
> 
> .
>
diff mbox series

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 38a881cf97d0..3c9a74ab9f0e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -730,8 +730,10 @@  static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
 static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 {
 	unsigned long time_elapsed;
-	long long bytes_trim;
-	int io_trim;
+	long long bytes_trim = 0;
+	int io_trim = 0;
+	u64 bps_limit;
+	u32 iops_limit;
 
 	BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw]));
 
@@ -758,11 +760,14 @@  static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 	if (!time_elapsed)
 		return;
 
-	bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw),
-					     time_elapsed) +
-		     tg->carryover_bytes[rw];
-	io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed) +
-		  tg->carryover_ios[rw];
+	bps_limit = tg_bps_limit(tg, rw);
+	iops_limit = tg_iops_limit(tg, rw);
+	if (tg->bytes_disp[rw] > 0 && bps_limit != U64_MAX)
+		bytes_trim = calculate_bytes_allowed(bps_limit,
+			     time_elapsed) + tg->carryover_bytes[rw];
+	if (tg->io_disp[rw] > 0 && iops_limit != UINT_MAX)
+		io_trim = calculate_io_allowed(iops_limit, time_elapsed) +
+			  tg->carryover_ios[rw];
 	if (bytes_trim <= 0 && io_trim <= 0)
 		return;