Message ID | 20230928015858.1809934-1-linan666@huaweicloud.com (mailing list archive) |
---|---|

State | New, archived |

Headers | show |

Series | blk-throttle: Calculate allowed value only when the throttle is enabled | expand |

Hello, On Thu, Sep 28, 2023 at 09:58:58AM +0800, linan666@huaweicloud.com wrote: > From: Li Nan <linan122@huawei.com> > > When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX, > which is be used in calculate_bytes_allowed(), and divide 0 error will > happen. calculate_bytes_allowed() is just return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); The only division is by HZ. How does divide by 0 happen? Thanks.

On Wed, Oct 4, 2023 at 12:32 PM Tejun Heo <tj@kernel.org> wrote: > > Hello, > > On Thu, Sep 28, 2023 at 09:58:58AM +0800, linan666@huaweicloud.com wrote: > > From: Li Nan <linan122@huawei.com> > > > > When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX, > > which is be used in calculate_bytes_allowed(), and divide 0 error will > > happen. > > calculate_bytes_allowed() is just > > return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); > > The only division is by HZ. How does divide by 0 happen? We've also noticed this - haven't looked too deeply but I don't think it's a divide by zero, but an overflow (bps_limit * jiffy_elapsed / HZ will overflow for jiffies > HZ). mul_u64_u64_div_u64 does say it will throw DE if the mul overflows > > Thanks. > > -- > tejun

On Wed, Sep 27, 2023 at 7:05 PM <linan666@huaweicloud.com> wrote: > > From: Li Nan <linan122@huawei.com> > > When the throttle of bps is not enabled, tg_bps_limit() returns U64_MAX, > which is be used in calculate_bytes_allowed(), and divide 0 error will > happen. > > To fix it, only calculate allowed value when the throttle of bps/iops is > enabled and the value will be used. > > Fixes: e8368b57c006 ("blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()") > Reported-by: Changhui Zhong <czhong@redhat.com> > Closes: https://lore.kernel.org/all/CAGVVp+Vt6idZtxfU9jF=VSbu145Wi-d-WnAZx_hEfOL8yLZgBA@mail.gmail.com > Signed-off-by: Li Nan <linan122@huawei.com> > --- > block/blk-throttle.c | 19 ++++++++++++------- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c > index 38a881cf97d0..3c9a74ab9f0e 100644 > --- a/block/blk-throttle.c > +++ b/block/blk-throttle.c > @@ -730,8 +730,10 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) > static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) > { > unsigned long time_elapsed; > - long long bytes_trim; > - int io_trim; > + long long bytes_trim = 0; > + int io_trim = 0; > + u64 bps_limit; > + u32 iops_limit; > > BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw])); > > @@ -758,11 +760,14 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) > if (!time_elapsed) > return; > > - bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw), > - time_elapsed) + > - tg->carryover_bytes[rw]; > - io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed) + > - tg->carryover_ios[rw]; > + bps_limit = tg_bps_limit(tg, rw); > + iops_limit = tg_iops_limit(tg, rw); > + if (tg->bytes_disp[rw] > 0 && bps_limit != U64_MAX) I don't think this change is sufficient to prevent kernel crash, as a "clever" user could still set the bps_limit to U64_MAX - 1 (or another large value), which probably would still result in the same crash. The comment in mul_u64_u64_div_u64 suggests there's something we can do to better handle the overflow case, but I'm not sure what it's referring to. ("Will generate an #DE when the result doesn't fit u64, could fix with an __ex_table[] entry when it becomes an issue.") Otherwise, we probably need to remove the mul_u64_u64_div_u64 and check for overflow/potential overflow ourselves? Khazhy > + bytes_trim = calculate_bytes_allowed(bps_limit, > + time_elapsed) + tg->carryover_bytes[rw]; > + if (tg->io_disp[rw] > 0 && iops_limit != UINT_MAX) > + io_trim = calculate_io_allowed(iops_limit, time_elapsed) + > + tg->carryover_ios[rw]; > if (bytes_trim <= 0 && io_trim <= 0) > return; > > -- > 2.39.2 >

Hi Li, On 10/05, Li Nan wrote: > > >I don't think this change is sufficient to prevent kernel crash, as a > >"clever" user could still set the bps_limit to U64_MAX - 1 (or another > >large value), which probably would still result in the same crash. The > >comment in mul_u64_u64_div_u64 suggests there's something we can do to > >better handle the overflow case, but I'm not sure what it's referring > >to. ("Will generate an #DE when the result doesn't fit u64, could fix > >with an __ex_table[] entry when it becomes an issue.") Otherwise, we > > When (a * mul) overflows, a divide 0 error occurs in > mul_u64_u64_div_u64(). Commit 3dc167ba5729 ("sched/cputime: Improve > cputime_adjust()") changed func and said: "Will generate an #DE when the > result doesn't fit u64, could fix with an __ex_table[] entry when it > becomes an issue." But we are unsure of how to fix it. Could you please > explain how to fix this issue. Not sure I understand the question... OK, we can change mul_u64_u64_div_u64() to trap the exception, say, static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) { u64 q; asm ("mulq %2; 1: divq %3; 2:\n" _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_DEFAULT|EX_FLAG_CLEAR_AX) : "=a" (q) : "a" (a), "rm" (mul), "rm" (div) : "rdx"); return q; } should (iiuc) return 0 if the result doesn't fit u64 or div == 0. But even if we forget that this is x86-specific, how can this help? What should calculate_bytes_allowed() do/return in this case? > >probably need to remove the mul_u64_u64_div_u64 and check for > >overflow/potential overflow ourselves? probably yes... Oleg.

sorry, didn't notice this part before. I am not a asm expert (to say at least;) but On 10/05, Li Nan wrote: > > When (a * mul) overflows, a divide 0 error occurs in > mul_u64_u64_div_u64(). Just in case... No, iirc it is divq which triggers #DE when the result of division doesn't fit u64. (a * mul) can't overflow, the result is 128-bit rax:rdx number. Oleg.

On Thu, Oct 5, 2023 at 10:05 AM Oleg Nesterov <oleg@redhat.com> wrote: > > sorry, didn't notice this part before. > > I am not a asm expert (to say at least;) but > > On 10/05, Li Nan wrote: > > > > When (a * mul) overflows, a divide 0 error occurs in > > mul_u64_u64_div_u64(). > > Just in case... No, iirc it is divq which triggers #DE when the > result of division doesn't fit u64. Yeah, sorry for my incorrect wording here - but we're probably seeing exactly that the final result doesn't fit in u64. (I wasn't familiar with the intermediary registers here, thanks for explaining) > > (a * mul) can't overflow, the result is 128-bit rax:rdx number. > > Oleg. >

Hi, 在 2023/10/06 0:24, Oleg Nesterov 写道: > Hi Li, > > On 10/05, Li Nan wrote: >> >>> I don't think this change is sufficient to prevent kernel crash, as a >>> "clever" user could still set the bps_limit to U64_MAX - 1 (or another >>> large value), which probably would still result in the same crash. The >>> comment in mul_u64_u64_div_u64 suggests there's something we can do to >>> better handle the overflow case, but I'm not sure what it's referring >>> to. ("Will generate an #DE when the result doesn't fit u64, could fix >>> with an __ex_table[] entry when it becomes an issue.") Otherwise, we >> >> When (a * mul) overflows, a divide 0 error occurs in >> mul_u64_u64_div_u64(). Commit 3dc167ba5729 ("sched/cputime: Improve >> cputime_adjust()") changed func and said: "Will generate an #DE when the >> result doesn't fit u64, could fix with an __ex_table[] entry when it >> becomes an issue." But we are unsure of how to fix it. Could you please >> explain how to fix this issue. > > Not sure I understand the question... > > OK, we can change mul_u64_u64_div_u64() to trap the exception, say, > > static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div) > { > u64 q; > > asm ("mulq %2; 1: divq %3; 2:\n" > _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_DEFAULT|EX_FLAG_CLEAR_AX) > : "=a" (q) > : "a" (a), "rm" (mul), "rm" (div) > : "rdx"); > > return q; > } > > should (iiuc) return 0 if the result doesn't fit u64 or div == 0. > > But even if we forget that this is x86-specific, how can this help? > What should calculate_bytes_allowed() do/return in this case? I believe, U64_MAX should be returned if result doesn't fit u64; > >>> probably need to remove the mul_u64_u64_div_u64 and check for >>> overflow/potential overflow ourselves? > > probably yes... How about this? diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 1101fb6f6cc8..5482c316a103 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32 iops_limit, static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) { + if (jiffy_elapsed > HZ && + bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ, (u64)jiffy_elapsed); + return U64_MAX; + return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); } > > Oleg. > > . >

On 10/07, Yu Kuai wrote: > > >>>probably need to remove the mul_u64_u64_div_u64 and check for > >>>overflow/potential overflow ourselves? > > > >probably yes... > > How about this? > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c > index 1101fb6f6cc8..5482c316a103 100644 > --- a/block/blk-throttle.c > +++ b/block/blk-throttle.c > @@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32 > iops_limit, > > static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long > jiffy_elapsed) > { > + if (jiffy_elapsed > HZ && > + bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ, > (u64)jiffy_elapsed); > + return U64_MAX; > + I can't suggest anything better... but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then mul_u64_u64_div_u64() above is not safe too. Oleg.

Hi, 在 2023/10/07 23:16, Oleg Nesterov 写道: > On 10/07, Yu Kuai wrote: >> >>>>> probably need to remove the mul_u64_u64_div_u64 and check for >>>>> overflow/potential overflow ourselves? >>> >>> probably yes... >> >> How about this? >> >> diff --git a/block/blk-throttle.c b/block/blk-throttle.c >> index 1101fb6f6cc8..5482c316a103 100644 >> --- a/block/blk-throttle.c >> +++ b/block/blk-throttle.c >> @@ -723,6 +723,10 @@ static unsigned int calculate_io_allowed(u32 >> iops_limit, >> >> static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long >> jiffy_elapsed) >> { >> + if (jiffy_elapsed > HZ && >> + bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ, >> (u64)jiffy_elapsed); >> + return U64_MAX; >> + > > I can't suggest anything better... > > but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then > mul_u64_u64_div_u64() above is not safe too. Well, 'jiffy_elapsed > HZ' is judged before mul_u64_u64_div_u64(). Overflow can only happen if the above 2 conditions passed, and U64_MAX is returned. Thanks, Kuai > > Oleg. > > . >

On 10/08, Yu Kuai wrote: > > >> static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long > >>jiffy_elapsed) > >> { > >>+ if (jiffy_elapsed > HZ && > >>+ bps_limit > mul_u64_u64_div_u64(U64_MAX, (u64)HZ, > >>(u64)jiffy_elapsed); > >>+ return U64_MAX; > >>+ > > > >I can't suggest anything better... > > > >but I do not know if it is possible that HZ > jiffy_elapsed. If yes, then > >mul_u64_u64_div_u64() above is not safe too. > > Well, 'jiffy_elapsed > HZ' is judged before mul_u64_u64_div_u64(). Yes, sorry, somehow I didn't notice this check. Oleg.

Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle overflow of the final result either, as far as I can tell. So while on x86 we get a DE, on non-x86 we just get the wrong result. (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while calculating wait time"), setting a very-high bps_limit would probably also cause this crash, no?) Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()", where if the result doesn't fit in u64, we indicate (and let the caller choose what to do? Here we should just return U64_MAX)? Absent that, maybe we can take inspiration from the generic mul_u64_u64_div_u64? (Forgive the paste) static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) { + /* Final result probably won't fit in u64 */ + if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62) + return U64_MAX; return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); }

Hi, 在 2023/10/14 5:51, Khazhy Kumykov 写道: > Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle > overflow of the final result either, as far as I can tell. So while on > x86 we get a DE, on non-x86 we just get the wrong result. > > (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while > calculating wait time"), setting a very-high bps_limit would probably > also cause this crash, no?) > > Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()", > where if the result doesn't fit in u64, we indicate (and let the > caller choose what to do? Here we should just return U64_MAX)? > > Absent that, maybe we can take inspiration from the generic > mul_u64_u64_div_u64? (Forgive the paste) > > static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) > { > + /* Final result probably won't fit in u64 */ > + if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62) I'm not sure, but this condition looks necessary, but doesn't look sufficient, for example, jiffy_elapsed cound be greater than HZ, while ilog2(jiffy_elapsed) is equal to ilog2(HZ). Thanks, Kuai > + return U64_MAX; > return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); > } > > . >

On Sun, Oct 15, 2023 at 6:47 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2023/10/14 5:51, Khazhy Kumykov 写道: > > Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle > > overflow of the final result either, as far as I can tell. So while on > > x86 we get a DE, on non-x86 we just get the wrong result. > > > > (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while > > calculating wait time"), setting a very-high bps_limit would probably > > also cause this crash, no?) > > > > Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()", > > where if the result doesn't fit in u64, we indicate (and let the > > caller choose what to do? Here we should just return U64_MAX)? > > > > Absent that, maybe we can take inspiration from the generic > > mul_u64_u64_div_u64? (Forgive the paste) > > > > static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) > > { > > + /* Final result probably won't fit in u64 */ > > + if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62) > > I'm not sure, but this condition looks necessary, but doesn't look > sufficient, for example, jiffy_elapsed cound be greater than HZ, while > ilog2(jiffy_elapsed) is equal to ilog2(HZ). I believe 62 is correct, although admittedly it's less "intuitive" than the check in mul_u64_u64_div_u64().... The result overflows if log2(A * B / C) >= 64, so we want to ensure that: log2(A) + log2(B) - log2(C) < 64 Given that: ilog2(A) <= log2(A) < ilog2(A) + 1 // truncation defn It follows that: -log2(A) <= -ilog2(A) // Inverse rule log2(A) - 1 < ilog2(A) Starting from: ilog2(A) + ilog2(B) - ilog2(C) <= X We can show: (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < ilog2(A) + ilog2(B) + (-ilog2(C)) // strict inequality here since the substitutions for A and B terms are strictly less (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < X log2(A) + log2(B) - log2(C) < X + 2 So for X = 62, log2(A) + log2(B) - log2(C) < 64 must be true, and we must be safe from overflow. So... by converse, if ilog2(A) + ilog2(B) - ilog2(C) > 62, we cannot guarantee that the result will not overflow - thus we bail out. // end math It /is/ less exact than your proposal (sufficient, but not necessary), but saves an extra 128bit mul. I mostly just want us to pick /something/, since 6.6-rc and the LTSs with the patch in question are busted currently. :) > > Thanks, > Kuai > > > + return U64_MAX; > > return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); > > } > > > > . > > >

On Mon, Oct 16, 2023 at 1:06 PM Khazhy Kumykov <khazhy@chromium.org> wrote: > > I mostly just want us to pick /something/, since 6.6-rc and the LTSs > with the patch in question are busted currently. :) > And just to make it explicit: I believe Kaui's proposal is correct. Khazhy

Hi, 在 2023/10/17 4:06, Khazhy Kumykov 写道: > On Sun, Oct 15, 2023 at 6:47 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2023/10/14 5:51, Khazhy Kumykov 写道: >>> Looking at the generic mul_u64_u64_div_u64 impl, it doesn't handle >>> overflow of the final result either, as far as I can tell. So while on >>> x86 we get a DE, on non-x86 we just get the wrong result. >>> >>> (Aside: after 8d6bbaada2e0 ("blk-throttle: prevent overflow while >>> calculating wait time"), setting a very-high bps_limit would probably >>> also cause this crash, no?) >>> >>> Would it be possible to have a "check_mul_u64_u64_div_u64_overflow()", >>> where if the result doesn't fit in u64, we indicate (and let the >>> caller choose what to do? Here we should just return U64_MAX)? >>> >>> Absent that, maybe we can take inspiration from the generic >>> mul_u64_u64_div_u64? (Forgive the paste) >>> >>> static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) >>> { >>> + /* Final result probably won't fit in u64 */ >>> + if (ilog2(bps_limit) + ilog2(jiffy_elapsed) - ilog2(HZ) > 62) >> >> I'm not sure, but this condition looks necessary, but doesn't look >> sufficient, for example, jiffy_elapsed cound be greater than HZ, while >> ilog2(jiffy_elapsed) is equal to ilog2(HZ). > I believe 62 is correct, although admittedly it's less "intuitive" > than the check in mul_u64_u64_div_u64().... > > The result overflows if log2(A * B / C) >= 64, so we want to ensure that: > log2(A) + log2(B) - log2(C) < 64 > > Given that: > ilog2(A) <= log2(A) < ilog2(A) + 1 // truncation defn > It follows that: > -log2(A) <= -ilog2(A) // Inverse rule > log2(A) - 1 < ilog2(A) > > Starting from: > ilog2(A) + ilog2(B) - ilog2(C) <= X > > We can show: > (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < ilog2(A) + ilog2(B) + > (-ilog2(C)) // strict inequality here since the substitutions for A > and B terms are strictly less > (log2(A) - 1) + (log2(B) - 1) + (-log2(C)) < X > log2(A) + log2(B) - log2(C) < X + 2 > > So for X = 62, log2(A) + log2(B) - log2(C) < 64 must be true, and we > must be safe from overflow. > > So... by converse, if ilog2(A) + ilog2(B) - ilog2(C) > 62, we cannot > guarantee that the result will not overflow - thus we bail out. > > // end math Thanks for the explanation, I understand that, so the problem is that if the above condition(>62) match, the result may not overflow, but U64_MAX is returned here, this is not correct but I guess this doesn't matter in real word, it's impossible that user will issue more than 1<<63 bytes IO in an extended slice. I'm good with this fix with some comments. Thanks, Kuai > > It /is/ less exact than your proposal (sufficient, but not necessary), > but saves an extra 128bit mul. > > I mostly just want us to pick /something/, since 6.6-rc and the LTSs > with the patch in question are busted currently. :) > > > >> >> Thanks, >> Kuai >> >>> + return U64_MAX; >>> return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ); >>> } >>> >>> . >>> >> > > . >

diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 38a881cf97d0..3c9a74ab9f0e 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -730,8 +730,10 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed) static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) { unsigned long time_elapsed; - long long bytes_trim; - int io_trim; + long long bytes_trim = 0; + int io_trim = 0; + u64 bps_limit; + u32 iops_limit; BUG_ON(time_before(tg->slice_end[rw], tg->slice_start[rw])); @@ -758,11 +760,14 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw) if (!time_elapsed) return; - bytes_trim = calculate_bytes_allowed(tg_bps_limit(tg, rw), - time_elapsed) + - tg->carryover_bytes[rw]; - io_trim = calculate_io_allowed(tg_iops_limit(tg, rw), time_elapsed) + - tg->carryover_ios[rw]; + bps_limit = tg_bps_limit(tg, rw); + iops_limit = tg_iops_limit(tg, rw); + if (tg->bytes_disp[rw] > 0 && bps_limit != U64_MAX) + bytes_trim = calculate_bytes_allowed(bps_limit, + time_elapsed) + tg->carryover_bytes[rw]; + if (tg->io_disp[rw] > 0 && iops_limit != UINT_MAX) + io_trim = calculate_io_allowed(iops_limit, time_elapsed) + + tg->carryover_ios[rw]; if (bytes_trim <= 0 && io_trim <= 0) return;