Message ID: 20250220111735.1187999-1-ming.lei@redhat.com (mailing list archive)
State:      New
Series:     block: throttle: don't add one extra jiffy mistakenly for bps limit
Hi,

On 2025/02/20 19:17, Ming Lei wrote:
> When the current bio needs to be throttled because of the bps limit, the
> wait time for the extra bytes may be less than 1 jiffy, and
> tg_within_bps_limit() adds one extra jiffy in that case.
>
> However, when the roundup time is taken into account, the extra jiffy may
> no longer be necessary, and the bps limit then becomes inaccurate. This
> causes a blktests throtl/001 failure in case of CONFIG_HZ_100=y.
>
> Fix it by not adding the extra jiffy when the roundup time already covers
> it.
>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Yu Kuai <yukuai3@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-throttle.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..8348972c517b 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
>  	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>  	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>
> -	if (!jiffy_wait)
> -		jiffy_wait = 1;
> -
>  	/*
>  	 * This wait time is without taking into consideration the rounding
>  	 * up we did. Add that time also.
>  	 */
>  	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
> +	if (!jiffy_wait)
> +		jiffy_wait = 1;

Just wondering, is waiting (0, 1) jiffies less than needed better than
waiting (0, 1) jiffies more than needed?

How about the following changes?

Thanks,
Kuai

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..f8430baf3544 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 					 u64 bps_limit)
 {
 	bool rw = bio_data_dir(bio);
+	long long carryover_bytes;
 	long long bytes_allowed;
 	u64 extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
@@ -727,10 +728,11 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Calc approx time to dispatch */
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
+	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, carryover_bytes);
 
+	/* carryover_bytes is dispatched without waiting */
 	if (!jiffy_wait)
-		jiffy_wait = 1;
+		tg->carryover_bytes[rw] -= carryover_bytes;
 
 	/*
 	 * This wait time is without taking into consideration the rounding

> +
>  	return jiffy_wait;
>  }
>
Hi Yukuai,

On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2025/02/20 19:17, Ming Lei wrote:
> > When the current bio needs to be throttled because of the bps limit, the
> > wait time for the extra bytes may be less than 1 jiffy, and
> > tg_within_bps_limit() adds one extra jiffy in that case.
[...]
> Just wondering, is waiting (0, 1) jiffies less than needed better than
> waiting (0, 1) jiffies more than needed?
>
> How about the following changes?
>
> Thanks,
> Kuai
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..f8430baf3544 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
[...]
> 	/* Calc approx time to dispatch */
> 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> -	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> +	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, carryover_bytes);

&carryover_bytes

> +	/* carryover_bytes is dispatched without waiting */
> 	if (!jiffy_wait)
> -		jiffy_wait = 1;
> +		tg->carryover_bytes[rw] -= carryover_bytes;
>
> 	/*
> 	 * This wait time is without taking into consideration the rounding
>
> > +
> >  	return jiffy_wait;

Looks like the result is worse with your patch:

throtl/001 (basic functionality)             [failed]
    runtime  6.488s  ...  28.862s
    --- tests/throtl/001.out	2024-11-21 09:20:47.514353642 +0000
    +++ /root/git/blktests/results/nodev/throtl/001.out.bad	2025-02-21 02:51:36.723754146 +0000
    @@ -1,6 +1,6 @@
     Running throtl/001
    +13
     1
    -1
    -1
    +13
     1
    ...
    (Run 'diff -u tests/throtl/001.out /root/git/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)

thanks,
Ming
On Fri, Feb 21, 2025 at 10:55:23AM +0800, Ming Lei wrote:
> Hi Yukuai,
>
> On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
> > Hi,
> >
> > On 2025/02/20 19:17, Ming Lei wrote:
[...]
> > 	/* Calc approx time to dispatch */
> > 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> > -	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> > +	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, carryover_bytes);
>
> &carryover_bytes
>
> > +	/* carryover_bytes is dispatched without waiting */
> > 	if (!jiffy_wait)
> > -		jiffy_wait = 1;
> > +		tg->carryover_bytes[rw] -= carryover_bytes;

Not sure ->carryover_bytes[] can be used here; the comment says clearly
that it is only for updating the config.

Neither adding one jiffy too many nor one too few is good; maybe
DIV64_U64_ROUND_CLOSEST() is better?

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..5791612b3543 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -727,16 +727,16 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Calc approx time to dispatch */
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
-
-	if (!jiffy_wait)
-		jiffy_wait = 1;
+	jiffy_wait = DIV64_U64_ROUND_CLOSEST(extra_bytes * HZ, bps_limit);
 
 	/*
 	 * This wait time is without taking into consideration the rounding
 	 * up we did. Add that time also.
 	 */
 	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
+	if (!jiffy_wait)
+		jiffy_wait = 1;
+
 	return jiffy_wait;
 }

Thanks,
Ming
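[Editorial note: the following is a hedged user-space sketch, not kernel code. Plain 64-bit arithmetic stands in for div64_u64()/DIV64_U64_ROUND_CLOSEST(), and the bps limit, bio size, and rounding term are illustrative values, not taken from the thread. It only compares how the original code, the first patch in this thread, and the DIV64_U64_ROUND_CLOSEST() variant round a sub-jiffy wait at HZ=100.]

/* Sketch: rounding behaviour of the three variants for a sub-jiffy wait. */
#include <stdio.h>
#include <stdint.h>

#define HZ 100ULL

int main(void)
{
        uint64_t bps_limit = 10ULL * 1024 * 1024;  /* 10 MiB/s, illustrative */
        uint64_t extra_bytes = 4096;               /* one 4K bio over budget */
        uint64_t rounding = 1;                     /* jiffy_elapsed_rnd - jiffy_elapsed */

        uint64_t exact = extra_bytes * HZ;         /* numerator of the wait */

        /* Original code: truncate, force at least 1 jiffy, then add rounding. */
        uint64_t old_wait = exact / bps_limit;
        if (!old_wait)
                old_wait = 1;
        old_wait += rounding;

        /* First patch: add the rounding first, only then force >= 1 jiffy. */
        uint64_t v1_wait = exact / bps_limit + rounding;
        if (!v1_wait)
                v1_wait = 1;

        /* DIV64_U64_ROUND_CLOSEST() variant: round to the nearest jiffy. */
        uint64_t closest_wait = (exact + bps_limit / 2) / bps_limit + rounding;
        if (!closest_wait)
                closest_wait = 1;

        printf("old=%llu v1=%llu closest=%llu jiffies\n",
               (unsigned long long)old_wait,
               (unsigned long long)v1_wait,
               (unsigned long long)closest_wait);
        return 0;
}

With these numbers it prints "old=2 v1=1 closest=1 jiffies": the old code over-waits by a full jiffy whenever the roundup already covers the sub-jiffy remainder.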
Hi,

On 2025/02/21 10:55, Ming Lei wrote:
> Hi Yukuai,
>
> On Thu, Feb 20, 2025 at 09:38:12PM +0800, Yu Kuai wrote:
[...]
>>  	/* Calc approx time to dispatch */
>>  	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
>> -	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>> +	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, carryover_bytes);

Thanks for the test.

This is a mistake, carryover_bytes is much bigger than expected :(
That's why the result is much worse. My bad.

> &carryover_bytes
>
>> +	/* carryover_bytes is dispatched without waiting */
>>  	if (!jiffy_wait)

The if condition should be removed.

>> -		jiffy_wait = 1;
>> +		tg->carryover_bytes[rw] -= carryover_bytes;
>
> Looks like the result is worse with your patch:
>
> throtl/001 (basic functionality)             [failed]
>     runtime  6.488s  ...  28.862s
[...]
>     (Run 'diff -u tests/throtl/001.out /root/git/blktests/results/nodev/throtl/001.out.bad' to see the entire diff)

And I realize now that throtl_start_new_slice() will just clear
carryover_bytes. I tested in my VM, and with the following changes
throtl/001 never fails with CONFIG_HZ_100:

Thanks,
Kuai

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..4fc005af82e0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -703,6 +703,7 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 					 u64 bps_limit)
 {
 	bool rw = bio_data_dir(bio);
+	long long carryover_bytes;
 	long long bytes_allowed;
 	u64 extra_bytes;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
@@ -727,10 +728,8 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Calc approx time to dispatch */
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
-	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
-
-	if (!jiffy_wait)
-		jiffy_wait = 1;
+	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, &carryover_bytes);
+	tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);
 
 	/*
 	 * This wait time is without taking into consideration the rounding
@@ -775,10 +774,14 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	 * long since now. New slice is started only for empty throttle group.
 	 * If there is queued bio, that means there should be an active
 	 * slice and it should be extended instead.
+	 *
+	 * If throtl_trim_slice() doesn't clear carryover_bytes, which means
+	 * debt is still not paid, don't start new slice in this case.
 	 */
-	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
+	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]) &&
+	    tg->carryover_bytes[rw] >= 0) {
 		throtl_start_new_slice(tg, rw, true);
-	else {
+	} else {
 		if (time_before(tg->slice_end[rw],
 				jiffies + tg->td->throtl_slice))
 			throtl_extend_slice(tg, rw,
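[Editorial note: a toy user-space model of the "debt" idea above, hedged. This is not the kernel's actual accounting: it omits the jiffy-roundup term, folds the carryover directly into the bytes still to be paid for, and uses illustrative numbers.]

/* Toy model: sub-jiffy remainders become debt instead of a forced extra jiffy. */
#include <stdio.h>

#define HZ 100LL

int main(void)
{
        long long bps_limit = 1000;     /* bytes per second, illustrative */
        long long carryover = 0;        /* negative value == debt in bytes */
        long long total = 0;
        long long jiffies_waited = 0;

        /* Dispatch 50 bios of 9 bytes each; 9 * HZ / bps_limit < 1 jiffy. */
        for (int i = 0; i < 50; i++) {
                long long bio = 9;
                long long owed = bio - carryover;              /* new bytes + old debt */
                long long wait = owed * HZ / bps_limit;        /* truncates */
                long long rem = owed * HZ % bps_limit;

                carryover = -(rem / HZ);                       /* unpaid bytes become debt */
                jiffies_waited += wait;
                total += bio;
        }

        double seconds = (double)jiffies_waited / HZ;
        printf("dispatched %lld bytes in %.2f s -> %.1f bytes/s (limit %lld)\n",
               total, seconds, seconds > 0 ? total / seconds : 0.0, bps_limit);
        return 0;
}

With these numbers the debt shrinks by one byte per dispatch and the average settles exactly on the configured limit (450 bytes in 0.45 s), which is the behaviour the patch above is after.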
On Fri, Feb 21, 2025 at 11:39:17AM +0800, Yu Kuai wrote:
> Hi,
>
> On 2025/02/21 10:55, Ming Lei wrote:
[...]
> This is a mistake, carryover_bytes is much bigger than expected :(
> That's why the result is much worse. My bad.
[...]
> And I realize now that throtl_start_new_slice() will just clear
> carryover_bytes. I tested in my VM, and with the following changes
> throtl/001 never fails with CONFIG_HZ_100:

If carryover_bytes can cover this issue, I think it is preferred.

> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8d149aff9fd0..4fc005af82e0 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
[...]
> 	/* Calc approx time to dispatch */
> 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
> -	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
> -
> -	if (!jiffy_wait)
> -		jiffy_wait = 1;
> +	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, &carryover_bytes);
> +	tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);

Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
carryover_bytes?

Also, tg_within_bps_limit() may return 0 now, which isn't expected.

Thanks,
Ming
Hi,

On 2025/02/21 12:18, Ming Lei wrote:
>> -	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
>> -
>> -	if (!jiffy_wait)
>> -		jiffy_wait = 1;
>> +	jiffy_wait = div64_u64_rem(extra_bytes * HZ, bps_limit, &carryover_bytes);
>> +	tg->carryover_bytes[rw] -= div64_u64(carryover_bytes, HZ);
>
> Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
> carryover_bytes?

For example, if bps_limit is 1000 and extra_bytes is 9 (with HZ=100), then:

jiffy_wait = (9 * 100) / 1000 = 0;
carryover_bytes = (9 * 100) % 1000 = 900;

Hence we need to divide it by HZ:

tg->carryover_bytes = 0 - 900/100 = -9;

-9 can be considered debt, and for the next IO, bytes_allowed will take
the carryover_bytes into account.

> Also, tg_within_bps_limit() may return 0 now, which isn't expected.

I think it's expected: this IO will now be dispatched directly instead of
waiting for one more jiffy, and the debt will be paid if there are
following IOs.

And if the tg idles for a long time before dispatching the next IO,
tg_trim_slice() should handle this case and avoid a long slice end. This
is not quite handy; perhaps it's better to add a helper like tg_in_debt()
before throtl_start_new_slice() to handle this case.

BTW, we must update the comment about carryover_bytes/ios, it's already
used as debt.

Thanks,
Kuai
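[Editorial note: a minimal user-space sketch of the arithmetic above, using exactly the example numbers from the mail (bps_limit=1000, extra_bytes=9, HZ=100); plain / and % stand in for the kernel's div64_u64_rem()/div64_u64().]

/* Sketch: why the remainder must be divided by HZ before it becomes debt. */
#include <stdio.h>
#include <stdint.h>

#define HZ 100ULL

int main(void)
{
        uint64_t bps_limit = 1000;
        uint64_t extra_bytes = 9;

        uint64_t jiffy_wait = (extra_bytes * HZ) / bps_limit;      /* 900 / 1000 = 0 */
        uint64_t carryover_bytes = (extra_bytes * HZ) % bps_limit; /* 900 */

        /* The remainder is in byte-jiffies, so divide by HZ to get the debt
         * in bytes that the next bio's bytes_allowed has to repay. */
        int64_t debt_bytes = -(int64_t)(carryover_bytes / HZ);     /* -9 */

        printf("jiffy_wait=%llu carryover_bytes=%llu debt=%lld bytes\n",
               (unsigned long long)jiffy_wait,
               (unsigned long long)carryover_bytes,
               (long long)debt_bytes);
        return 0;
}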
On Fri, Feb 21, 2025 at 02:29:30PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2025/02/21 12:18, Ming Lei wrote:
> > Can you explain a bit why `carryover_bytes/HZ` is subtracted instead of
> > carryover_bytes?
>
> For example, if bps_limit is 1000 and extra_bytes is 9 (with HZ=100), then:
>
> jiffy_wait = (9 * 100) / 1000 = 0;
> carryover_bytes = (9 * 100) % 1000 = 900;
>
> Hence we need to divide it by HZ:
>
> tg->carryover_bytes = 0 - 900/100 = -9;
>
> -9 can be considered debt, and for the next IO, bytes_allowed will take
> the carryover_bytes into account.

Got it, it is just because the dividend is 'extra_bytes * HZ', so the
remainder needs to be divided by HZ.

> > Also, tg_within_bps_limit() may return 0 now, which isn't expected.
>
> I think it's expected: this IO will now be dispatched directly instead of
> waiting for one more jiffy, and the debt will be paid if there are
> following IOs.

OK.

> And if the tg idles for a long time before dispatching the next IO,
> tg_trim_slice() should handle this case and avoid a long slice end. This
> is not quite handy; perhaps it's better to add a helper like tg_in_debt()
> before throtl_start_new_slice() to handle this case.
>
> BTW, we must update the comment about carryover_bytes/ios, it's already
> used as debt.

Indeed, the approach is similar to the handling for
bio_issue_as_root_blkg().

Tested-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming
Hi,

On 2025/02/21 16:59, Ming Lei wrote:
>> And if the tg idles for a long time before dispatching the next IO,
>> tg_trim_slice() should handle this case and avoid a long slice end. This
>> is not quite handy; perhaps it's better to add a helper like tg_in_debt()
>> before throtl_start_new_slice() to handle this case.
>>
>> BTW, we must update the comment about carryover_bytes/ios, it's already
>> used as debt.
>
> Indeed, the approach is similar to the handling for
> bio_issue_as_root_blkg().
>
> Tested-by: Ming Lei <ming.lei@redhat.com>

While cooking the patch, I found a better way than adding tg_in_debt()
before throtl_start_new_slice(): make sure the current slice has one more
jiffy to repay the debt.

I'll send the patch soon; let me know if you don't want me to keep the
test tag.

Thanks,
Kuai
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8d149aff9fd0..8348972c517b 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -729,14 +729,14 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
 	jiffy_wait = div64_u64(extra_bytes * HZ, bps_limit);
 
-	if (!jiffy_wait)
-		jiffy_wait = 1;
-
 	/*
 	 * This wait time is without taking into consideration the rounding
 	 * up we did. Add that time also.
 	 */
 	jiffy_wait = jiffy_wait + (jiffy_elapsed_rnd - jiffy_elapsed);
+	if (!jiffy_wait)
+		jiffy_wait = 1;
+
 	return jiffy_wait;
 }
When the current bio needs to be throttled because of the bps limit, the
wait time for the extra bytes may be less than 1 jiffy, and
tg_within_bps_limit() adds one extra jiffy in that case.

However, when the roundup time is taken into account, the extra jiffy may
no longer be necessary, and the bps limit then becomes inaccurate. This
causes a blktests throtl/001 failure in case of CONFIG_HZ_100=y.

Fix it by not adding the extra jiffy when the roundup time already covers
it.

Cc: Tejun Heo <tj@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
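[Editorial note: for intuition on why the spurious extra jiffy shows up at CONFIG_HZ_100 in particular, a hedged back-of-the-envelope sketch follows. The bandwidth limit and bio size are illustrative, not the actual throtl/001 parameters; it only models one needless jiffy of waiting per throttled dispatch.]

/* Sketch: effective bandwidth when every throttled dispatch waits one extra jiffy. */
#include <stdio.h>
#include <stdint.h>

static double effective_bps(uint64_t bps_limit, uint64_t bio_size,
                            unsigned int hz, unsigned int extra_jiffies)
{
        /* Ideal time to pace one bio at the limit, in seconds. */
        double ideal = (double)bio_size / (double)bps_limit;
        /* Each spurious jiffy adds 1/HZ seconds of needless waiting. */
        double actual = ideal + (double)extra_jiffies / (double)hz;

        return (double)bio_size / actual;
}

int main(void)
{
        uint64_t bps_limit = 1ULL * 1024 * 1024;   /* 1 MiB/s, illustrative */
        uint64_t bio_size = 64 * 1024;             /* 64 KiB bios */

        printf("HZ=100:  %.0f bytes/s instead of %llu\n",
               effective_bps(bps_limit, bio_size, 100, 1),
               (unsigned long long)bps_limit);
        printf("HZ=1000: %.0f bytes/s instead of %llu\n",
               effective_bps(bps_limit, bio_size, 1000, 1),
               (unsigned long long)bps_limit);
        return 0;
}

With these illustrative numbers the extra jiffy costs roughly 14% of the configured bandwidth at HZ=100 but only about 2% at HZ=1000, which is consistent with the failure being visible only with CONFIG_HZ_100=y.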