| Message ID | 67ffcf14c2d15622b84c60a493b590dd81a07f51.1503068984.git.shli@fb.com (mailing list archive) |
|---|---|
| State | New, archived |
On 08/18/2017 09:13 AM, Shaohua Li wrote:
> A discard request is usually very big and can easily use up the whole
> bandwidth budget of a cgroup. The discard request size doesn't really
> reflect the amount of data written, so it doesn't make sense to account
> it against the bandwidth budget. This patch ignores discard request
> size. It does make sense to account discard requests against the iops
> budget, though.

Some (most) devices do touch media for a discard operation, but the cost
tends to be fairly constant and independent of discard size. Would it
make sense to just treat it as a constant cost? Zero cost seems wrong.
On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>> A discard request is usually very big and can easily use up the whole
>> bandwidth budget of a cgroup. The discard request size doesn't really
>> reflect the amount of data written, so it doesn't make sense to account
>> it against the bandwidth budget. This patch ignores discard request
>> size. It does make sense to account discard requests against the iops
>> budget, though.
>
> Some (most) devices do touch media for a discard operation, but the cost
> tends to be fairly constant and independent of discard size. Would it
> make sense to just treat it as a constant cost? Zero cost seems wrong.

It would be hard to find that cost. Would this make sense?

	min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
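As an aside, that capping proposal would amount to a variant of the throtl_bio_data_size() helper introduced by the patch below. A minimal sketch, assuming the helper grows a request_queue argument (hypothetical; the patch as actually posted charges discards zero bytes instead):

```c
/*
 * Hypothetical variant of throtl_bio_data_size() implementing the
 * capping idea above: account a discard at no more than the queue's
 * maximum request size in bytes, rather than its full (often huge)
 * bi_size. Sketch only -- the posted patch returns 0 for discards.
 */
static inline unsigned int throtl_bio_data_size(struct bio *bio,
						struct request_queue *q)
{
	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
		return min_t(unsigned int, bio->bi_iter.bi_size,
			     queue_max_sectors(q) << 9);
	return bio->bi_iter.bi_size;
}
```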
On 08/18/2017 10:28 AM, Shaohua Li wrote:
> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>> A discard request is usually very big and can easily use up the whole
>>> bandwidth budget of a cgroup. The discard request size doesn't really
>>> reflect the amount of data written, so it doesn't make sense to
>>> account it against the bandwidth budget. This patch ignores discard
>>> request size. It does make sense to account discard requests against
>>> the iops budget, though.
>>
>> Some (most) devices do touch media for a discard operation, but the
>> cost tends to be fairly constant and independent of discard size.
>> Would it make sense to just treat it as a constant cost? Zero cost
>> seems wrong.
>
> It would be hard to find that cost. Would this make sense?
>
> 	min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)

It's all going to be approximations, for sure; unfortunately it isn't an
exact science. Why not just use a small constant value? If we assume
that a 4k and an 8M discard end up writing roughly the same amount to
media, then it would follow that just using a smaller constant value
(regardless of the actual discard command size) would be useful.
On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
> On 08/18/2017 10:28 AM, Shaohua Li wrote:
>> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>>> A discard request is usually very big and can easily use up the
>>>> whole bandwidth budget of a cgroup. The discard request size doesn't
>>>> really reflect the amount of data written, so it doesn't make sense
>>>> to account it against the bandwidth budget. This patch ignores
>>>> discard request size. It does make sense to account discard requests
>>>> against the iops budget, though.
>>>
>>> Some (most) devices do touch media for a discard operation, but the
>>> cost tends to be fairly constant and independent of discard size.
>>> Would it make sense to just treat it as a constant cost? Zero cost
>>> seems wrong.
>>
>> It would be hard to find that cost. Would this make sense?
>>
>> 	min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
>
> It's all going to be approximations, for sure; unfortunately it isn't an
> exact science. Why not just use a small constant value? If we assume
> that a 4k and an 8M discard end up writing roughly the same amount to
> media, then it would follow that just using a smaller constant value
> (regardless of the actual discard command size) would be useful.

Sounds good. What number do you suggest, queue_max_sectors or a random
number?

Thanks,
Shaohua
On 08/18/2017 01:12 PM, Shaohua Li wrote:
> On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
>> On 08/18/2017 10:28 AM, Shaohua Li wrote:
>>> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>>>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>>>> A discard request is usually very big and can easily use up the
>>>>> whole bandwidth budget of a cgroup. The discard request size doesn't
>>>>> really reflect the amount of data written, so it doesn't make sense
>>>>> to account it against the bandwidth budget. This patch ignores
>>>>> discard request size. It does make sense to account discard requests
>>>>> against the iops budget, though.
>>>>
>>>> Some (most) devices do touch media for a discard operation, but the
>>>> cost tends to be fairly constant and independent of discard size.
>>>> Would it make sense to just treat it as a constant cost? Zero cost
>>>> seems wrong.
>>>
>>> It would be hard to find that cost. Would this make sense?
>>>
>>> 	min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
>>
>> It's all going to be approximations, for sure; unfortunately it isn't an
>> exact science. Why not just use a small constant value? If we assume
>> that a 4k and an 8M discard end up writing roughly the same amount to
>> media, then it would follow that just using a smaller constant value
>> (regardless of the actual discard command size) would be useful.
>
> Sounds good. What number do you suggest, queue_max_sectors or a random
> number?

Not sure why you want to go that large? Isn't the idea to throttle on
actual device bandwidth used? In that case a much smaller number should
be a lot closer to reality, say 64 bytes per discard, regardless of the
actual size. That still gives you some throttling instead of just
ignoring it, but at a more reasonable rate.
On Fri, Aug 18, 2017 at 01:15:15PM -0600, Jens Axboe wrote:
> On 08/18/2017 01:12 PM, Shaohua Li wrote:
>> On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
>>> On 08/18/2017 10:28 AM, Shaohua Li wrote:
>>>> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>>>>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>>>>> A discard request is usually very big and can easily use up the
>>>>>> whole bandwidth budget of a cgroup. The discard request size
>>>>>> doesn't really reflect the amount of data written, so it doesn't
>>>>>> make sense to account it against the bandwidth budget. This patch
>>>>>> ignores discard request size. It does make sense to account
>>>>>> discard requests against the iops budget, though.
>>>>>
>>>>> Some (most) devices do touch media for a discard operation, but the
>>>>> cost tends to be fairly constant and independent of discard size.
>>>>> Would it make sense to just treat it as a constant cost? Zero cost
>>>>> seems wrong.
>>>>
>>>> It would be hard to find that cost. Would this make sense?
>>>>
>>>> 	min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
>>>
>>> It's all going to be approximations, for sure; unfortunately it isn't
>>> an exact science. Why not just use a small constant value? If we
>>> assume that a 4k and an 8M discard end up writing roughly the same
>>> amount to media, then it would follow that just using a smaller
>>> constant value (regardless of the actual discard command size) would
>>> be useful.
>>
>> Sounds good. What number do you suggest, queue_max_sectors or a random
>> number?
>
> Not sure why you want to go that large? Isn't the idea to throttle on
> actual device bandwidth used? In that case a much smaller number should
> be a lot closer to reality, say 64 bytes per discard, regardless of the
> actual size. That still gives you some throttling instead of just
> ignoring it, but at a more reasonable rate.

Hmm, my guess is that discard is more costly than a normal write in some
drivers, but that's just my guess. I'll make it 512B then, to make sure
nothing blows up.

Thanks,
Shaohua
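The constant-cost accounting the thread converges on would look roughly like the sketch below, assuming the 512-byte figure from Shaohua's reply; THROTL_DISCARD_COST is a made-up name, and the patch as actually posted (below) still charges zero:

```c
/*
 * Sketch of the constant-cost variant discussed above: charge each
 * discard a fixed 512 bytes against the bandwidth budget, regardless
 * of its actual size. Hypothetical -- the posted patch returns 0.
 */
#define THROTL_DISCARD_COST	512U	/* bytes charged per discard bio */

static inline unsigned int throtl_bio_data_size(struct bio *bio)
{
	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
		return THROTL_DISCARD_COST;
	return bio->bi_iter.bi_size;
}
```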
```diff
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6a4c4c4..f80acc1 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -380,6 +380,13 @@ static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
 	}								\
 } while (0)
 
+static inline unsigned int throtl_bio_data_size(struct bio *bio)
+{
+	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
+		return 0;
+	return bio->bi_iter.bi_size;
+}
+
 static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
 {
 	INIT_LIST_HEAD(&qn->node);
@@ -932,6 +939,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	bool rw = bio_data_dir(bio);
 	u64 bytes_allowed, extra_bytes, tmp;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
 
@@ -945,14 +953,14 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	do_div(tmp, HZ);
 	bytes_allowed = tmp;
 
-	if (tg->bytes_disp[rw] + bio->bi_iter.bi_size <= bytes_allowed) {
+	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
 		return true;
 	}
 
 	/* Calc approx time to dispatch */
-	extra_bytes = tg->bytes_disp[rw] + bio->bi_iter.bi_size - bytes_allowed;
+	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
 	jiffy_wait = div64_u64(extra_bytes * HZ, tg_bps_limit(tg, rw));
 
 	if (!jiffy_wait)
@@ -1032,11 +1040,12 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
+	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	tg->bytes_disp[rw] += bio->bi_iter.bi_size;
+	tg->bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
-	tg->last_bytes_disp[rw] += bio->bi_iter.bi_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->last_io_disp[rw]++;
 
 	/*
```
A discard request is usually very big and can easily use up the whole
bandwidth budget of a cgroup. The discard request size doesn't really
reflect the amount of data written, so it doesn't make sense to account
it against the bandwidth budget. This patch ignores discard request
size. It does make sense to account discard requests against the iops
budget, though.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/blk-throttle.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
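For a rough sense of scale behind the "easily use up the whole bandwidth budget" claim: tg_with_in_bps_limit() delays a bio by roughly extra_bytes * HZ / bps_limit jiffies, so one large discard charged at full size can stall a group for minutes. A back-of-the-envelope sketch (userspace C, purely illustrative numbers):

```c
/*
 * Back-of-the-envelope illustration of full-size discard accounting.
 * Userspace sketch with hypothetical numbers, not kernel code.
 */
#include <stdio.h>

int main(void)
{
	double bps_limit = 10.0 * 1024 * 1024;	 /* 10 MiB/s cgroup limit */
	double discard = 1024.0 * 1024 * 1024;	 /* one 1 GiB discard bio */

	/* Charged at full size, a single discard consumes ~102 s of
	 * bandwidth budget, stalling every other bio in the group.
	 * With the patch it is charged 0 bytes and is limited only by
	 * the iops budget. */
	printf("budget consumed: %.1f s\n", discard / bps_limit);
	return 0;
}
```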