
blk-throttle: ignore discard request size

Message ID 67ffcf14c2d15622b84c60a493b590dd81a07f51.1503068984.git.shli@fb.com (mailing list archive)
State New, archived

Commit Message

Shaohua Li Aug. 18, 2017, 3:13 p.m. UTC
A discard request is usually very big and easily uses up all of a cgroup's
bandwidth budget. The discard request size doesn't really reflect the amount
of data written, so it doesn't make sense to account it against the bandwidth
budget. This patch ignores discard request size. It does still make sense to
account discard requests against the iops budget, though.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/blk-throttle.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
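To make the motivation concrete, here is a rough userspace model of the wait
time tg_with_in_bps_limit() (modified below) computes when a bio overshoots
the bps budget. The names and the MODEL_HZ constant are ours, not kernel code;
this is a sketch assuming HZ == 1000:

```c
#include <stdint.h>

/* Simplified stand-in for the wait calculation in tg_with_in_bps_limit():
 * if the bio overshoots the bytes allowed so far in the slice, the group
 * waits roughly extra_bytes * HZ / bps_limit jiffies. */
#define MODEL_HZ 1000ULL

static uint64_t throttle_wait_jiffies(uint64_t bytes_disp, uint64_t bio_size,
				      uint64_t bytes_allowed, uint64_t bps_limit)
{
	if (bytes_disp + bio_size <= bytes_allowed)
		return 0;	/* within budget: dispatch immediately */
	return (bytes_disp + bio_size - bytes_allowed) * MODEL_HZ / bps_limit;
}
```

With a 1 MiB/s limit and a fresh slice, a single 8 MiB discard yields a
7000-jiffy (7 s) wait, i.e. one discard monopolizes the whole budget, which is
the behavior the patch avoids.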

Comments

Jens Axboe Aug. 18, 2017, 3:35 p.m. UTC | #1
On 08/18/2017 09:13 AM, Shaohua Li wrote:
> discard request usually is very big and easily use all bandwidth budget
> of a cgroup. discard request size doesn't really mean the size of data
> written, so it doesn't make sense to account it into bandwidth budget.
> This patch ignores discard requests size. It makes sense to account
> discard request into iops budget though.

Some (most) devices do touch media for a discard operation, but the
cost tends to be fairly constant and independent of discard size.
Would it make sense to just treat it as a constant cost? Zero
cost seems wrong.
Shaohua Li Aug. 18, 2017, 4:28 p.m. UTC | #2
On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
> On 08/18/2017 09:13 AM, Shaohua Li wrote:
> > discard request usually is very big and easily use all bandwidth budget
> > of a cgroup. discard request size doesn't really mean the size of data
> > written, so it doesn't make sense to account it into bandwidth budget.
> > This patch ignores discard requests size. It makes sense to account
> > discard request into iops budget though.
> 
> Some (most) devices do touch media for a discard operation, but the
> cost tends to be fairly constant and independent of discard size.
> Would it make sense to just treat it as a constant cost? Zero
> cost seems wrong.

It would be hard to find the right constant cost. Would this make sense?

min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
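In the helper the patch below introduces, the capped accounting Shaohua floats
here would look roughly like this. This is a userspace sketch: struct bio_stub
and the max_sectors parameter stand in for struct bio and queue_max_sectors(q):

```c
#include <stdint.h>

/* Sketch of the capped variant: charge the bio's actual size, but never
 * more than one max-sized request's worth of bytes. */
struct bio_stub {
	unsigned int bi_size;	/* stand-in for bio->bi_iter.bi_size */
};

static unsigned int capped_bio_data_size(const struct bio_stub *bio,
					 unsigned int max_sectors)
{
	unsigned int cap = max_sectors << 9;	/* sectors to bytes */
	return bio->bi_size < cap ? bio->bi_size : cap;
}
```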
Jens Axboe Aug. 18, 2017, 7:06 p.m. UTC | #3
On 08/18/2017 10:28 AM, Shaohua Li wrote:
> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>> discard request usually is very big and easily use all bandwidth budget
>>> of a cgroup. discard request size doesn't really mean the size of data
>>> written, so it doesn't make sense to account it into bandwidth budget.
>>> This patch ignores discard requests size. It makes sense to account
>>> discard request into iops budget though.
>>
>> Some (most) devices do touch media for a discard operation, but the
>> cost tends to be fairly constant and independent of discard size.
>> Would it make sense to just treat it as a constant cost? Zero
>> cost seems wrong.
> 
> that would be hard to find the cost. Would this make sense?
> 
> min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)

It's all going to be approximations, for sure; unfortunately it isn't
an exact science. Why not just use a constant small value? If we assume
that a 4k and 8M discard end up writing roughly the same to media, then
it would follow that just using a smaller constant value (regardless of
actual discard command size) would be useful.
Shaohua Li Aug. 18, 2017, 7:12 p.m. UTC | #4
On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
> On 08/18/2017 10:28 AM, Shaohua Li wrote:
> > On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
> >> On 08/18/2017 09:13 AM, Shaohua Li wrote:
> >>> discard request usually is very big and easily use all bandwidth budget
> >>> of a cgroup. discard request size doesn't really mean the size of data
> >>> written, so it doesn't make sense to account it into bandwidth budget.
> >>> This patch ignores discard requests size. It makes sense to account
> >>> discard request into iops budget though.
> >>
> >> Some (most) devices do touch media for a discard operation, but the
> >> cost tends to be fairly constant and independent of discard size.
> >> Would it make sense to just treat it as a constant cost? Zero
> >> cost seems wrong.
> > 
> > that would be hard to find the cost. Would this make sense?
> > 
> > min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
> 
> It's all going to be approximations, for sure, unfortunately it isn't
> an exact science. Why not just use a constant small value? If we assume
> that a 4k and 8M discard end up writing roughly the same to media, then
> it would follow that just using a smaller constant value (regardless of
> actual discard command size) would be useful.

Sounds good. What number do you suggest? queue_max_sectors or a
random number?

Thanks,
Shaohua
Jens Axboe Aug. 18, 2017, 7:15 p.m. UTC | #5
On 08/18/2017 01:12 PM, Shaohua Li wrote:
> On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
>> On 08/18/2017 10:28 AM, Shaohua Li wrote:
>>> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
>>>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
>>>>> discard request usually is very big and easily use all bandwidth budget
>>>>> of a cgroup. discard request size doesn't really mean the size of data
>>>>> written, so it doesn't make sense to account it into bandwidth budget.
>>>>> This patch ignores discard requests size. It makes sense to account
>>>>> discard request into iops budget though.
>>>>
>>>> Some (most) devices do touch media for a discard operation, but the
>>>> cost tends to be fairly constant and independent of discard size.
>>>> Would it make sense to just treat it as a constant cost? Zero
>>>> cost seems wrong.
>>>
>>> that would be hard to find the cost. Would this make sense?
>>>
>>> min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
>>
>> It's all going to be approximations, for sure, unfortunately it isn't
>> an exact science. Why not just use a constant small value? If we assume
>> that a 4k and 8M discard end up writing roughly the same to media, then
>> it would follow that just using a smaller constant value (regardless of
>> actual discard command size) would be useful.
> 
> Sounds good. what number do you suggest? queue_max_sectors or a
> random number?

Not sure why you want to go that large? Isn't the idea to throttle on
actual device bandwidth used? In which case a much smaller number should
be a lot closer to reality, say like 64 bytes per discard, regardless
of actual size. That still gives you some throttling instead of just
ignoring it, but at a more reasonable rate.
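The effect of a small constant charge is easy to bound: if each discard is
charged a fixed number of bytes, the group's discard rate is capped at
bps_limit / charge per second. A back-of-the-envelope helper (our own naming,
not kernel code):

```c
#include <stdint.h>

/* Upper bound on discards per second when each discard is charged a
 * constant number of bytes against the group's bps limit. */
static uint64_t max_discards_per_sec(uint64_t bps_limit, uint64_t charge_bytes)
{
	return bps_limit / charge_bytes;
}
```

At a 1 MiB/s limit, a 64 B charge still allows 16384 discards/s, while the
512 B value settled on below allows 2048/s.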
Shaohua Li Aug. 18, 2017, 7:19 p.m. UTC | #6
On Fri, Aug 18, 2017 at 01:15:15PM -0600, Jens Axboe wrote:
> On 08/18/2017 01:12 PM, Shaohua Li wrote:
> > On Fri, Aug 18, 2017 at 01:06:46PM -0600, Jens Axboe wrote:
> >> On 08/18/2017 10:28 AM, Shaohua Li wrote:
> >>> On Fri, Aug 18, 2017 at 09:35:01AM -0600, Jens Axboe wrote:
> >>>> On 08/18/2017 09:13 AM, Shaohua Li wrote:
> >>>>> discard request usually is very big and easily use all bandwidth budget
> >>>>> of a cgroup. discard request size doesn't really mean the size of data
> >>>>> written, so it doesn't make sense to account it into bandwidth budget.
> >>>>> This patch ignores discard requests size. It makes sense to account
> >>>>> discard request into iops budget though.
> >>>>
> >>>> Some (most) devices do touch media for a discard operation, but the
> >>>> cost tends to be fairly constant and independent of discard size.
> >>>> Would it make sense to just treat it as a constant cost? Zero
> >>>> cost seems wrong.
> >>>
> >>> that would be hard to find the cost. Would this make sense?
> >>>
> >>> min_t(unsigned int, bio->bi_iter.bi_size, queue_max_sectors(q) << 9)
> >>
> >> It's all going to be approximations, for sure, unfortunately it isn't
> >> an exact science. Why not just use a constant small value? If we assume
> >> that a 4k and 8M discard end up writing roughly the same to media, then
> >> it would follow that just using a smaller constant value (regardless of
> >> actual discard command size) would be useful.
> > 
> > Sounds good. what number do you suggest? queue_max_sectors or a
> > random number?
> 
> Not sure why you want to go that large? Isn't the idea to throttle on
> actual device bandwidth used? In which case a much smaller number should
> be a lot closer to reality, say like 64 bytes per discard, regardless
> of actual size. That still gives you some throttling instead of just
> ignoring it, but at a more reasonable rate.

Hmm, my guess is that discard is more costly than a normal write in some
drivers, but that's just a guess. I'll make it 512B then to make sure nothing
blows up.

Thanks,
Shaohua
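With the 512 B constant agreed on above, throtl_bio_data_size() would end up
looking roughly like this (a sketch of the discussed follow-up; the v1 patch
below still returns 0, and the stub struct replaces the real struct bio):

```c
#include <stdbool.h>

/* Sketch of throtl_bio_data_size() after the change discussed above:
 * charge discards a constant 512 bytes instead of their full size. */
struct bio_stub {
	unsigned int bi_size;	/* stand-in for bio->bi_iter.bi_size */
	bool is_discard;	/* stand-in for bio_op(bio) == REQ_OP_DISCARD */
};

static unsigned int throtl_bio_data_size_v2(const struct bio_stub *bio)
{
	if (bio->is_discard)
		return 512;
	return bio->bi_size;
}
```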

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6a4c4c4..f80acc1 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -380,6 +380,13 @@  static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
 	}								\
 } while (0)
 
+static inline unsigned int throtl_bio_data_size(struct bio *bio)
+{
+	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
+		return 0;
+	return bio->bi_iter.bi_size;
+}
+
 static void throtl_qnode_init(struct throtl_qnode *qn, struct throtl_grp *tg)
 {
 	INIT_LIST_HEAD(&qn->node);
@@ -932,6 +939,7 @@  static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	bool rw = bio_data_dir(bio);
 	u64 bytes_allowed, extra_bytes, tmp;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	jiffy_elapsed = jiffy_elapsed_rnd = jiffies - tg->slice_start[rw];
 
@@ -945,14 +953,14 @@  static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	do_div(tmp, HZ);
 	bytes_allowed = tmp;
 
-	if (tg->bytes_disp[rw] + bio->bi_iter.bi_size <= bytes_allowed) {
+	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
 		return true;
 	}
 
 	/* Calc approx time to dispatch */
-	extra_bytes = tg->bytes_disp[rw] + bio->bi_iter.bi_size - bytes_allowed;
+	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
 	jiffy_wait = div64_u64(extra_bytes * HZ, tg_bps_limit(tg, rw));
 
 	if (!jiffy_wait)
@@ -1032,11 +1040,12 @@  static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
+	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	tg->bytes_disp[rw] += bio->bi_iter.bi_size;
+	tg->bytes_disp[rw] += bio_size;
 	tg->io_disp[rw]++;
-	tg->last_bytes_disp[rw] += bio->bi_iter.bi_size;
+	tg->last_bytes_disp[rw] += bio_size;
 	tg->last_io_disp[rw]++;
 
 	/*