Message ID | 20240115215840.54432-3-axboe@kernel.dk (mailing list archive)
---|---
State | New, archived
Series | Cache issue side time querying
On 1/16/2024 3:23 AM, Jens Axboe wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>  	if (tsk->plug)
>  		return;
>  
> +	plug->cur_ktime = 0;
>  	plug->mq_list = NULL;
>  	plug->cached_rq = NULL;
>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>  
>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>  	struct request *cached_rq;
> +	u64 cur_ktime;
>  	unsigned short nr_ios;
>  
>  	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>  
>  static inline u64 blk_time_get_ns(void)
>  {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;

I did not understand the relevance of 1ULL here.
If ktime_get_ns() returns an even value, it will turn that into an odd
value before caching. And that value will be returned for the subsequent
calls.
But how is that better compared to just caching whatever ktime_get_ns()
returned?
On 16.01.24 10:52, Kanchan Joshi wrote:
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
>
> I did not understand the relevance of 1ULL here.
> If ktime_get_ns() returns an even value, it will turn that into an odd
> value before caching. And that value will be returned for the subsequent
> calls.
> But how is that better compared to just caching whatever ktime_get_ns()
> returned?
>

IIUC it's an indicator if plug->cur_ktime has been set or not.
But I don't understand why we can't compare with 0?
On 1/16/2024 3:55 PM, Johannes Thumshirn wrote:
> On 16.01.24 10:52, Kanchan Joshi wrote:
>>> +	if (!plug)
>>> +		return ktime_get_ns();
>>> +	if (!(plug->cur_ktime & 1ULL)) {
>>> +		plug->cur_ktime = ktime_get_ns();
>>> +		plug->cur_ktime |= 1ULL;
>>> +	}
>>> +	return plug->cur_ktime;
>>
>> I did not understand the relevance of 1ULL here.
>> If ktime_get_ns() returns an even value, it will turn that into an odd
>> value before caching. And that value will be returned for the subsequent
>> calls.
>> But how is that better compared to just caching whatever ktime_get_ns()
>> returned?
>>
>
> IIUC it's an indicator if plug->cur_ktime has been set or not.
> But I don't understand why we can't compare with 0?

Yes, that's what I meant by 'just caching whatever ktime_get_ns() returned':

	if (!plug->cur_ktime)
		plug->cur_ktime = ktime_get_ns();

An indicator should not alter the return (unless there is a reason).
On 1/16/24 2:51 AM, Kanchan Joshi wrote:
> On 1/16/2024 3:23 AM, Jens Axboe wrote:
>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 11342af420d0..cc4db4d92c75 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>>  	if (tsk->plug)
>>  		return;
>>  
>> +	plug->cur_ktime = 0;
>>  	plug->mq_list = NULL;
>>  	plug->cached_rq = NULL;
>>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 2f9ceea0e23b..23c237b22071 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -942,6 +942,7 @@ struct blk_plug {
>>  
>>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>>  	struct request *cached_rq;
>> +	u64 cur_ktime;
>>  	unsigned short nr_ios;
>>  
>>  	unsigned short rq_count;
>> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>>  
>>  static inline u64 blk_time_get_ns(void)
>>  {
>> -	return ktime_get_ns();
>> +	struct blk_plug *plug = current->plug;
>> +
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
>
> I did not understand the relevance of 1ULL here. If ktime_get_ns()
> returns an even value, it will turn that into an odd value before
> caching.

Right, it potentially rounds it up by 1 nsec.

> And that value will be returned for the subsequent calls. But
> how is that better compared to just caching whatever ktime_get_ns()
> returned?

0 could be a valid time. You could argue that this doesn't matter, we'd
just do an extra ktime_get_ns() in that case. And that's probably fine.
The LSB was meant to indicate "time stamp is valid". But not that
important imho, I'll either add a comment or just use 0 as both the
initializer (as it is now) and non-zero to indicate if the timestamp is
valid or not.
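To make the two options in this exchange concrete, here is a minimal
user-space sketch of them side by side. This is not the kernel code:
struct fake_plug, get_ns_lsb() and get_ns_zero() are invented for
illustration, with clock_gettime() standing in for ktime_get_ns().

    #define _POSIX_C_SOURCE 199309L
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    struct fake_plug {
            uint64_t cur_ktime;     /* 0 means "no timestamp cached yet" */
    };

    static uint64_t now_ns(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    /* Approach in the posted patch: LSB set means "cached value is valid". */
    static uint64_t get_ns_lsb(struct fake_plug *plug)
    {
            if (!(plug->cur_ktime & 1ULL))
                    plug->cur_ktime = now_ns() | 1ULL;  /* may round up by 1 nsec */
            return plug->cur_ktime;
    }

    /* Alternative raised in the thread: treat 0 itself as "not cached". */
    static uint64_t get_ns_zero(struct fake_plug *plug)
    {
            if (!plug->cur_ktime)
                    plug->cur_ktime = now_ns();  /* re-queried if the clock ever returns 0 */
            return plug->cur_ktime;
    }

    int main(void)
    {
            struct fake_plug a = { 0 }, b = { 0 };

            /* Repeated calls within one "plug" return the same cached value. */
            printf("lsb : %llu %llu\n",
                   (unsigned long long)get_ns_lsb(&a),
                   (unsigned long long)get_ns_lsb(&a));
            printf("zero: %llu %llu\n",
                   (unsigned long long)get_ns_zero(&b),
                   (unsigned long long)get_ns_zero(&b));
            return 0;
    }

The trade-off being discussed is only about the sentinel: the LSB tag
never mistakes a valid timestamp for "not cached", at the cost of
rounding the cached value up by at most 1 nsec, while the zero check
keeps the value exact and simply re-reads the clock in the (unlikely)
case that it returns 0.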
On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
> If the block plug gets flushed, eg on preempt or schedule out, then
> we invalidate the cached clock.

There must be something implicitly happening that I am missing. Where is
the 'cur_ktime' cached clock invalidated on a plug flush?

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>  	if (tsk->plug)
>  		return;
>  
> +	plug->cur_ktime = 0;
>  	plug->mq_list = NULL;
>  	plug->cached_rq = NULL;
>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>  
>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>  	struct request *cached_rq;
> +	u64 cur_ktime;
>  	unsigned short nr_ios;
>  
>  	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>  
>  static inline u64 blk_time_get_ns(void)
>  {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;
> }
> #else /* CONFIG_BLOCK */
> struct blk_plug {
On 1/16/24 9:13 AM, Keith Busch wrote:
> On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
>> If the block plug gets flushed, eg on preempt or schedule out, then
>> we invalidate the cached clock.
>
> There must be something implicitly happening that I am missing. Where is
> the 'cur_ktime' cached clock invalidated on a plug flush?

It's not just yet, and it's also missing on preemption. I have those in
my current tree. The current test doesn't hit any of these, so it didn't
impact the testing.
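The invalidation Jens refers to is not shown anywhere in this posting.
A rough sketch of the missing piece, reusing the illustrative fake_plug
model from the earlier user-space example rather than the actual code
in his tree:

    #include <stdint.h>

    struct fake_plug {
            uint64_t cur_ktime;     /* 0 == nothing cached */
    };

    /*
     * On a plug flush (and, per the reply above, on preemption) the
     * cached timestamp would be dropped so that the next time query
     * goes back to the real clock instead of reusing a stale value.
     */
    void flush_fake_plug(struct fake_plug *plug)
    {
            /* ...submit any queued requests here... */
            plug->cur_ktime = 0;
    }

Without this step, a task that keeps a plug across a long sleep could
keep handing out an arbitrarily old timestamp, which is exactly the
gap Keith is asking about.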
diff --git a/block/blk-core.c b/block/blk-core.c
index 11342af420d0..cc4db4d92c75 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
 	if (tsk->plug)
 		return;
 
+	plug->cur_ktime = 0;
 	plug->mq_list = NULL;
 	plug->cached_rq = NULL;
 	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f9ceea0e23b..23c237b22071 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -942,6 +942,7 @@ struct blk_plug {
 
 	/* if ios_left is > 1, we can batch tag/rq allocations */
 	struct request *cached_rq;
+	u64 cur_ktime;
 	unsigned short nr_ios;
 
 	unsigned short rq_count;
@@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
 
 static inline u64 blk_time_get_ns(void)
 {
-	return ktime_get_ns();
+	struct blk_plug *plug = current->plug;
+
+	if (!plug)
+		return ktime_get_ns();
+	if (!(plug->cur_ktime & 1ULL)) {
+		plug->cur_ktime = ktime_get_ns();
+		plug->cur_ktime |= 1ULL;
+	}
+	return plug->cur_ktime;
 }
 #else /* CONFIG_BLOCK */
 struct blk_plug {
Querying the current time is the most costly thing we do in the block
layer per IO, and depending on kernel config settings, we may do it
many times per IO.

None of the callers actually need nsec granularity. Take advantage of
that by caching the current time in the plug, with the assumption here
being that any time checking will be temporally close enough that the
slight loss of precision doesn't matter.

If the block plug gets flushed, eg on preempt or schedule out, then
we invalidate the cached clock.

On a basic peak IOPS test case with iostats enabled, this changes
the performance from:

 IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
 IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
 IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
 IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
 IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
 IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
 IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31

to

 IOPS=115.57M, BW=56.43GiB/s, IOS/call=32/32
 IOPS=115.61M, BW=56.45GiB/s, IOS/call=32/32
 IOPS=115.54M, BW=56.41GiB/s, IOS/call=32/31
 IOPS=115.51M, BW=56.40GiB/s, IOS/call=31/31
 IOPS=115.59M, BW=56.44GiB/s, IOS/call=32/32
 IOPS=115.08M, BW=56.19GiB/s, IOS/call=31/31

which is more than a 6% improvement in performance. Looking at perf diff,
we can see a huge reduction in time overhead:

    10.55%     -9.88%  [kernel.vmlinux]  [k] read_tsc
     1.31%     -1.22%  [kernel.vmlinux]  [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only
applicable to the issue side. But this is where most of the time calls
happen anyway. It's also worth noting that the above testing doesn't
enable any of the higher cost CPU items on the block layer side, like
wbt, cgroups, iocost, etc, which all would add additional time querying.
IOW, results would likely look even better in comparison with those
enabled, as distros would do.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |  1 +
 include/linux/blkdev.h | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)
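As context for "only applicable to the issue side": the cache only
helps inside a plugged submission window, i.e. between blk_start_plug()
and blk_finish_plug(). A rough sketch of that window (illustrative
only, not code from this patch; the bio variables are placeholders):

    struct blk_plug plug;

    blk_start_plug(&plug);      /* cur_ktime starts out invalid */
    submit_bio(bio1);           /* first blk_time_get_ns() reads the clock once */
    submit_bio(bio2);           /* later calls reuse the cached value */
    blk_finish_plug(&plug);     /* plug is flushed; cached time no longer used */

Paths that run without a plug, such as completion handling, fall through
to a plain ktime_get_ns() as before.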