Message ID | 20240115215840.54432-3-axboe@kernel.dk (mailing list archive)
---|---
State | New, archived
Series | Cache issue side time querying
On 1/16/2024 3:23 AM, Jens Axboe wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>  	if (tsk->plug)
>  		return;
>  
> +	plug->cur_ktime = 0;
>  	plug->mq_list = NULL;
>  	plug->cached_rq = NULL;
>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>  
>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>  	struct request *cached_rq;
> +	u64 cur_ktime;
>  	unsigned short nr_ios;
>  
>  	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>  
>  static inline u64 blk_time_get_ns(void)
>  {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;

I did not understand the relevance of 1ULL here.
If ktime_get_ns() returns an even value, it will turn that into an odd
value before caching. And that value will be returned for the subsequent
calls.
But how is that better compared to just caching whatever ktime_get_ns()
returned?
On 16.01.24 10:52, Kanchan Joshi wrote:
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
>
> I did not understand the relevance of 1ULL here.
> If ktime_get_ns() returns an even value, it will turn that into an odd
> value before caching. And that value will be returned for the subsequent
> calls.
> But how is that better compared to just caching whatever ktime_get_ns()
> returned?
>

IIUC it's an indicator if plug->cur_ktime has been set or not.
But I don't understand why we can't compare with 0?
On 1/16/2024 3:55 PM, Johannes Thumshirn wrote:
> On 16.01.24 10:52, Kanchan Joshi wrote:
>>> +	if (!plug)
>>> +		return ktime_get_ns();
>>> +	if (!(plug->cur_ktime & 1ULL)) {
>>> +		plug->cur_ktime = ktime_get_ns();
>>> +		plug->cur_ktime |= 1ULL;
>>> +	}
>>> +	return plug->cur_ktime;
>>
>> I did not understand the relevance of 1ULL here.
>> If ktime_get_ns() returns an even value, it will turn that into an odd
>> value before caching. And that value will be returned for the subsequent
>> calls.
>> But how is that better compared to just caching whatever ktime_get_ns()
>> returned?
>>
>
> IIUC it's an indicator if plug->cur_ktime has been set or not.
> But I don't understand why we can't compare with 0?

Yes, that's what I meant by 'just caching whatever ktime_get_ns() returned':

	if (!plug->cur_ktime)
		plug->cur_ktime = ktime_get_ns();

An indicator should not alter the return (unless there is a reason).
On 1/16/24 2:51 AM, Kanchan Joshi wrote:
> On 1/16/2024 3:23 AM, Jens Axboe wrote:
>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 11342af420d0..cc4db4d92c75 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>>  	if (tsk->plug)
>>  		return;
>>  
>> +	plug->cur_ktime = 0;
>>  	plug->mq_list = NULL;
>>  	plug->cached_rq = NULL;
>>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 2f9ceea0e23b..23c237b22071 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -942,6 +942,7 @@ struct blk_plug {
>>  
>>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>>  	struct request *cached_rq;
>> +	u64 cur_ktime;
>>  	unsigned short nr_ios;
>>  
>>  	unsigned short rq_count;
>> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>>  
>>  static inline u64 blk_time_get_ns(void)
>>  {
>> -	return ktime_get_ns();
>> +	struct blk_plug *plug = current->plug;
>> +
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
>
> I did not understand the relevance of 1ULL here. If ktime_get_ns()
> returns an even value, it will turn that into an odd value before
> caching.

Right, it potentially rounds it up by 1 nsec.

> And that value will be returned for the subsequent calls. But
> how is that better compared to just caching whatever ktime_get_ns()
> returned?

0 could be a valid time. You could argue that this doesn't matter, we'd
just do an extra ktime_get_ns() in that case. And that's probably fine.
The LSB was meant to indicate "time stamp is valid". But not that
important imho, I'll either add a comment or just use 0 as both the
initializer (as it is now) and non-zero to indicate if the timestamp is
valid or not.
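To make the two options in this exchange concrete, here is a minimal
user-space sketch of them side by side. This is not the kernel code:
struct fake_plug, get_ns_lsb() and get_ns_zero() are invented for
illustration, with clock_gettime() standing in for ktime_get_ns().

    #define _POSIX_C_SOURCE 199309L
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    struct fake_plug {
            uint64_t cur_ktime;     /* 0 means "no timestamp cached yet" */
    };

    static uint64_t now_ns(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    /* Approach in the posted patch: LSB set means "cached value is valid". */
    static uint64_t get_ns_lsb(struct fake_plug *plug)
    {
            if (!(plug->cur_ktime & 1ULL))
                    plug->cur_ktime = now_ns() | 1ULL;  /* may round up by 1 nsec */
            return plug->cur_ktime;
    }

    /* Alternative raised in the thread: treat 0 itself as "not cached". */
    static uint64_t get_ns_zero(struct fake_plug *plug)
    {
            if (!plug->cur_ktime)
                    plug->cur_ktime = now_ns();  /* re-queried if the clock ever returns 0 */
            return plug->cur_ktime;
    }

    int main(void)
    {
            struct fake_plug a = { 0 }, b = { 0 };

            /* Repeated calls within one "plug" return the same cached value. */
            printf("lsb : %llu %llu\n",
                   (unsigned long long)get_ns_lsb(&a),
                   (unsigned long long)get_ns_lsb(&a));
            printf("zero: %llu %llu\n",
                   (unsigned long long)get_ns_zero(&b),
                   (unsigned long long)get_ns_zero(&b));
            return 0;
    }

The trade-off being discussed is only about the sentinel: the LSB tag
never mistakes a valid timestamp for "not cached", at the cost of
rounding the cached value up by at most 1 nsec, while the zero check
keeps the value exact and simply re-reads the clock in the (unlikely)
case that it returns 0.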
On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
> If the block plug gets flushed, eg on preempt or schedule out, then
> we invalidate the cached clock.

There must be something implicitly happening that I am missing. Where is
the 'cur_ktime' cached clock invalidated on a plug flush?

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>  	if (tsk->plug)
>  		return;
>  
> +	plug->cur_ktime = 0;
>  	plug->mq_list = NULL;
>  	plug->cached_rq = NULL;
>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>  
>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>  	struct request *cached_rq;
> +	u64 cur_ktime;
>  	unsigned short nr_ios;
>  
>  	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>  
>  static inline u64 blk_time_get_ns(void)
>  {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;
> }
> #else /* CONFIG_BLOCK */
> struct blk_plug {
On 1/16/24 9:13 AM, Keith Busch wrote:
> On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
>> If the block plug gets flushed, eg on preempt or schedule out, then
>> we invalidate the cached clock.
>
> There must be something implicitly happening that I am missing. Where is
> the 'cur_ktime' cached clock invalidated on a plug flush?

It's not just yet, and it's also missing on preemption. I have those in
my current tree. The current test doesn't hit any of these, so it didn't
impact the testing.
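The invalidation Jens refers to is not shown anywhere in this posting.
A rough sketch of the missing piece, reusing the illustrative fake_plug
model from the earlier user-space example rather than the actual code
in his tree:

    #include <stdint.h>

    struct fake_plug {
            uint64_t cur_ktime;     /* 0 == nothing cached */
    };

    /*
     * On a plug flush (and, per the reply above, on preemption) the
     * cached timestamp would be dropped so that the next time query
     * goes back to the real clock instead of reusing a stale value.
     */
    void flush_fake_plug(struct fake_plug *plug)
    {
            /* ...submit any queued requests here... */
            plug->cur_ktime = 0;
    }

Without this step, a task that keeps a plug across a long sleep could
keep handing out an arbitrarily old timestamp, which is exactly the
gap Keith is asking about.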
diff --git a/block/blk-core.c b/block/blk-core.c
index 11342af420d0..cc4db4d92c75 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
 	if (tsk->plug)
 		return;
 
+	plug->cur_ktime = 0;
 	plug->mq_list = NULL;
 	plug->cached_rq = NULL;
 	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f9ceea0e23b..23c237b22071 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -942,6 +942,7 @@ struct blk_plug {
 
 	/* if ios_left is > 1, we can batch tag/rq allocations */
 	struct request *cached_rq;
+	u64 cur_ktime;
 	unsigned short nr_ios;
 
 	unsigned short rq_count;
@@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
 
 static inline u64 blk_time_get_ns(void)
 {
-	return ktime_get_ns();
+	struct blk_plug *plug = current->plug;
+
+	if (!plug)
+		return ktime_get_ns();
+	if (!(plug->cur_ktime & 1ULL)) {
+		plug->cur_ktime = ktime_get_ns();
+		plug->cur_ktime |= 1ULL;
+	}
+	return plug->cur_ktime;
 }
 #else /* CONFIG_BLOCK */
 struct blk_plug {
Querying the current time is the most costly thing we do in the block
layer per IO, and depending on kernel config settings, we may do it
many times per IO.

None of the callers actually need nsec granularity. Take advantage of
that by caching the current time in the plug, with the assumption here
being that any time checking will be temporally close enough that the
slight loss of precision doesn't matter.

If the block plug gets flushed, eg on preempt or schedule out, then
we invalidate the cached clock.

On a basic peak IOPS test case with iostats enabled, this changes
the performance from:

 IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
 IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
 IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
 IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
 IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
 IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
 IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31

to

 IOPS=115.57M, BW=56.43GiB/s, IOS/call=32/32
 IOPS=115.61M, BW=56.45GiB/s, IOS/call=32/32
 IOPS=115.54M, BW=56.41GiB/s, IOS/call=32/31
 IOPS=115.51M, BW=56.40GiB/s, IOS/call=31/31
 IOPS=115.59M, BW=56.44GiB/s, IOS/call=32/32
 IOPS=115.08M, BW=56.19GiB/s, IOS/call=31/31

which is more than a 6% improvement in performance. Looking at perf diff,
we can see a huge reduction in time overhead:

    10.55%     -9.88%  [kernel.vmlinux]  [k] read_tsc
     1.31%     -1.22%  [kernel.vmlinux]  [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only
applicable to the issue side. But this is where most of the time calls
happen anyway. It's also worth noting that the above testing doesn't
enable any of the higher cost CPU items on the block layer side, like
wbt, cgroups, iocost, etc, which all would add additional time querying.
IOW, results would likely look even better in comparison with those
enabled, as distros would do.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |  1 +
 include/linux/blkdev.h | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)
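As context for "only applicable to the issue side": the cache only
helps inside a plugged submission window, i.e. between blk_start_plug()
and blk_finish_plug(). A rough sketch of that window (illustrative
only, not code from this patch; the bio variables are placeholders):

    struct blk_plug plug;

    blk_start_plug(&plug);      /* cur_ktime starts out invalid */
    submit_bio(bio1);           /* first blk_time_get_ns() reads the clock once */
    submit_bio(bio2);           /* later calls reuse the cached value */
    blk_finish_plug(&plug);     /* plug is flushed; cached time no longer used */

Paths that run without a plug, such as completion handling, fall through
to a plain ktime_get_ns() as before.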