diff mbox series

[v2] timekeeping: move multigrain timestamp floor handling into timekeeper

Message ID 20240912-mgtime-v2-1-54db84afb7a7@kernel.org (mailing list archive)
State New
Headers show
Series [v2] timekeeping: move multigrain timestamp floor handling into timekeeper | expand

Commit Message

Jeff Layton Sept. 12, 2024, 6:02 p.m. UTC
The kernel test robot reported a performance hit in some will-it-scale
tests due to the multigrain timestamp patches.  My own testing showed
about a 7% drop in performance on the pipe1_threads test, and the data
showed that coarse_ctime() was slowing down current_time().

Move the multigrain timestamp floor tracking word into timekeeper.c. Add
two new public interfaces: The first fills a timespec64 with the later
of the coarse-grained clock and the floor time, and the second gets a
fine-grained time and tries to swap it into the floor and fills a
timespec64 with the result.

The first function returns an opaque cookie that is suitable for passing
to the second, which will use it as the "old" value in the cmpxchg.

With this patch on top of the multigrain series, the will-it-scale
pipe1_threads microbenchmark shows these averages on my test rig:

	v6.11-rc7:			103561295 (baseline)
	v6.11-rc7 + mgtime + this:	101357203 (~2% performance drop)

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
This version moves more of the floor handling into the timekeeper code.

Some of the earlier concern was about mixing different time value types
in the same interface. This sort of still does that, but it does it
using an opaque cookie value instead, which I'm hoping will make the
interfaces a little cleaner. Using an opaque cookie also means that we
can change the underlying implementation at will, without breaking the
callers.

If this approach looks OK, it's probably best for me to just respin the
entire series and have Christian drop the old and pick up the new. That
should reduce some of the unnecessary churn in fs/inode.c.
---
Changes in v2:
- move floor handling completely into timekeeper code
- add new interfaces for multigrain timestamp handling
- Link to v1: https://lore.kernel.org/r/20240911-mgtime-v1-1-e4aedf1d0d15@kernel.org
---
 fs/inode.c                  | 105 +++++++++++++-------------------------------
 include/linux/timekeeping.h |   4 ++
 kernel/time/timekeeping.c   |  77 ++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+), 75 deletions(-)


---
base-commit: 18348a38102a4efca57186740afb33f08e5f4609
change-id: 20240912-mgtime-c1280600a9a3

Best regards,

Comments

John Stultz Sept. 12, 2024, 8:11 p.m. UTC | #1
On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches.  My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().

So, you provided some useful detail about why coarse_ctime() was slow
in your reply earlier, but it would be good to preserve that in the
commit message here.


> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
>
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.

The cookie usage isn't totally clear to me right off.  It feels a bit
more subtle then I'd expect.


> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
>         .base[1] = FAST_TK_INIT,
>  };
>
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +

So I do really like this general approach of having an internal floor
value combined with special coarse/fine grained assessors that work
with the floor, so we're not impacting the normal hotpath logic
(basically I was writing up a suggestion to this effect to the thread
with Arnd when I realized you had follow up patch in my inbox).


>  static inline void tk_normalize_xtime(struct timekeeper *tk)
>  {
>         while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
>  }
>  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> +       struct timekeeper *tk = &tk_core.timekeeper;
> +       u64 floor = atomic64_read(&mg_floor);
> +       ktime_t f_real, offset, coarse;
> +       unsigned int seq;
> +
> +       WARN_ON(timekeeping_suspended);
> +
> +       do {
> +               seq = read_seqcount_begin(&tk_core.seq);
> +               *ts = tk_xtime(tk);
> +               offset = *offsets[TK_OFFS_REAL];
> +       } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +       coarse = timespec64_to_ktime(*ts);
> +       f_real = ktime_add(floor, offset);
> +       if (ktime_after(f_real, coarse))
> +               *ts = ktime_to_timespec64(f_real);
> +       return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);

Generally this looks ok to me.


> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts:                pointer to the timespec to be set
> + * @cookie:    opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */

Again this cookie argument usage and the behavior of this function
isn't very clear to me.

> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> +       struct timekeeper *tk = &tk_core.timekeeper;
> +       ktime_t offset, mono, old = (ktime_t)cookie;
> +       unsigned int seq;
> +       u64 nsecs;
> +
> +       WARN_ON(timekeeping_suspended);
> +
> +       do {
> +               seq = read_seqcount_begin(&tk_core.seq);
> +
> +               ts->tv_sec = tk->xtime_sec;
> +               mono = tk->tkr_mono.base;
> +               nsecs = timekeeping_get_ns(&tk->tkr_mono);
> +               offset = *offsets[TK_OFFS_REAL];
> +       } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +       mono = ktime_add_ns(mono, nsecs);
> +       if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> +               ts->tv_nsec = 0;
> +               timespec64_add_ns(ts, nsecs);
> +       } else {
> +               *ts = ktime_to_timespec64(ktime_add(old, offset));
> +       }
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);


So initially I was expecting this to look something like (sorry for
the whitespace damage here):
{
    do {
        seq = read_seqcount_begin(&tk_core.seq);
        ts->tv_sec = tk->xtime_sec;
        mono = tk->tkr_mono.base;
        nsecs = timekeeping_get_ns(&tk->tkr_mono);
        offset = *offsets[TK_OFFS_REAL];
    } while (read_seqcount_retry(&tk_core.seq, seq));

    mono = ktime_add_ns(mono, nsecs);
    do {
        old = atomic64_read(&mg_floor);
        if (floor >= mono)
            break;
    } while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
    ts->tv_nsec = 0;
    timespec64_add_ns(ts, nsecs);
}

Where you read the tk data, atomically update the floor (assuming it's
not in the future) and then return the finegrained value, not needing
to manage a cookie value.

But instead, it seems like if something has happened since the cookie
value was saved (another cpu getting a fine grained timestamp), your
ktime_get_real_ts64_mg() will fall back to returning the same coarse
grained time saved to the cookie, as if no time had past?

It seems like that could cause problems:

cpu1                                     cpu2
--------------------------------------------------------------------------
                                         t2a = ktime_get_coarse_real_ts64_mg
t1a = ktime_get_coarse_real_ts64_mg()
t1b = ktime_get_real_ts64_mg(t1a)

                                         t2b = ktime_get_real_ts64_mg(t2a)

Where t2b will seem to be before t1b, even though it happened afterwards.


thanks
-john
John Stultz Sept. 12, 2024, 8:15 p.m. UTC | #2
On Thu, Sep 12, 2024 at 1:11 PM John Stultz <jstultz@google.com> wrote:
> So initially I was expecting this to look something like (sorry for
> the whitespace damage here):
> {
>     do {
>         seq = read_seqcount_begin(&tk_core.seq);
>         ts->tv_sec = tk->xtime_sec;
>         mono = tk->tkr_mono.base;
>         nsecs = timekeeping_get_ns(&tk->tkr_mono);
>         offset = *offsets[TK_OFFS_REAL];
>     } while (read_seqcount_retry(&tk_core.seq, seq));
>
>     mono = ktime_add_ns(mono, nsecs);
>     do {
>         old = atomic64_read(&mg_floor);
>         if (floor >= mono)
>             break;

Apologies, that should be
         if (old >= mono)
             break;

thanks
-john
Jeff Layton Sept. 12, 2024, 8:18 p.m. UTC | #3
On Thu, 2024-09-12 at 13:11 -0700, John Stultz wrote:
> On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > The kernel test robot reported a performance hit in some will-it-scale
> > tests due to the multigrain timestamp patches.  My own testing showed
> > about a 7% drop in performance on the pipe1_threads test, and the data
> > showed that coarse_ctime() was slowing down current_time().
> 
> So, you provided some useful detail about why coarse_ctime() was slow
> in your reply earlier, but it would be good to preserve that in the
> commit message here.
> 
> 
> > Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> > two new public interfaces: The first fills a timespec64 with the later
> > of the coarse-grained clock and the floor time, and the second gets a
> > fine-grained time and tries to swap it into the floor and fills a
> > timespec64 with the result.
> > 
> > The first function returns an opaque cookie that is suitable for passing
> > to the second, which will use it as the "old" value in the cmpxchg.
> 
> The cookie usage isn't totally clear to me right off.  It feels a bit
> more subtle then I'd expect.
> 
> 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..bb039c9d525e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
> >         .base[1] = FAST_TK_INIT,
> >  };
> > 
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
> 
> So I do really like this general approach of having an internal floor
> value combined with special coarse/fine grained assessors that work
> with the floor, so we're not impacting the normal hotpath logic
> (basically I was writing up a suggestion to this effect to the thread
> with Arnd when I realized you had follow up patch in my inbox).
> 
> 
> >  static inline void tk_normalize_xtime(struct timekeeper *tk)
> >  {
> >         while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> >  }
> >  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> > 
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable to pass
> > + * to ktime_get_real_ts64_mg.
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > +       struct timekeeper *tk = &tk_core.timekeeper;
> > +       u64 floor = atomic64_read(&mg_floor);
> > +       ktime_t f_real, offset, coarse;
> > +       unsigned int seq;
> > +
> > +       WARN_ON(timekeeping_suspended);
> > +
> > +       do {
> > +               seq = read_seqcount_begin(&tk_core.seq);
> > +               *ts = tk_xtime(tk);
> > +               offset = *offsets[TK_OFFS_REAL];
> > +       } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +       coarse = timespec64_to_ktime(*ts);
> > +       f_real = ktime_add(floor, offset);
> > +       if (ktime_after(f_real, coarse))
> > +               *ts = ktime_to_timespec64(f_real);
> > +       return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> 
> Generally this looks ok to me.
> 
> 
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts:                pointer to the timespec to be set
> > + * @cookie:    opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
> > + */
> 
> Again this cookie argument usage and the behavior of this function
> isn't very clear to me.
> 
> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > +       struct timekeeper *tk = &tk_core.timekeeper;
> > +       ktime_t offset, mono, old = (ktime_t)cookie;
> > +       unsigned int seq;
> > +       u64 nsecs;
> > +
> > +       WARN_ON(timekeeping_suspended);
> > +
> > +       do {
> > +               seq = read_seqcount_begin(&tk_core.seq);
> > +
> > +               ts->tv_sec = tk->xtime_sec;
> > +               mono = tk->tkr_mono.base;
> > +               nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > +               offset = *offsets[TK_OFFS_REAL];
> > +       } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +       mono = ktime_add_ns(mono, nsecs);
> > +       if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > +               ts->tv_nsec = 0;
> > +               timespec64_add_ns(ts, nsecs);
> > +       } else {
> > +               *ts = ktime_to_timespec64(ktime_add(old, offset));
> > +       }
> > +
> > +}
> > +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> 
> 
> So initially I was expecting this to look something like (sorry for
> the whitespace damage here):
> {
>     do {
>         seq = read_seqcount_begin(&tk_core.seq);
>         ts->tv_sec = tk->xtime_sec;
>         mono = tk->tkr_mono.base;
>         nsecs = timekeeping_get_ns(&tk->tkr_mono);
>         offset = *offsets[TK_OFFS_REAL];
>     } while (read_seqcount_retry(&tk_core.seq, seq));
> 
>     mono = ktime_add_ns(mono, nsecs);
>     do {
>         old = atomic64_read(&mg_floor);
>         if (floor >= mono)
>             break;
>     } while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
>     ts->tv_nsec = 0;
>     timespec64_add_ns(ts, nsecs);
> }
> 
> Where you read the tk data, atomically update the floor (assuming it's
> not in the future) and then return the finegrained value, not needing
> to manage a cookie value.
> 
> But instead, it seems like if something has happened since the cookie
> value was saved (another cpu getting a fine grained timestamp), your
> ktime_get_real_ts64_mg() will fall back to returning the same coarse
> grained time saved to the cookie, as if no time had past?
>
> It seems like that could cause problems:
> 
> cpu1                                     cpu2
> --------------------------------------------------------------------------
>                                          t2a = ktime_get_coarse_real_ts64_mg
> t1a = ktime_get_coarse_real_ts64_mg()
> t1b = ktime_get_real_ts64_mg(t1a)
> 
>                                          t2b = ktime_get_real_ts64_mg(t2a)
> 
> Where t2b will seem to be before t1b, even though it happened afterwards.
> 

Ahh no, the subtle thing about atomic64_try_cmpxchg is that it
overwrites "old" with the value that was currently there in the event
that the cmp fails.

So, the try_cmpxchg there will either swap the new value into place, or
if it was updated in the meantime, "old" will now refer to the value
that's currently in the floor word. Either is fine in this case, so we
don't need to retry anything.
John Stultz Sept. 12, 2024, 8:33 p.m. UTC | #4
On Thu, Sep 12, 2024 at 1:18 PM Jeff Layton <jlayton@kernel.org> wrote:
> On Thu, 2024-09-12 at 13:11 -0700, John Stultz wrote:
> > On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
> > But instead, it seems like if something has happened since the cookie
> > value was saved (another cpu getting a fine grained timestamp), your
> > ktime_get_real_ts64_mg() will fall back to returning the same coarse
> > grained time saved to the cookie, as if no time had past?
> >
> > It seems like that could cause problems:
> >
> > cpu1                                     cpu2
> > --------------------------------------------------------------------------
> >                                          t2a = ktime_get_coarse_real_ts64_mg
> > t1a = ktime_get_coarse_real_ts64_mg()
> > t1b = ktime_get_real_ts64_mg(t1a)
> >
> >                                          t2b = ktime_get_real_ts64_mg(t2a)
> >
> > Where t2b will seem to be before t1b, even though it happened afterwards.
> >
>
> Ahh no, the subtle thing about atomic64_try_cmpxchg is that it
> overwrites "old" with the value that was currently there in the event
> that the cmp fails.

Ah, ok. Thank you for the explanation there!

> So, the try_cmpxchg there will either swap the new value into place, or
> if it was updated in the meantime, "old" will now refer to the value
> that's currently in the floor word. Either is fine in this case, so we
> don't need to retry anything.


Though if cpu2 then made another call to
ktime_get_coarse_real_ts64_mg(), the value returned there will be the
same as t1b? and would be before t2b?

thanks
-john
John Stultz Sept. 12, 2024, 8:35 p.m. UTC | #5
On Thu, Sep 12, 2024 at 1:33 PM John Stultz <jstultz@google.com> wrote:
>
> On Thu, Sep 12, 2024 at 1:18 PM Jeff Layton <jlayton@kernel.org> wrote:
> > On Thu, 2024-09-12 at 13:11 -0700, John Stultz wrote:
> > > On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > But instead, it seems like if something has happened since the cookie
> > > value was saved (another cpu getting a fine grained timestamp), your
> > > ktime_get_real_ts64_mg() will fall back to returning the same coarse
> > > grained time saved to the cookie, as if no time had past?
> > >
> > > It seems like that could cause problems:
> > >
> > > cpu1                                     cpu2
> > > --------------------------------------------------------------------------
> > >                                          t2a = ktime_get_coarse_real_ts64_mg
> > > t1a = ktime_get_coarse_real_ts64_mg()
> > > t1b = ktime_get_real_ts64_mg(t1a)
> > >
> > >                                          t2b = ktime_get_real_ts64_mg(t2a)
> > >
> > > Where t2b will seem to be before t1b, even though it happened afterwards.
> > >
> >
> > Ahh no, the subtle thing about atomic64_try_cmpxchg is that it
> > overwrites "old" with the value that was currently there in the event
> > that the cmp fails.
>
> Ah, ok. Thank you for the explanation there!
>
> > So, the try_cmpxchg there will either swap the new value into place, or
> > if it was updated in the meantime, "old" will now refer to the value
> > that's currently in the floor word. Either is fine in this case, so we
> > don't need to retry anything.
>
>
> Though if cpu2 then made another call to
> ktime_get_coarse_real_ts64_mg(), the value returned there will be the
> same as t1b? and would be before t2b?

Oh, no. Apologies again, as I see  t2b would be the same as t1b as well. Ok.
-john
Jan Kara Sept. 13, 2024, 11:26 a.m. UTC | #6
On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches.  My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().
> 
> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
> 
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.
> 
> With this patch on top of the multigrain series, the will-it-scale
> pipe1_threads microbenchmark shows these averages on my test rig:
> 
> 	v6.11-rc7:			103561295 (baseline)
> 	v6.11-rc7 + mgtime + this:	101357203 (~2% performance drop)
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
> Suggested-by: Arnd Bergmann <arnd@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

One question regarding the cookie handling as well :)

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
>  	.base[1] = FAST_TK_INIT,
>  };
>  
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
>  static inline void tk_normalize_xtime(struct timekeeper *tk)
>  {
>  	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
>  }
>  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>  
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	u64 floor = atomic64_read(&mg_floor);
> +	ktime_t f_real, offset, coarse;
> +	unsigned int seq;
> +
> +	WARN_ON(timekeeping_suspended);
> +
> +	do {
> +		seq = read_seqcount_begin(&tk_core.seq);
> +		*ts = tk_xtime(tk);
> +		offset = *offsets[TK_OFFS_REAL];
> +	} while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +	coarse = timespec64_to_ktime(*ts);
> +	f_real = ktime_add(floor, offset);
> +	if (ktime_after(f_real, coarse))
> +		*ts = ktime_to_timespec64(f_real);
> +	return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> +
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts:		pointer to the timespec to be set
> + * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */
> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	ktime_t offset, mono, old = (ktime_t)cookie;
> +	unsigned int seq;
> +	u64 nsecs;

So what would be the difference if we did instead:

	old = atomic64_read(&mg_floor);

and not bother with the cookie? AFAIU this could result in somewhat more
updates to mg_floor (the contention on the mg_floor cacheline would be the
same but there would be more invalidates of the cacheline). OTOH these
updates can happen only if max(current_coarse_time, mg_floor) ==
inode->i_ctime which is presumably rare? What is your concern that I'm
missing?

								Honza	
> +
> +	WARN_ON(timekeeping_suspended);
> +
> +	do {
> +		seq = read_seqcount_begin(&tk_core.seq);
> +
> +		ts->tv_sec = tk->xtime_sec;
> +		mono = tk->tkr_mono.base;
> +		nsecs = timekeeping_get_ns(&tk->tkr_mono);
> +		offset = *offsets[TK_OFFS_REAL];
> +	} while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +	mono = ktime_add_ns(mono, nsecs);
> +	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> +		ts->tv_nsec = 0;
> +		timespec64_add_ns(ts, nsecs);
> +	} else {
> +		*ts = ktime_to_timespec64(ktime_add(old, offset));
> +	}
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> +
Jeff Layton Sept. 13, 2024, 12:01 p.m. UTC | #7
On Fri, 2024-09-13 at 13:26 +0200, Jan Kara wrote:
> On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> > The kernel test robot reported a performance hit in some will-it-scale
> > tests due to the multigrain timestamp patches.  My own testing showed
> > about a 7% drop in performance on the pipe1_threads test, and the data
> > showed that coarse_ctime() was slowing down current_time().
> > 
> > Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> > two new public interfaces: The first fills a timespec64 with the later
> > of the coarse-grained clock and the floor time, and the second gets a
> > fine-grained time and tries to swap it into the floor and fills a
> > timespec64 with the result.
> > 
> > The first function returns an opaque cookie that is suitable for passing
> > to the second, which will use it as the "old" value in the cmpxchg.
> > 
> > With this patch on top of the multigrain series, the will-it-scale
> > pipe1_threads microbenchmark shows these averages on my test rig:
> > 
> > 	v6.11-rc7:			103561295 (baseline)
> > 	v6.11-rc7 + mgtime + this:	101357203 (~2% performance drop)
> > 
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
> > Suggested-by: Arnd Bergmann <arnd@kernel.org>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> One question regarding the cookie handling as well :)
> 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..bb039c9d525e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
> >  	.base[1] = FAST_TK_INIT,
> >  };
> >  
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
> >  static inline void tk_normalize_xtime(struct timekeeper *tk)
> >  {
> >  	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> >  }
> >  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >  
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable to pass
> > + * to ktime_get_real_ts64_mg.
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	u64 floor = atomic64_read(&mg_floor);
> > +	ktime_t f_real, offset, coarse;
> > +	unsigned int seq;
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +		*ts = tk_xtime(tk);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	coarse = timespec64_to_ktime(*ts);
> > +	f_real = ktime_add(floor, offset);
> > +	if (ktime_after(f_real, coarse))
> > +		*ts = ktime_to_timespec64(f_real);
> > +	return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> > +
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts:		pointer to the timespec to be set
> > + * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
> > + */
> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	ktime_t offset, mono, old = (ktime_t)cookie;
> > +	unsigned int seq;
> > +	u64 nsecs;
> 
> So what would be the difference if we did instead:
> 
> 	old = atomic64_read(&mg_floor);
> 
> and not bother with the cookie? AFAIU this could result in somewhat more
> updates to mg_floor (the contention on the mg_floor cacheline would be the
> same but there would be more invalidates of the cacheline). OTOH these
> updates can happen only if max(current_coarse_time, mg_floor) ==
> inode->i_ctime which is presumably rare? What is your concern that I'm
> missing?
> 

My main concern is the "somewhat more updates to mg_floor". mg_floor is
a global variable, so one of my main goals is to minimize the updates
to it. There is no correctness issue in doing what you're saying above
(AFAICT anyway), but the window of time between when we fetch the
current floor and try to do the swap will be smaller, and we'll end up
doing more swaps as a result.

Do you have any objection to adding the cookie to this API?

> 
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +
> > +		ts->tv_sec = tk->xtime_sec;
> > +		mono = tk->tkr_mono.base;
> > +		nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	mono = ktime_add_ns(mono, nsecs);
> > +	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > +		ts->tv_nsec = 0;
> > +		timespec64_add_ns(ts, nsecs);
> > +	} else {
> > +		*ts = ktime_to_timespec64(ktime_add(old, offset));
> > +	}
> > +
> > +}
> > +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> > +
John Stultz Sept. 13, 2024, 6:43 p.m. UTC | #8
On Fri, Sep 13, 2024 at 5:01 AM Jeff Layton <jlayton@kernel.org> wrote:
> On Fri, 2024-09-13 at 13:26 +0200, Jan Kara wrote:
> > So what would be the difference if we did instead:
> >
> >       old = atomic64_read(&mg_floor);
> >
> > and not bother with the cookie? AFAIU this could result in somewhat more
> > updates to mg_floor (the contention on the mg_floor cacheline would be the
> > same but there would be more invalidates of the cacheline). OTOH these
> > updates can happen only if max(current_coarse_time, mg_floor) ==
> > inode->i_ctime which is presumably rare? What is your concern that I'm
> > missing?
> >
>
> My main concern is the "somewhat more updates to mg_floor". mg_floor is
> a global variable, so one of my main goals is to minimize the updates
> to it. There is no correctness issue in doing what you're saying above
> (AFAICT anyway), but the window of time between when we fetch the
> current floor and try to do the swap will be smaller, and we'll end up
> doing more swaps as a result.

Would it be worth quantifying that cost?

> Do you have any objection to adding the cookie to this API?

My main concern is it is just a bit subtle. I found it hard to grok
(though I can be pretty dim sometimes, so maybe that doesn't count for
much :)
It seems if it were misused, the fine-grained accessor could
constantly return coarse grained results when called repeatedly with a
very stale cookie.

Further, the point about avoiding "too many" mg_floor writes is a
little fuzzy. It feels almost like folks would need to use the cookie
update as a tuning knob to balance the granularity of their timestamps
against the cost of the global mg_floor writes. So this probably needs
some clear comments to make it more obvious.

thanks
-john
Jeff Layton Sept. 13, 2024, 7:01 p.m. UTC | #9
On Fri, 2024-09-13 at 11:43 -0700, John Stultz wrote:
> On Fri, Sep 13, 2024 at 5:01 AM Jeff Layton <jlayton@kernel.org> wrote:
> > On Fri, 2024-09-13 at 13:26 +0200, Jan Kara wrote:
> > > So what would be the difference if we did instead:
> > > 
> > >       old = atomic64_read(&mg_floor);
> > > 
> > > and not bother with the cookie? AFAIU this could result in somewhat more
> > > updates to mg_floor (the contention on the mg_floor cacheline would be the
> > > same but there would be more invalidates of the cacheline). OTOH these
> > > updates can happen only if max(current_coarse_time, mg_floor) ==
> > > inode->i_ctime which is presumably rare? What is your concern that I'm
> > > missing?
> > > 
> > 
> > My main concern is the "somewhat more updates to mg_floor". mg_floor is
> > a global variable, so one of my main goals is to minimize the updates
> > to it. There is no correctness issue in doing what you're saying above
> > (AFAICT anyway), but the window of time between when we fetch the
> > current floor and try to do the swap will be smaller, and we'll end up
> > doing more swaps as a result.
> 
> Would it be worth quantifying that cost?
> 

There's a patch in the larger set that adds some percpu counters to
count events. One of them was successful floor value swaps. I dropped
that particular counter from the v7 set, but we could resurrect it.


> > Do you have any objection to adding the cookie to this API?
> 
> My main concern is it is just a bit subtle. I found it hard to grok
> (though I can be pretty dim sometimes, so maybe that doesn't count for
> much :)
> It seems if it were misused, the fine-grained accessor could
> constantly return coarse grained results when called repeatedly with a
> very stale cookie.
> 
> Further, the point about avoiding "too many" mg_floor writes is a
> little fuzzy. It feels almost like folks would need to use the cookie
> update as a tuning knob to balance the granularity of their timestamps
> against the cost of the global mg_floor writes. So this probably needs
> some clear comments to make it more obvious.
> 

Fair points. I don't have any hard numbers around it. I'm mainly just
trying to do what I can to keep the floor swaps to an absolute minimum.
This is a global value after all so we really are better off avoiding
cache invalidations.

That said, passing the cookie like this would only open the window a
small amount. I can certainly drop that part of the interface. In the
big scheme of things I doubt it'll make much difference in performance,
and if it does we can always bring it back.

If that sounds OK, I'll send a v8 (after some testing). I have some
comment updates I'd like to add as well.
Jan Kara Sept. 16, 2024, 10:41 a.m. UTC | #10
On Fri 13-09-24 08:01:28, Jeff Layton wrote:
> On Fri, 2024-09-13 at 13:26 +0200, Jan Kara wrote:
> > On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> > > +/**
> > > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > > + * @ts:		pointer to the timespec to be set
> > > + * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > > + *
> > > + * Get a current monotonic fine-grained time value and attempt to swap
> > > + * it into the floor using @cookie as the "old" value. @ts will be
> > > + * filled with the resulting floor value, regardless of the outcome of
> > > + * the swap.
> > > + */
> > > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > > +{
> > > +	struct timekeeper *tk = &tk_core.timekeeper;
> > > +	ktime_t offset, mono, old = (ktime_t)cookie;
> > > +	unsigned int seq;
> > > +	u64 nsecs;
> > 
> > So what would be the difference if we did instead:
> > 
> > 	old = atomic64_read(&mg_floor);
> > 
> > and not bother with the cookie? AFAIU this could result in somewhat more
> > updates to mg_floor (the contention on the mg_floor cacheline would be the
> > same but there would be more invalidates of the cacheline). OTOH these
> > updates can happen only if max(current_coarse_time, mg_floor) ==
> > inode->i_ctime which is presumably rare? What is your concern that I'm
> > missing?
> > 
> 
> My main concern is the "somewhat more updates to mg_floor". mg_floor is
> a global variable, so one of my main goals is to minimize the updates
> to it. There is no correctness issue in doing what you're saying above
> (AFAICT anyway), but the window of time between when we fetch the
> current floor and try to do the swap will be smaller, and we'll end up
> doing more swaps as a result.
> 
> Do you have any objection to adding the cookie to this API?

No objection as such but as John said, I had also some trouble
understanding what the cookie value is about and what are the constraints
in using it. So if we can live without cookie, it would be a simplification
of the API. If the cooking indeed brings noticeable performance benefit, we
just need to document that the cookie is about performance and how to use
it to get good performance.

								Honza
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index f0fbfd470d8e..3c7e16935bac 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -65,13 +65,6 @@  static unsigned int i_hash_shift __ro_after_init;
 static struct hlist_head *inode_hashtable __ro_after_init;
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
 
-/*
- * This represents the latest fine-grained time that we have handed out as a
- * timestamp on the system. Tracked as a monotonic value, and converted to the
- * realtime clock on an as-needed basis.
- */
-static __cacheline_aligned_in_smp atomic64_t ctime_floor;
-
 /*
  * Empty aops. Can be used for the cases where the user does not
  * define any of the address_space operations.
@@ -2255,25 +2248,6 @@  int file_remove_privs(struct file *file)
 }
 EXPORT_SYMBOL(file_remove_privs);
 
-/**
- * coarse_ctime - return the current coarse-grained time
- * @floor: current (monotonic) ctime_floor value
- *
- * Get the coarse-grained time, and then determine whether to
- * return it or the current floor value. Returns the later of the
- * floor and coarse grained timestamps, converted to realtime
- * clock value.
- */
-static ktime_t coarse_ctime(ktime_t floor)
-{
-	ktime_t coarse = ktime_get_coarse();
-
-	/* If coarse time is already newer, return that */
-	if (!ktime_after(floor, coarse))
-		return ktime_get_coarse_real();
-	return ktime_mono_to_real(floor);
-}
-
 /**
  * current_time - Return FS time (possibly fine-grained)
  * @inode: inode.
@@ -2284,11 +2258,11 @@  static ktime_t coarse_ctime(ktime_t floor)
  */
 struct timespec64 current_time(struct inode *inode)
 {
-	ktime_t floor = atomic64_read(&ctime_floor);
-	ktime_t now = coarse_ctime(floor);
-	struct timespec64 now_ts = ktime_to_timespec64(now);
+	struct timespec64 now;
 	u32 cns;
 
+	ktime_get_coarse_real_ts64_mg(&now);
+
 	if (!is_mgtime(inode))
 		goto out;
 
@@ -2299,11 +2273,11 @@  struct timespec64 current_time(struct inode *inode)
 		 * If there is no apparent change, then
 		 * get a fine-grained timestamp.
 		 */
-		if (now_ts.tv_nsec == (cns & ~I_CTIME_QUERIED))
-			ktime_get_real_ts64(&now_ts);
+		if (now.tv_nsec == (cns & ~I_CTIME_QUERIED))
+			ktime_get_real_ts64(&now);
 	}
 out:
-	return timestamp_truncate(now_ts, inode);
+	return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
 
@@ -2745,7 +2719,7 @@  EXPORT_SYMBOL(timestamp_truncate);
  *
  * Set the inode's ctime to the current value for the inode. Returns the
  * current value that was assigned. If this is not a multigrain inode, then we
- * just set it to whatever the coarse_ctime is.
+ * set it to the later of the coarse time and floor value.
  *
  * If it is multigrain, then we first see if the coarse-grained timestamp is
  * distinct from what we have. If so, then we'll just use that. If we have to
@@ -2756,16 +2730,16 @@  EXPORT_SYMBOL(timestamp_truncate);
  */
 struct timespec64 inode_set_ctime_current(struct inode *inode)
 {
-	ktime_t now, floor = atomic64_read(&ctime_floor);
-	struct timespec64 now_ts;
+	struct timespec64 now;
 	u32 cns, cur;
+	u64 cookie;
 
-	now = coarse_ctime(floor);
+	cookie = ktime_get_coarse_real_ts64_mg(&now);
 
 	/* Just return that if this is not a multigrain fs */
 	if (!is_mgtime(inode)) {
-		now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
-		inode_set_ctime_to_ts(inode, now_ts);
+		now = timestamp_truncate(now, inode);
+		inode_set_ctime_to_ts(inode, now);
 		goto out;
 	}
 
@@ -2776,44 +2750,27 @@  struct timespec64 inode_set_ctime_current(struct inode *inode)
 	 */
 	cns = smp_load_acquire(&inode->i_ctime_nsec);
 	if (cns & I_CTIME_QUERIED) {
-		ktime_t ctime = ktime_set(inode->i_ctime_sec, cns & ~I_CTIME_QUERIED);
-
-		if (!ktime_after(now, ctime)) {
-			ktime_t old, fine;
+		struct timespec64 ctime = { .tv_sec = inode->i_ctime_sec,
+					    .tv_nsec = cns & ~I_CTIME_QUERIED };
 
-			/* Get a fine-grained time */
-			fine = ktime_get();
-			mgtime_counter_inc(mg_fine_stamps);
-
-			/*
-			 * If the cmpxchg works, we take the new floor value. If
-			 * not, then that means that someone else changed it after we
-			 * fetched it but before we got here. That value is just
-			 * as good, so keep it.
-			 */
-			old = floor;
-			if (atomic64_try_cmpxchg(&ctime_floor, &old, fine))
-				mgtime_counter_inc(mg_floor_swaps);
-			else
-				fine = old;
-			now = ktime_mono_to_real(fine);
-		}
+		if (timespec64_compare(&now, &ctime) <= 0)
+			ktime_get_real_ts64_mg(&now, cookie);
 	}
 	mgtime_counter_inc(mg_ctime_updates);
-	now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
-	cur = cns;
+	now = timestamp_truncate(now, inode);
 
 	/* No need to cmpxchg if it's exactly the same */
-	if (cns == now_ts.tv_nsec && inode->i_ctime_sec == now_ts.tv_sec) {
-		trace_ctime_xchg_skip(inode, &now_ts);
+	if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) {
+		trace_ctime_xchg_skip(inode, &now);
 		goto out;
 	}
+	cur = cns;
 retry:
 	/* Try to swap the nsec value into place. */
-	if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now_ts.tv_nsec)) {
+	if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now.tv_nsec)) {
 		/* If swap occurred, then we're (mostly) done */
-		inode->i_ctime_sec = now_ts.tv_sec;
-		trace_ctime_ns_xchg(inode, cns, now_ts.tv_nsec, cur);
+		inode->i_ctime_sec = now.tv_sec;
+		trace_ctime_ns_xchg(inode, cns, now.tv_nsec, cur);
 		mgtime_counter_inc(mg_ctime_swaps);
 	} else {
 		/*
@@ -2827,11 +2784,11 @@  struct timespec64 inode_set_ctime_current(struct inode *inode)
 			goto retry;
 		}
 		/* Otherwise, keep the existing ctime */
-		now_ts.tv_sec = inode->i_ctime_sec;
-		now_ts.tv_nsec = cur & ~I_CTIME_QUERIED;
+		now.tv_sec = inode->i_ctime_sec;
+		now.tv_nsec = cur & ~I_CTIME_QUERIED;
 	}
 out:
-	return now_ts;
+	return now;
 }
 EXPORT_SYMBOL(inode_set_ctime_current);
 
@@ -2854,8 +2811,7 @@  EXPORT_SYMBOL(inode_set_ctime_current);
  */
 struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 update)
 {
-	ktime_t now, floor = atomic64_read(&ctime_floor);
-	struct timespec64 now_ts, cur_ts;
+	struct timespec64 now, cur_ts;
 	u32 cur, old;
 
 	/* pairs with try_cmpxchg below */
@@ -2867,12 +2823,11 @@  struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 u
 	if (timespec64_compare(&update, &cur_ts) <= 0)
 		return cur_ts;
 
-	now = coarse_ctime(floor);
-	now_ts = ktime_to_timespec64(now);
+	ktime_get_coarse_real_ts64_mg(&now);
 
 	/* Clamp the update to "now" if it's in the future */
-	if (timespec64_compare(&update, &now_ts) > 0)
-		update = now_ts;
+	if (timespec64_compare(&update, &now) > 0)
+		update = now;
 
 	update = timestamp_truncate(update, inode);
 
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index fc12a9ba2c88..cf2293158c65 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -45,6 +45,10 @@  extern void ktime_get_real_ts64(struct timespec64 *tv);
 extern void ktime_get_coarse_ts64(struct timespec64 *ts);
 extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
 
+/* Multigrain timestamp interfaces */
+extern u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
+extern void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie);
+
 void getboottime64(struct timespec64 *ts);
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5391e4167d60..bb039c9d525e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -114,6 +114,13 @@  static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
 	.base[1] = FAST_TK_INIT,
 };
 
+/*
+ * This represents the latest fine-grained time that we have handed out as a
+ * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
+ * realtime clock on an as-needed basis.
+ */
+static __cacheline_aligned_in_smp atomic64_t mg_floor;
+
 static inline void tk_normalize_xtime(struct timekeeper *tk)
 {
 	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
@@ -2394,6 +2401,76 @@  void ktime_get_coarse_real_ts64(struct timespec64 *ts)
 }
 EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
 
+/**
+ * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
+ * @ts: timespec64 to be filled
+ *
+ * Adjust floor to realtime and compare it to the coarse time. Fill
+ * @ts with the latest one. Returns opaque cookie suitable to pass
+ * to ktime_get_real_ts64_mg.
+ */
+u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	u64 floor = atomic64_read(&mg_floor);
+	ktime_t f_real, offset, coarse;
+	unsigned int seq;
+
+	WARN_ON(timekeeping_suspended);
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		*ts = tk_xtime(tk);
+		offset = *offsets[TK_OFFS_REAL];
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	coarse = timespec64_to_ktime(*ts);
+	f_real = ktime_add(floor, offset);
+	if (ktime_after(f_real, coarse))
+		*ts = ktime_to_timespec64(f_real);
+	return floor;
+}
+EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
+
+/**
+ * ktime_get_real_ts64_mg - attempt to update floor value and return result
+ * @ts:		pointer to the timespec to be set
+ * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
+ *
+ * Get a current monotonic fine-grained time value and attempt to swap
+ * it into the floor using @cookie as the "old" value. @ts will be
+ * filled with the resulting floor value, regardless of the outcome of
+ * the swap.
+ */
+void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	ktime_t offset, mono, old = (ktime_t)cookie;
+	unsigned int seq;
+	u64 nsecs;
+
+	WARN_ON(timekeeping_suspended);
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+
+		ts->tv_sec = tk->xtime_sec;
+		mono = tk->tkr_mono.base;
+		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		offset = *offsets[TK_OFFS_REAL];
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	mono = ktime_add_ns(mono, nsecs);
+	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
+		ts->tv_nsec = 0;
+		timespec64_add_ns(ts, nsecs);
+	} else {
+		*ts = ktime_to_timespec64(ktime_add(old, offset));
+	}
+
+}
+EXPORT_SYMBOL(ktime_get_real_ts64_mg);
+
 void ktime_get_coarse_ts64(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;