Message ID | 20241001-mgtime-v8-1-903343d91bc3@kernel.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | fs: multigrain timestamp redux | expand |
On Tue, Oct 1, 2024 at 3:59 AM Jeff Layton <jlayton@kernel.org> wrote: > > Multigrain timestamps allow the kernel to use fine-grained timestamps > when an inode's attributes is being actively observed via ->getattr(). > With this support, it's possible for a file to get a fine-grained > timestamp, and another modified after it to get a coarse-grained stamp > that is earlier than the fine-grained time. If this happens then the > files can appear to have been modified in reverse order, which breaks > VFS ordering guarantees. > > To prevent this, maintain a floor value for multigrain timestamps. > Whenever a fine-grained timestamp is handed out, record it, and when > coarse-grained stamps are handed out, ensure they are not earlier than > that value. If the coarse-grained timestamp is earlier than the > fine-grained floor, return the floor value instead. > > Add a static singleton atomic64_t into timekeeper.c that we can use to > keep track of the latest fine-grained time ever handed out. This is > tracked as a monotonic ktime_t value to ensure that it isn't affected by > clock jumps. Because it is updated at different times than the rest of > the timekeeper object, the floor value is managed independently of the > timekeeper via a cmpxchg() operation, and sits on its own cacheline. > > This patch also adds two new public interfaces: > > - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the > coarse-grained clock and the floor time > > - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries > to swap it into the floor. A timespec64 is filled with the result. > > Since the floor is global, take care to avoid updating it unless it's > absolutely necessary. If we do the cmpxchg and find that the value has > been updated since we fetched it, then we discard the fine-grained time > that was fetched in favor of the recent update. > > Note that the VFS ordering guarantees assume that the realtime clock > does not experience a backward jump. POSIX requires that we stamp files > using realtime clock values, so if a backward clock jump occurs, then > files can appear to have been modified in reverse order. > > Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits > Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: John Stultz <jstultz@google.com>
On Tue, Oct 01 2024 at 06:58, Jeff Layton wrote: V8 ? I remember that I reviewed v* already :) Also the sentence after the susbsystem prefix starts with an uppercase letter. > Multigrain timestamps allow the kernel to use fine-grained timestamps > when an inode's attributes is being actively observed via ->getattr(). > With this support, it's possible for a file to get a fine-grained > timestamp, and another modified after it to get a coarse-grained stamp > that is earlier than the fine-grained time. If this happens then the > files can appear to have been modified in reverse order, which breaks > VFS ordering guarantees. > > To prevent this, maintain a floor value for multigrain timestamps. > Whenever a fine-grained timestamp is handed out, record it, and when > coarse-grained stamps are handed out, ensure they are not earlier than > that value. If the coarse-grained timestamp is earlier than the > fine-grained floor, return the floor value instead. > > Add a static singleton atomic64_t into timekeeper.c that we can use to s/we can use/is used/ > keep track of the latest fine-grained time ever handed out. This is > tracked as a monotonic ktime_t value to ensure that it isn't affected by > clock jumps. Because it is updated at different times than the rest of > the timekeeper object, the floor value is managed independently of the > timekeeper via a cmpxchg() operation, and sits on its own cacheline. > > This patch also adds two new public interfaces: git grep 'This patch' Docuementation/process/ > - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the > coarse-grained clock and the floor time > > - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries > to swap it into the floor. A timespec64 is filled with the result. > > Since the floor is global, take care to avoid updating it unless it's > absolutely necessary. If we do the cmpxchg and find that the value has We do nothing. Please describe your changes in passive voice and do not impersonate code. > been updated since we fetched it, then we discard the fine-grained time > that was fetched in favor of the recent update. This still is confusing. Something like this: The floor value is global and updated via a single try_cmpxchg(). If that fails then the operation raced with a concurrent update. It does not matter whether the new value is later than the value, which the failed cmpxchg() tried to write, or not. Add sensible technical explanation. > +/* > + * Multigrain timestamps require that we keep track of the latest fine-grained > + * timestamp that has been issued, and never return a coarse-grained timestamp > + * that is earlier than that value. > + * > + * mg_floor represents the latest fine-grained time that we have handed out as > + * a timestamp on the system. Tracked as a monotonic ktime_t, and converted to > + * the realtime clock on an as-needed basis. > + * > + * This ensures that we never issue a timestamp earlier than one that has > + * already been issued, as long as the realtime clock never experiences a > + * backward clock jump. If the realtime clock is set to an earlier time, then > + * realtime timestamps can appear to go backward. > + */ > +static __cacheline_aligned_in_smp atomic64_t mg_floor; > + > static inline void tk_normalize_xtime(struct timekeeper *tk) > { > while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) { > @@ -2394,6 +2410,86 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts) > } > EXPORT_SYMBOL(ktime_get_coarse_real_ts64); > > +/** > + * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor > + * @ts: timespec64 to be filled > + * > + * Fetch the global mg_floor value, convert it to realtime and > + * compare it to the current coarse-grained time. Fill @ts with > + * whichever is latest. Note that this is a filesystem-specific > + * interface and should be avoided outside of that context. > + */ > +void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts) > +{ > + struct timekeeper *tk = &tk_core.timekeeper; > + u64 floor = atomic64_read(&mg_floor); > + ktime_t f_real, offset, coarse; > + unsigned int seq; > + > + do { > + seq = read_seqcount_begin(&tk_core.seq); > + *ts = tk_xtime(tk); > + offset = tk_core.timekeeper.offs_real; > + } while (read_seqcount_retry(&tk_core.seq, seq)); > + > + coarse = timespec64_to_ktime(*ts); > + f_real = ktime_add(floor, offset); > + if (ktime_after(f_real, coarse)) > + *ts = ktime_to_timespec64(f_real); > +} > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg); > + > +/** > + * ktime_get_real_ts64_mg - attempt to update floor value and return result > + * @ts: pointer to the timespec to be set > + * > + * Get a monotonic fine-grained time value and attempt to swap it into the > + * floor. If it succeeds then accept the new floor value. If it fails > + * then another task raced in during the interim time and updated the floor. > + * That value is just as valid, so accept that value in this case. Why? 'just as valid' does not tell me anything. > + * @ts will be filled with the resulting floor value, regardless of > + * the outcome of the swap. Note that this is a filesystem specific interface > + * and should be avoided outside of that context. > + */ > +void ktime_get_real_ts64_mg(struct timespec64 *ts) > +{ > + struct timekeeper *tk = &tk_core.timekeeper; > + ktime_t old = atomic64_read(&mg_floor); > + ktime_t offset, mono; > + unsigned int seq; > + u64 nsecs; > + > + do { > + seq = read_seqcount_begin(&tk_core.seq); > + > + ts->tv_sec = tk->xtime_sec; > + mono = tk->tkr_mono.base; > + nsecs = timekeeping_get_ns(&tk->tkr_mono); > + offset = tk_core.timekeeper.offs_real; > + } while (read_seqcount_retry(&tk_core.seq, seq)); > + > + mono = ktime_add_ns(mono, nsecs); > + > + /* > + * Attempt to update the floor with the new time value. Accept the > + * resulting floor value regardless of the outcome of the swap. > + */ > + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) { > + ts->tv_nsec = 0; > + timespec64_add_ns(ts, nsecs); > + } else { > + /* > + * Something has changed mg_floor since "old" was I complained about 'something has changed ...' before. Can we please have proper technical explanations? > + * fetched. "old" has now been updated with the > + * current value of mg_floor, so use that to return > + * the current coarse floor value. This still does not explain why the new floor value is valid even when it is before the value in @mono. Thanks, tglx
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index fc12a9ba2c88..7aa85246c183 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv); extern void ktime_get_coarse_ts64(struct timespec64 *ts); extern void ktime_get_coarse_real_ts64(struct timespec64 *ts); +/* Multigrain timestamp interfaces */ +extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts); +extern void ktime_get_real_ts64_mg(struct timespec64 *ts); + void getboottime64(struct timespec64 *ts); /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 7e6f409bf311..37004a4758cf 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -114,6 +114,22 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = { .base[1] = FAST_TK_INIT, }; +/* + * Multigrain timestamps require that we keep track of the latest fine-grained + * timestamp that has been issued, and never return a coarse-grained timestamp + * that is earlier than that value. + * + * mg_floor represents the latest fine-grained time that we have handed out as + * a timestamp on the system. Tracked as a monotonic ktime_t, and converted to + * the realtime clock on an as-needed basis. + * + * This ensures that we never issue a timestamp earlier than one that has + * already been issued, as long as the realtime clock never experiences a + * backward clock jump. If the realtime clock is set to an earlier time, then + * realtime timestamps can appear to go backward. + */ +static __cacheline_aligned_in_smp atomic64_t mg_floor; + static inline void tk_normalize_xtime(struct timekeeper *tk) { while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) { @@ -2394,6 +2410,86 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts) } EXPORT_SYMBOL(ktime_get_coarse_real_ts64); +/** + * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor + * @ts: timespec64 to be filled + * + * Fetch the global mg_floor value, convert it to realtime and + * compare it to the current coarse-grained time. Fill @ts with + * whichever is latest. Note that this is a filesystem-specific + * interface and should be avoided outside of that context. + */ +void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts) +{ + struct timekeeper *tk = &tk_core.timekeeper; + u64 floor = atomic64_read(&mg_floor); + ktime_t f_real, offset, coarse; + unsigned int seq; + + do { + seq = read_seqcount_begin(&tk_core.seq); + *ts = tk_xtime(tk); + offset = tk_core.timekeeper.offs_real; + } while (read_seqcount_retry(&tk_core.seq, seq)); + + coarse = timespec64_to_ktime(*ts); + f_real = ktime_add(floor, offset); + if (ktime_after(f_real, coarse)) + *ts = ktime_to_timespec64(f_real); +} +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg); + +/** + * ktime_get_real_ts64_mg - attempt to update floor value and return result + * @ts: pointer to the timespec to be set + * + * Get a monotonic fine-grained time value and attempt to swap it into the + * floor. If it succeeds then accept the new floor value. If it fails + * then another task raced in during the interim time and updated the floor. + * That value is just as valid, so accept that value in this case. + * + * @ts will be filled with the resulting floor value, regardless of + * the outcome of the swap. Note that this is a filesystem specific interface + * and should be avoided outside of that context. + */ +void ktime_get_real_ts64_mg(struct timespec64 *ts) +{ + struct timekeeper *tk = &tk_core.timekeeper; + ktime_t old = atomic64_read(&mg_floor); + ktime_t offset, mono; + unsigned int seq; + u64 nsecs; + + do { + seq = read_seqcount_begin(&tk_core.seq); + + ts->tv_sec = tk->xtime_sec; + mono = tk->tkr_mono.base; + nsecs = timekeeping_get_ns(&tk->tkr_mono); + offset = tk_core.timekeeper.offs_real; + } while (read_seqcount_retry(&tk_core.seq, seq)); + + mono = ktime_add_ns(mono, nsecs); + + /* + * Attempt to update the floor with the new time value. Accept the + * resulting floor value regardless of the outcome of the swap. + */ + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) { + ts->tv_nsec = 0; + timespec64_add_ns(ts, nsecs); + } else { + /* + * Something has changed mg_floor since "old" was + * fetched. "old" has now been updated with the + * current value of mg_floor, so use that to return + * the current coarse floor value. + */ + *ts = ktime_to_timespec64(ktime_add(old, offset)); + } +} +EXPORT_SYMBOL_GPL(ktime_get_real_ts64_mg); + void ktime_get_coarse_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper;