diff mbox

[RFC,1/2] timekeeper: change interface of clocksource reding functions

Message ID 1498647301-130851-2-git-send-email-dplotnikov@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Denis Plotnikov June 28, 2017, 10:55 a.m. UTC
When using timekeepeing API in some cases it is useful to return
timestamp value used along with the time calculated to use that
timestamp value for other purpuses (e.g. in KVM master clock timestamp)

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/hyperv/hv_init.c           |  4 ++--
 arch/x86/include/asm/pvclock.h      |  2 +-
 arch/x86/kernel/hpet.c              |  4 ++--
 arch/x86/kernel/kvmclock.c          | 19 +++++++++------
 arch/x86/kernel/pvclock.c           | 11 ++++++---
 arch/x86/kernel/tsc.c               |  7 ++++--
 arch/x86/lguest/boot.c              |  2 +-
 arch/x86/platform/uv/uv_time.c      | 10 ++++----
 arch/x86/xen/time.c                 | 21 ++++++++++-------
 arch/x86/xen/xen-ops.h              |  2 +-
 drivers/clocksource/acpi_pm.c       | 12 +++++-----
 drivers/hv/hv_util.c                |  6 ++---
 include/linux/clocksource.h         |  9 ++++++--
 include/linux/timekeeper_internal.h |  2 +-
 include/linux/timekeeping.h         | 17 ++++++++++----
 kernel/time/clocksource.c           |  4 ++--
 kernel/time/jiffies.c               |  2 +-
 kernel/time/timekeeping.c           | 46 ++++++++++++++++++-------------------
 18 files changed, 106 insertions(+), 74 deletions(-)

Comments

Radim Krčmář July 10, 2017, 1 p.m. UTC | #1
2017-06-28 13:55+0300, Denis Plotnikov:
> When using timekeepeing API in some cases it is useful to return
> timestamp value used along with the time calculated to use that
> timestamp value for other purpuses (e.g. in KVM master clock timestamp)

Makes sense.  What I don't like about this interface is the TSC centric
approach in a code that isn't even x86 specific -- other architectures
might have a similar counter they'd like to use.

Could we get it a bit more generic?

At least returning the type of the clock and its counter value.
(kvmclock is a bit problematic for the generic solution, because it is
 TSC based, but should pass through the kvmclock value if we were going
 to make it clean ...)

---
Actually, we might be overengineering it.  With the master clock we have
now, I'm thinking that the gtod is not crucial and we could as well be
using something like:

  static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
  {
  	local_irq_save(flags);
  	*cycle_now = rdtsc();
  	*kernel_ns = ktime_get_boot_ns();
  	local_irq_restore(flags);

  	return todo_tsc_hardware_is_good_enough();
  }

because kvmclock is shifted when master clock is recomputed or disabled,
so it should never happen. (And if it happens, then this bit of
imprecision doesn't matter much.)
The host doesn't need to use TSC either, because master clock uses TSC
frequency that differs from boot_ns -- we only need to be sure that the
TSC hardware doesn't change its frequency.

The small delta introduced with this (rdtsc <-> ktime_get_boot_ns) is
the same that the guest would see if we didn't enable master clock.
kvmclock updates are then normally using the same cycle_now/kernel_ns
pair to remain stable.
Denis Plotnikov July 21, 2017, 2 p.m. UTC | #2
On 10.07.2017 16:00, Radim Krčmář wrote:
> 2017-06-28 13:55+0300, Denis Plotnikov:
>> When using timekeepeing API in some cases it is useful to return
>> timestamp value used along with the time calculated to use that
>> timestamp value for other purpuses (e.g. in KVM master clock timestamp)
> 
> Makes sense.  What I don't like about this interface is the TSC centric
> approach in a code that isn't even x86 specific -- other architectures
> might have a similar counter they'd like to use.
> 
> Could we get it a bit more generic?
Yes, sure
> 
> At least returning the type of the clock and its counter value.
> (kvmclock is a bit problematic for the generic solution, because it is
>   TSC based, but should pass through the kvmclock value if we were going
>   to make it clean ...)
> 
> ---
> Actually, we might be overengineering it.  With the master clock we have
> now, I'm thinking that the gtod is not crucial and we could as well be
> using something like:
> 
>    static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
>    {
>    	local_irq_save(flags);
>    	*cycle_now = rdtsc();
>    	*kernel_ns = ktime_get_boot_ns();
>    	local_irq_restore(flags);
> 
>    	return todo_tsc_hardware_is_good_enough();
>    }
> 
> because kvmclock is shifted when master clock is recomputed or disabled,
> so it should never happen. (And if it happens, then this bit of
> imprecision doesn't matter much.)
> The host doesn't need to use TSC either, because master clock uses TSC
> frequency that differs from boot_ns -- we only need to be sure that the
> TSC hardware doesn't change its frequency.
> 
> The small delta introduced with this (rdtsc <-> ktime_get_boot_ns) is
> the same that the guest would see if we didn't enable master clock.
> kvmclock updates are then normally using the same cycle_now/kernel_ns
> pair to remain stable.
> 
I thought about that (and actually tried that first). The concern is 
about L2 guests running over L1 KVM where the margin between cycle_now 
and kernel_ns is unpredictable because L1 KVM can be scheduled at any 
time for relatively long period.
That's why I decided to rework timekeeper and get kernel_time
and corresponding cycle value from there
diff mbox

Patch

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 5b882cc..5cd0db2 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -36,7 +36,7 @@  struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 	return tsc_pg;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_clock_tsc(struct clocksource *arg, u64 *tsc_stamp)
 {
 	u64 current_tick = hv_read_tsc_page(tsc_pg);
 
@@ -55,7 +55,7 @@  static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_clock_msr(struct clocksource *arg, u64 *tsc_stamp)
 {
 	u64 current_tick;
 	/*
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 448cfe1..c3d049b 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -14,7 +14,7 @@  static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 #endif
 
 /* some helper functions for xen and kvm pv clock sources */
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *tsc_stamp);
 u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
 void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 89ff7af..b77d3fd 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -792,7 +792,7 @@  static union hpet_lock hpet __cacheline_aligned = {
 	{ .lock = __ARCH_SPIN_LOCK_UNLOCKED, },
 };
 
-static u64 read_hpet(struct clocksource *cs)
+static u64 read_hpet(struct clocksource *cs, u64 *tsc_stamp)
 {
 	unsigned long flags;
 	union hpet_lock old, new;
@@ -850,7 +850,7 @@  static u64 read_hpet(struct clocksource *cs)
 /*
  * For UP or 32-bit.
  */
-static u64 read_hpet(struct clocksource *cs)
+static u64 read_hpet(struct clocksource *cs, u64 *tsc_stamp)
 {
 	return (u64)hpet_readl(HPET_COUNTER);
 }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..736c7fd 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -82,7 +82,7 @@  static int kvm_set_wallclock(const struct timespec *now)
 	return -1;
 }
 
-static u64 kvm_clock_read(void)
+static u64 kvm_clock_read(u64 *tsc_stamp)
 {
 	struct pvclock_vcpu_time_info *src;
 	u64 ret;
@@ -91,30 +91,35 @@  static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, tsc_stamp);
 	preempt_enable_notrace();
 	return ret;
 }
 
-static u64 kvm_clock_get_cycles(struct clocksource *cs)
+static u64 kvm_clock_get_cycles(struct clocksource *cs, u64 *tsc_stamp)
 {
-	return kvm_clock_read();
+	return kvm_clock_read(tsc_stamp);
+}
+
+static u64 kvm_clock_read_single(void)
+{
+	return kvm_clock_read(NULL);
 }
 
 static u64 kvm_sched_clock_read(void)
 {
-	return kvm_clock_read() - kvm_sched_clock_offset;
+	return kvm_clock_read_single() - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
 {
 	if (!stable) {
-		pv_time_ops.sched_clock = kvm_clock_read;
+		pv_time_ops.sched_clock = kvm_clock_read_single;
 		clear_sched_clock_stable();
 		return;
 	}
 
-	kvm_sched_clock_offset = kvm_clock_read();
+	kvm_sched_clock_offset = kvm_clock_read_single();
 	pv_time_ops.sched_clock = kvm_sched_clock_read;
 
 	printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 5c3f6d6..d5fe8c3 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -72,17 +72,22 @@  u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 
 	return flags & valid_flags;
 }
+EXPORT_SYMBOL(pvclock_read_flags);
 
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *tsc_stamp)
 {
 	unsigned version;
 	u64 ret;
 	u64 last;
 	u8 flags;
+	u64 tsc;
 
 	do {
 		version = pvclock_read_begin(src);
-		ret = __pvclock_read_cycles(src, rdtsc_ordered());
+		tsc = rdtsc_ordered();
+		if (tsc_stamp)
+			*tsc_stamp = tsc;
+		ret = __pvclock_read_cycles(src, tsc);
 		flags = src->flags;
 	} while (pvclock_read_retry(src, version));
 
@@ -136,7 +141,7 @@  void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 		rmb();		/* fetch time before checking version */
 	} while ((wall_clock->version & 1) || (version != wall_clock->version));
 
-	delta = pvclock_clocksource_read(vcpu_time);	/* time since system boot */
+	delta = pvclock_clocksource_read(vcpu_time, NULL);	/* time since system boot */
 	delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 
 	now.tv_nsec = do_div(delta, NSEC_PER_SEC);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 714dfba..fee4d7b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1110,9 +1110,12 @@  static void tsc_resume(struct clocksource *cs)
  * checking the result of read_tsc() - cycle_last for being negative.
  * That works because CLOCKSOURCE_MASK(64) does not mask out any bit.
  */
-static u64 read_tsc(struct clocksource *cs)
+static u64 read_tsc(struct clocksource *cs, u64 *tsc_stamp)
 {
-	return (u64)rdtsc_ordered();
+	u64 tsc = (u64)rdtsc_ordered();
+	if (tsc_stamp)
+		*tsc_stamp = tsc;
+	return tsc;
 }
 
 static void tsc_cs_mark_unstable(struct clocksource *cs)
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 9947269..731423f 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -916,7 +916,7 @@  static unsigned long lguest_tsc_khz(void)
  * If we can't use the TSC, the kernel falls back to our lower-priority
  * "lguest_clock", where we read the time value given to us by the Host.
  */
-static u64 lguest_clock_read(struct clocksource *cs)
+static u64 lguest_clock_read(struct clocksource *cs, u64 *tsc_stamp)
 {
 	unsigned long sec, nsec;
 
diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index b082d71..24a4b61 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -30,7 +30,7 @@ 
 
 #define RTC_NAME		"sgi_rtc"
 
-static u64 uv_read_rtc(struct clocksource *cs);
+static u64 uv_read_rtc(struct clocksource *cs, u64 *tsc_stamp);
 static int uv_rtc_next_event(unsigned long, struct clock_event_device *);
 static int uv_rtc_shutdown(struct clock_event_device *evt);
 
@@ -133,7 +133,7 @@  static int uv_setup_intr(int cpu, u64 expires)
 	/* Initialize comparator value */
 	uv_write_global_mmr64(pnode, UVH_INT_CMPB, expires);
 
-	if (uv_read_rtc(NULL) <= expires)
+	if (uv_read_rtc(NULL, NULL) <= expires)
 		return 0;
 
 	return !uv_intr_pending(pnode);
@@ -269,7 +269,7 @@  static int uv_rtc_unset_timer(int cpu, int force)
 
 	spin_lock_irqsave(&head->lock, flags);
 
-	if ((head->next_cpu == bcpu && uv_read_rtc(NULL) >= *t) || force)
+	if ((head->next_cpu == bcpu && uv_read_rtc(NULL, NULL) >= *t) || force)
 		rc = 1;
 
 	if (rc) {
@@ -296,7 +296,7 @@  static int uv_rtc_unset_timer(int cpu, int force)
  * cachelines of it's own page.  This allows faster simultaneous reads
  * from a given socket.
  */
-static u64 uv_read_rtc(struct clocksource *cs)
+static u64 uv_read_rtc(struct clocksource *cs, u64 *tsc_stamp)
 {
 	unsigned long offset;
 
@@ -316,7 +316,7 @@  static int uv_rtc_next_event(unsigned long delta,
 {
 	int ced_cpu = cpumask_first(ced->cpumask);
 
-	return uv_rtc_set_timer(ced_cpu, delta + uv_read_rtc(NULL));
+	return uv_rtc_set_timer(ced_cpu, delta + uv_read_rtc(NULL, NULL));
 }
 
 /*
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index a1895a8..6dfab65 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -39,21 +39,21 @@  static unsigned long xen_tsc_khz(void)
 	return pvclock_tsc_khz(info);
 }
 
-u64 xen_clocksource_read(void)
+u64 xen_clocksource_read(u64 *tsc_stamp)
 {
         struct pvclock_vcpu_time_info *src;
 	u64 ret;
 
 	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, tsc_stamp);
 	preempt_enable_notrace();
 	return ret;
 }
 
-static u64 xen_clocksource_get_cycles(struct clocksource *cs)
+static u64 xen_clocksource_get_cycles(struct clocksource *cs, u64 *tsc_stamp)
 {
-	return xen_clocksource_read();
+	return xen_clocksource_read(tsc_stamp);
 }
 
 static void xen_read_wallclock(struct timespec *ts)
@@ -105,12 +105,12 @@  static int xen_pvclock_gtod_notify(struct notifier_block *nb,
 		op.u.settime64.mbz = 0;
 		op.u.settime64.secs = now.tv_sec;
 		op.u.settime64.nsecs = now.tv_nsec;
-		op.u.settime64.system_time = xen_clocksource_read();
+		op.u.settime64.system_time = xen_clocksource_read(NULL);
 	} else {
 		op.cmd = XENPF_settime32;
 		op.u.settime32.secs = now.tv_sec;
 		op.u.settime32.nsecs = now.tv_nsec;
-		op.u.settime32.system_time = xen_clocksource_read();
+		op.u.settime32.system_time = xen_clocksource_read(NULL);
 	}
 
 	ret = HYPERVISOR_platform_op(&op);
@@ -178,7 +178,7 @@  static struct clocksource xen_clocksource __read_mostly = {
 */
 static s64 get_abs_timeout(unsigned long delta)
 {
-	return xen_clocksource_read() + delta;
+	return xen_clocksource_read(NULL) + delta;
 }
 
 static int xen_timerop_shutdown(struct clock_event_device *evt)
@@ -366,8 +366,13 @@  void xen_timer_resume(void)
 	}
 }
 
+static u64 xen_clocksource_read_data(void)
+{
+	return xen_clocksource_read(NULL);
+}
+
 static const struct pv_time_ops xen_time_ops __initconst = {
-	.sched_clock = xen_clocksource_read,
+	.sched_clock = xen_clocksource_read_data,
 	.steal_clock = xen_steal_clock,
 };
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9a440a4..035f2ea 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -67,7 +67,7 @@  void xen_init_irq_ops(void);
 void xen_setup_timer(int cpu);
 void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
-u64 xen_clocksource_read(void);
+u64 xen_clocksource_read(u64 *tsc_stamp);
 void xen_setup_cpu_clockevents(void);
 void __init xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 1961e35..a75de2c 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -58,7 +58,7 @@  u32 acpi_pm_read_verified(void)
 	return v2;
 }
 
-static u64 acpi_pm_read(struct clocksource *cs)
+static u64 acpi_pm_read(struct clocksource *cs, u64 *tsc_stamp)
 {
 	return (u64)read_pmtmr();
 }
@@ -81,7 +81,7 @@  static int __init acpi_pm_good_setup(char *__str)
 }
 __setup("acpi_pm_good", acpi_pm_good_setup);
 
-static u64 acpi_pm_read_slow(struct clocksource *cs)
+static u64 acpi_pm_read_slow(struct clocksource *cs, u64 *tsc_stamp)
 {
 	return (u64)acpi_pm_read_verified();
 }
@@ -149,9 +149,9 @@  static int verify_pmtmr_rate(void)
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+	value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 	mach_countup(&count);
-	value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+	value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -184,9 +184,9 @@  static int __init init_acpi_pm_clocksource(void)
 	/* "verify" this timing source: */
 	for (j = 0; j < ACPI_PM_MONOTONICITY_CHECKS; j++) {
 		udelay(100 * j);
-		value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+		value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 		for (i = 0; i < ACPI_PM_READ_CHECKS; i++) {
-			value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+			value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 			if (value2 == value1)
 				continue;
 			if (value2 > value1)
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 186b100..74def09 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -218,7 +218,7 @@  static void hv_set_host_time(struct work_struct *work)
 
 	wrk = container_of(work, struct adj_time_work, work);
 
-	reftime = hyperv_cs->read(hyperv_cs);
+	reftime = hyperv_cs->read(hyperv_cs, NULL);
 	newtime = wrk->host_time + (reftime - wrk->ref_time);
 	host_ts = ns_to_timespec64((newtime - WLTIMEDELTA) * 100);
 
@@ -278,7 +278,7 @@  static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
 		 */
 		spin_lock_irqsave(&host_ts.lock, flags);
 
-		cur_reftime = hyperv_cs->read(hyperv_cs);
+		cur_reftime = hyperv_cs->read(hyperv_cs, NULL);
 		host_ts.host_time = hosttime;
 		host_ts.ref_time = cur_reftime;
 		ktime_get_snapshot(&host_ts.snap);
@@ -530,7 +530,7 @@  static int hv_ptp_gettime(struct ptp_clock_info *info, struct timespec64 *ts)
 	u64 newtime, reftime;
 
 	spin_lock_irqsave(&host_ts.lock, flags);
-	reftime = hyperv_cs->read(hyperv_cs);
+	reftime = hyperv_cs->read(hyperv_cs, NULL);
 	newtime = host_ts.host_time + (reftime - host_ts.ref_time);
 	*ts = ns_to_timespec64((newtime - WLTIMEDELTA) * 100);
 	spin_unlock_irqrestore(&host_ts.lock, flags);
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f2b10d9..897aeb5 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -48,7 +48,12 @@  struct module;
  *			400-499: Perfect
  *				The ideal clocksource. A must-use where
  *				available.
- * @read:		returns a cycle value, passes clocksource as argument
+ * @read:		returns a cycle value, passes clocksource and
+ * 			a pointer where tsc value is stored which was used in
+ * 			calcualtion of the cycle value if any,
+ * 			otherwise the pointer value is untouched. Must check
+ * 			if the pointer is not-NULL
+ * 			this value is used in kvm code for storing tsc_timestamp
  * @enable:		optional function to enable the clocksource
  * @disable:		optional function to disable the clocksource
  * @mask:		bitmask for two's complement
@@ -77,7 +82,7 @@  struct module;
  * structure.
  */
 struct clocksource {
-	u64 (*read)(struct clocksource *cs);
+	u64 (*read)(struct clocksource *cs, u64 *tsc_stamp);
 	u64 mask;
 	u32 mult;
 	u32 shift;
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 110f453..58a2281 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -29,7 +29,7 @@ 
  */
 struct tk_read_base {
 	struct clocksource	*clock;
-	u64			(*read)(struct clocksource *cs);
+	u64			(*read)(struct clocksource *cs, u64 *tsc_stamp);
 	u64			mask;
 	u64			cycle_last;
 	u32			mult;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229f..31df92c 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -171,7 +171,7 @@  enum tk_offsets {
 };
 
 extern ktime_t ktime_get(void);
-extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
+extern ktime_t ktime_get_with_offset(enum tk_offsets offs, u64 *cycles);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
@@ -181,7 +181,7 @@  extern u32 ktime_get_resolution_ns(void);
  */
 static inline ktime_t ktime_get_real(void)
 {
-	return ktime_get_with_offset(TK_OFFS_REAL);
+	return ktime_get_with_offset(TK_OFFS_REAL, NULL);
 }
 
 /**
@@ -192,15 +192,19 @@  static inline ktime_t ktime_get_real(void)
  */
 static inline ktime_t ktime_get_boottime(void)
 {
-	return ktime_get_with_offset(TK_OFFS_BOOT);
+	return ktime_get_with_offset(TK_OFFS_BOOT, NULL);
 }
 
+static inline ktime_t ktime_get_boottime_and_cycles(u64 *cycles)
+{
+	return ktime_get_with_offset(TK_OFFS_BOOT, cycles);
+}
 /**
  * ktime_get_clocktai - Returns the TAI time of day in ktime_t format
  */
 static inline ktime_t ktime_get_clocktai(void)
 {
-	return ktime_get_with_offset(TK_OFFS_TAI);
+	return ktime_get_with_offset(TK_OFFS_TAI, NULL);
 }
 
 /**
@@ -226,6 +230,11 @@  static inline u64 ktime_get_boot_ns(void)
 	return ktime_to_ns(ktime_get_boottime());
 }
 
+static inline u64 ktime_get_boot_ns_and_cycles(u64 *cycles)
+{
+	return ktime_to_ns(ktime_get_boottime_and_cycles(cycles));
+}
+
 static inline u64 ktime_get_tai_ns(void)
 {
 	return ktime_to_ns(ktime_get_clocktai());
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 93621ae..e48a6eb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -194,8 +194,8 @@  static void clocksource_watchdog(unsigned long data)
 		}
 
 		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
+		csnow = cs->read(cs, NULL);
+		wdnow = watchdog->read(watchdog, NULL);
 		local_irq_enable();
 
 		/* Clocksource initialized ? */
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 4977191..079a78ec 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -48,7 +48,7 @@ 
 #define JIFFIES_SHIFT	8
 #endif
 
-static u64 jiffies_read(struct clocksource *cs)
+static u64 jiffies_read(struct clocksource *cs, u64 *tsc_stamp)
 {
 	return (u64) jiffies;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9652bc5..994f83b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -160,7 +160,7 @@  static void timekeeping_check_update(struct timekeeper *tk, u64 offset)
 	}
 }
 
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(struct tk_read_base *tkr, u64 *tsc_stamp)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	u64 now, last, mask, max, delta;
@@ -175,7 +175,7 @@  static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		now = tkr->read(tkr->clock, tsc_stamp);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -204,12 +204,12 @@  static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset)
 {
 }
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(struct tk_read_base *tkr, u64 *tsc_stamp)
 {
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tkr->read(tkr->clock, tsc_stamp);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -240,7 +240,7 @@  static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	tk->tkr_mono.clock = clock;
 	tk->tkr_mono.read = clock->read;
 	tk->tkr_mono.mask = clock->mask;
-	tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock);
+	tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock, NULL);
 
 	tk->tkr_raw.clock = clock;
 	tk->tkr_raw.read = clock->read;
@@ -311,11 +311,11 @@  static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
 	return nsec + arch_gettimeoffset();
 }
 
-static inline u64 timekeeping_get_ns(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_ns(struct tk_read_base *tkr, u64 *tsc_stamp)
 {
 	u64 delta;
 
-	delta = timekeeping_get_delta(tkr);
+	delta = timekeeping_get_delta(tkr, tsc_stamp);
 	return timekeeping_delta_to_ns(tkr, delta);
 }
 
@@ -404,7 +404,7 @@  static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 
 		now += timekeeping_delta_to_ns(tkr,
 				clocksource_delta(
-					tkr->read(tkr->clock),
+					tkr->read(tkr->clock, NULL),
 					tkr->cycle_last,
 					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
@@ -456,7 +456,7 @@  EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 /* Suspend-time cycles value for halted fast timekeeper. */
 static u64 cycles_at_suspend;
 
-static u64 dummy_clock_read(struct clocksource *cs)
+static u64 dummy_clock_read(struct clocksource *cs, u64 *tsc_stamp)
 {
 	return cycles_at_suspend;
 }
@@ -477,7 +477,7 @@  static void halt_fast_timekeeper(struct timekeeper *tk)
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	cycles_at_suspend = tkr->read(tkr->clock);
+	cycles_at_suspend = tkr->read(tkr->clock, NULL);
 	tkr_dummy.read = dummy_clock_read;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_mono);
 
@@ -653,7 +653,7 @@  static void timekeeping_forward_now(struct timekeeper *tk)
 	u64 cycle_now, delta;
 	u64 nsec;
 
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk->tkr_mono.read(clock, NULL);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last  = cycle_now;
@@ -686,7 +686,7 @@  int __getnstimeofday64(struct timespec64 *ts)
 		seq = read_seqcount_begin(&tk_core.seq);
 
 		ts->tv_sec = tk->xtime_sec;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -727,7 +727,7 @@  ktime_t ktime_get(void)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = tk->tkr_mono.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -758,7 +758,7 @@  static ktime_t *offsets[TK_OFFS_MAX] = {
 	[TK_OFFS_TAI]	= &tk_core.timekeeper.offs_tai,
 };
 
-ktime_t ktime_get_with_offset(enum tk_offsets offs)
+ktime_t ktime_get_with_offset(enum tk_offsets offs, u64 *tsc_stamp)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned int seq;
@@ -770,7 +770,7 @@  ktime_t ktime_get_with_offset(enum tk_offsets offs)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = ktime_add(tk->tkr_mono.base, *offset);
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, tsc_stamp);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -812,7 +812,7 @@  ktime_t ktime_get_raw(void)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = tk->tkr_raw.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_raw);
+		nsecs = timekeeping_get_ns(&tk->tkr_raw, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -840,7 +840,7 @@  void ktime_get_ts64(struct timespec64 *ts)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		ts->tv_sec = tk->xtime_sec;
-		nsec = timekeeping_get_ns(&tk->tkr_mono);
+		nsec = timekeeping_get_ns(&tk->tkr_mono, NULL);
 		tomono = tk->wall_to_monotonic;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
@@ -930,7 +930,7 @@  void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk->tkr_mono.read(tk->tkr_mono.clock, NULL);
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -1108,7 +1108,7 @@  int get_device_system_crosststamp(int (*get_time_fn)
 		 * Check whether the system counter value provided by the
 		 * device driver is on the current timekeeping interval.
 		 */
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk->tkr_mono.read(tk->tkr_mono.clock, NULL);
 		interval_start = tk->tkr_mono.cycle_last;
 		if (!cycle_between(interval_start, cycles, now)) {
 			clock_was_set_seq = tk->clock_was_set_seq;
@@ -1359,7 +1359,7 @@  void getrawmonotonic64(struct timespec64 *ts)
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		nsecs = timekeeping_get_ns(&tk->tkr_raw);
+		nsecs = timekeeping_get_ns(&tk->tkr_raw, NULL);
 		ts64 = tk->raw_time;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
@@ -1629,7 +1629,7 @@  void timekeeping_resume(void)
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk->tkr_mono.read(clock, NULL);
 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
 		cycle_now > tk->tkr_mono.cycle_last) {
 		u64 nsec, cyc_delta;
@@ -2030,7 +2030,7 @@  void update_wall_time(void)
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
-	offset = clocksource_delta(tk->tkr_mono.read(tk->tkr_mono.clock),
+	offset = clocksource_delta(tk->tkr_mono.read(tk->tkr_mono.clock, NULL),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 
@@ -2203,7 +2203,7 @@  ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real,
 		seq = read_seqcount_begin(&tk_core.seq);
 
 		base = tk->tkr_mono.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 		base = ktime_add_ns(base, nsecs);
 
 		if (*cwsseq != tk->clock_was_set_seq) {