diff mbox

sched_clock: Prevent callers from seeing half-updated data

Message ID 1391806139-20116-1-git-send-email-sboyd@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Stephen Boyd Feb. 7, 2014, 8:48 p.m. UTC
If two sched_clock sources are registered we may end up in a
situation where a call to sched_clock() may be accessing the
epoch cycle count for the old counter and the cycle count for the
new counter. This can lead to confusing results where
sched_clock() values jump and then are reset to 0 (due to the way
the registration function forces the epoch_ns to be 0). Fix this
by reorganizing the registration function to hold the seqlock for
as short a time as possible while we update the clock_data
structure for a new counter and stop resetting the epoch_ns count
to 0.

Reported-by: Will Deacon <will.deacon@arm.com>
Cc: Josh Cartwright <joshc@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 kernel/time/sched_clock.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

Comments

Stephen Boyd Feb. 7, 2014, 10:22 p.m. UTC | #1
On 02/07, Stephen Boyd wrote:
> If two sched_clock sources are registered we may end up in a
> situation where a call to sched_clock() may be accessing the
> epoch cycle count for the old counter and the cycle count for the
> new counter. This can lead to confusing results where
> sched_clock() values jump and then are reset to 0 (due to the way
> the registration function forces the epoch_ns to be 0). Fix this
> by reorganizing the registration function to hold the seqlock for
> as short a time as possible while we update the clock_data
> structure for a new counter and stop resetting the epoch_ns count
> to 0.

Hmm.. This won't properly accumulate time. We need to put
whatever time has elapsed into epoch_ns when we register the new
counter for this to work. I don't have a board with this
configuration but I'll send a v2 that should fix this. Hopefully
Will can test it.
John Stultz Feb. 7, 2014, 10:28 p.m. UTC | #2
On 02/07/2014 02:22 PM, Stephen Boyd wrote:
> On 02/07, Stephen Boyd wrote:
>> If two sched_clock sources are registered we may end up in a
>> situation where a call to sched_clock() may be accessing the
>> epoch cycle count for the old counter and the cycle count for the
>> new counter. This can lead to confusing results where
>> sched_clock() values jump and then are reset to 0 (due to the way
>> the registration function forces the epoch_ns to be 0). Fix this
>> by reorganizing the registration function to hold the seqlock for
>> as short a time as possible while we update the clock_data
>> structure for a new counter and stop resetting the epoch_ns count
>> to 0.
> Hmm.. This won't properly accumulate time. We need to put
> whatever time has elapsed into epoch_ns when we register the new
> counter for this to work. I don't have a board with this
> configuration but I'll send a v2 that should fix this. Hopefully
> Will can test it.

Also maybe clarify in the commit message that this is a result of not
having the necessary locking in place in the registration code (likely
due to it not really being required in the single clock case), just so
Ingo and others have some more context as to why this is needed now and
wasn't hit before.

thanks
-john
Stephen Boyd Feb. 11, 2014, 6:49 a.m. UTC | #3
On 02/07, John Stultz wrote:
> On 02/07/2014 02:22 PM, Stephen Boyd wrote:
> > On 02/07, Stephen Boyd wrote:
> >> If two sched_clock sources are registered we may end up in a
> >> situation where a call to sched_clock() may be accessing the
> >> epoch cycle count for the old counter and the cycle count for the
> >> new counter. This can lead to confusing results where
> >> sched_clock() values jump and then are reset to 0 (due to the way
> >> the registration function forces the epoch_ns to be 0). Fix this
> >> by reorganizing the registration function to hold the seqlock for
> >> as short a time as possible while we update the clock_data
> >> structure for a new counter and stop resetting the epoch_ns count
> >> to 0.
> > Hmm.. This won't properly accumulate time. We need to put
> > whatever time has elapsed into epoch_ns when we register the new
> > counter for this to work. I don't have a board with this
> > configuration but I'll send a v2 that should fix this. Hopefully
> > Will can test it.
> 
> Also maybe clarify in the commit message that this is a result of not
> having the necessary locking in place in the registration code (likely
> due to it not really being required in the single clock case), just so
> Ingo and others have some more context as to why this is needed now and
> wasn't hit before.
> 

Hmph... I already sent v2 before you replied. Is the commit text
good enough? I do mention that this is about two sched_clock
sources being registered.
John Stultz Feb. 17, 2014, 6:13 p.m. UTC | #4
On 02/10/2014 10:49 PM, Stephen Boyd wrote:
> On 02/07, John Stultz wrote:
>> On 02/07/2014 02:22 PM, Stephen Boyd wrote:
>>> On 02/07, Stephen Boyd wrote:
>>>> If two sched_clock sources are registered we may end up in a
>>>> situation where a call to sched_clock() may be accessing the
>>>> epoch cycle count for the old counter and the cycle count for the
>>>> new counter. This can lead to confusing results where
>>>> sched_clock() values jump and then are reset to 0 (due to the way
>>>> the registration function forces the epoch_ns to be 0). Fix this
>>>> by reorganizing the registration function to hold the seqlock for
>>>> as short a time as possible while we update the clock_data
>>>> structure for a new counter and stop resetting the epoch_ns count
>>>> to 0.
>>> Hmm.. This won't properly accumulate time. We need to put
>>> whatever time has elapsed into epoch_ns when we register the new
>>> counter for this to work. I don't have a board with this
>>> configuration but I'll send a v2 that should fix this. Hopefully
>>> Will can test it.
>> Also maybe clarify in the commit message that this is a result of not
>> having the necessary locking in place in the registration code (likely
>> due to it not really being required in the single clock case), just so
>> Ingo and others have some more context as to why this is needed now and
>> wasn't hit before.
>>
> Hmph... I already sent v2 before you replied. Is the commit text
> good enough? I do mention that this is about two sched_clock
> sources being registered.

I'll tweak the commit message a bit make this point more clear.

thanks
-john
diff mbox

Patch

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb36464281..36ffce3a4b83 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -116,20 +116,38 @@  static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
 void __init sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate)
 {
+	u64 res, wrap, new_mask, new_epoch;
+	u32 new_mult, new_shift;
+	ktime_t new_wrap_kt;
 	unsigned long r;
-	u64 res, wrap;
 	char r_unit;
 
 	if (cd.rate > rate)
 		return;
 
 	WARN_ON(!irqs_disabled());
-	read_sched_clock = read;
-	sched_clock_mask = CLOCKSOURCE_MASK(bits);
-	cd.rate = rate;
 
 	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 3600);
+	clocks_calc_mult_shift(&new_mult, &new_shift, rate, NSEC_PER_SEC, 3600);
+
+	new_mask = CLOCKSOURCE_MASK(bits);
+
+	/* calculate how many ns until we wrap */
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+
+	/* update epoch for the new counter */
+	new_epoch = read();
+
+	raw_write_seqcount_begin(&cd.seq);
+	read_sched_clock = read;
+	sched_clock_mask = new_mask;
+	cd.rate = rate;
+	cd.wrap_kt = new_wrap_kt;
+	cd.mult = new_mult;
+	cd.shift = new_shift;
+	cd.epoch_cyc = new_epoch;
+	raw_write_seqcount_end(&cd.seq);
 
 	r = rate;
 	if (r >= 4000000) {
@@ -141,22 +159,12 @@  void __init sched_clock_register(u64 (*read)(void), int bits,
 	} else
 		r_unit = ' ';
 
-	/* calculate how many ns until we wrap */
-	wrap = clocks_calc_max_nsecs(cd.mult, cd.shift, 0, sched_clock_mask);
-	cd.wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
-
 	/* calculate the ns resolution of this counter */
-	res = cyc_to_ns(1ULL, cd.mult, cd.shift);
+	res = cyc_to_ns(1ULL, new_mult, new_shift);
+
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lluns\n",
 		bits, r, r_unit, res, wrap);
 
-	update_sched_clock();
-
-	/*
-	 * Ensure that sched_clock() starts off at 0ns
-	 */
-	cd.epoch_ns = 0;
-
 	/* Enable IRQ time accounting if we have a fast enough sched_clock */
 	if (irqtime > 0 || (irqtime == -1 && rate >= 1000000))
 		enable_sched_clock_irqtime();