diff mbox

sched: Support current clocksource handling in fallback sched_clock().

Message ID 20090527001543.GA8493@linux-sh.org (mailing list archive)
State Superseded
Headers show

Commit Message

Paul Mundt May 27, 2009, 12:15 a.m. UTC
On Wed, May 27, 2009 at 01:49:33AM +0200, Thomas Gleixner wrote:
> On Wed, 27 May 2009, Paul Mundt wrote:
> > Ok, so based on this and John's locking concerns, how about something
> > like this? It doesn't handle the wrapping cases, but I wonder if we
> > really want to add that amount of logic to sched_clock() in the first
> > place. Clocksources that wrap frequently could either leave the flag
> > unset, or do something similar to the TSC code where the cyc2ns shift is
> > used. If this is something we want to handle generically, then I'll have
> > a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> 
> Gah. There is no locking issue. As Peter explained before the
> scheduler code can cope with some inaccurate value.
> 
> The wrap issue is completly academic. If the current clock source has
> a wrap issue then it needs to be addressed anyway by frequent enough
> wakeups to assure correctness of timekeeping and that makes it
> suitable for the sched clock domain as well. Also the scheduler can
> not hit a value which has not gone through the irq_enter() based
> update after a long idle sleep.
> 
> So changing your previous patch from
> 
>    if (clock && clock->rating > 100)
> 
> to
> 
>    if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> 
> is sufficient.
> 
Works for me.. here's v3 with an updated changelog.

--

sched: Support current clocksource handling in fallback sched_clock(), v3.

There are presently a number of issues and limitations with how the
clocksource and sched_clock() interaction works today. Configurations
tend to be grouped in to one of the following:

	- Platform provides a clocksource unsuitable for sched_clock()
	  and prefers to use the generic jiffies-backed implementation.

	- Platform provides its own clocksource and sched_clock() that
	  wraps in to it.

	- Platform uses a generic clocksource (ie, drivers/clocksource/)
	  combined with the generic jiffies-backed sched_clock().

	- Platform supports multiple sched_clock()-capable clocksources.

This patch adds a new CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag to address
these issues. The first case simply doesn't set the flag at all (or
clears it, if a clocksource is unstable), while the second case is made
redundant for any sched_clock() implementation that just does the
cyc2ns() case, which tends to be the vast majority of the embedded
platforms.

The remaining cases are handled transparently, in that sched_clock() will
always read from whatever the current clocksource is, as long as it is
has the flag set. This permits switching between multiple clocksources
which may or may not support being used by sched_clock(), and while some
inaccuracy may occur in these corner cases, the scheduler can live with
this.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

---

 include/linux/clocksource.h |    1 +
 kernel/sched_clock.c        |    9 +++++++++
 2 files changed, 10 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Daniel Walker May 27, 2009, 4:25 p.m. UTC | #1
On Wed, 2009-05-27 at 09:15 +0900, Paul Mundt wrote:

>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> +	/*
> +	 * Use the current clocksource when it becomes available later in
> +	 * the boot process, and ensure that it is usable for sched_clock().
> +	 */
> +	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> +		return cyc2ns(clock, clocksource_read(clock));
> +
> +	/* Otherwise just fall back on jiffies */
>  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)

Do we really need all this complexity in a fast path? the jiffies
clocksource is static and always ready. Could we instead remove the
usage of "clock" and create a new pointer called "sched_clocksource" and
initialize it to the jiffies clock, and allow the clocksource management
code to update that new pointer based on the
CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag when new clocksources are
registered/removed/marked unstable etc..

that would eliminate all the code in sched_clock except one line,

unsigned long long __attribute__((weak)) sched_clock(void)
{
	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
}

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mundt May 28, 2009, 8:44 a.m. UTC | #2
On Wed, May 27, 2009 at 09:25:25AM -0700, Daniel Walker wrote:
> On Wed, 2009-05-27 at 09:15 +0900, Paul Mundt wrote:
> 
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +	/*
> > +	 * Use the current clocksource when it becomes available later in
> > +	 * the boot process, and ensure that it is usable for sched_clock().
> > +	 */
> > +	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> > +		return cyc2ns(clock, clocksource_read(clock));
> > +
> > +	/* Otherwise just fall back on jiffies */
> >  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> 
> Do we really need all this complexity in a fast path? the jiffies
> clocksource is static and always ready. Could we instead remove the
> usage of "clock" and create a new pointer called "sched_clocksource" and
> initialize it to the jiffies clock, and allow the clocksource management
> code to update that new pointer based on the
> CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag when new clocksources are
> registered/removed/marked unstable etc..
> 
> that would eliminate all the code in sched_clock except one line,
> 
> unsigned long long __attribute__((weak)) sched_clock(void)
> {
> 	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
> }
> 
It would, but then what would that do to the sched_clock() value? It will
reset once a CLOCK_SOURCE_USE_FOR_SCHED_CLOCK clocksource comes along,
which basically means that sched_clock() then rolls back, which is worse
than simply being delayed a bit. So even if we were to use a special
sched_clocksource, it still could not be used until it was set up at
registration time.

Now we could of course add a sched_clocksource assignment in
select_clocksource(), but that's really no different than just testing
'clock' as we do today. All it does is move the flag test in to a
different path.

So, I think the current v3 is still the best solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..70d156f 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -212,6 +212,7 @@  extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..a0c18da 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@ 
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,14 @@ 
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process, and ensure that it is usable for sched_clock().
+	 */
+	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
+		return cyc2ns(clock, clocksource_read(clock));
+
+	/* Otherwise just fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }