diff mbox

[v2,2/2] ARM: delay: allow timer-based delay implementation to be selected

Message ID 20120713111337.GH18079@mudshark.cambridge.arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Will Deacon July 13, 2012, 11:13 a.m. UTC
On Fri, Jul 13, 2012 at 11:48:25AM +0100, Shilimkar, Santosh wrote:
> Wiil,

Hi Saantosh :),

> On Fri, Jul 13, 2012 at 2:27 PM, Will Deacon <will.deacon@arm.com> wrote:
> > That should work, providing you can get a sensible initial estimate for
> > loops_per_jiffy.
> >
> Do you have an updated version of the patch ?

I was anticipating that the platform would set the initial loops_per_jiffy
value if it requires udelays before loop calibration and the default of 4k
is wildly off.

If people need loops_per_jiffy to be updated at the same time as lpj_fine,
I can post that as a separate patch (below) as Russell has merged v2 of these
patches into his delay branch. That said, I'd certainly like to know if this
is actually a real problem (and can't be solved by choosing a compromise value
as the initial loops_per_jiffy). I think Shinya was doing some tests so
I'll wait to see how those went.

Will

---8<---

Comments

Santosh Shilimkar July 13, 2012, 12:04 p.m. UTC | #1
On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 13, 2012 at 11:48:25AM +0100, Shilimkar, Santosh wrote:
>> Wiil,
>
> Hi Saantosh :),
>
>> On Fri, Jul 13, 2012 at 2:27 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > That should work, providing you can get a sensible initial estimate for
>> > loops_per_jiffy.
>> >
>> Do you have an updated version of the patch ?
>
> I was anticipating that the platform would set the initial loops_per_jiffy
> value if it requires udelays before loop calibration and the default of 4k
> is wildly off.
>
> If people need loops_per_jiffy to be updated at the same time as lpj_fine,
> I can post that as a separate patch (below) as Russell has merged v2 of these
> patches into his delay branch. That said, I'd certainly like to know if this
> is actually a real problem (and can't be solved by choosing a compromise value
> as the initial loops_per_jiffy). I think Shinya was doing some tests so
> I'll wait to see how those went.
>
I just tried the suggested change + two patches.
I haven't gone through the complete discussion but quick test of your patches
on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like
you are not updating the  'loops_per_jiffy' in the calibration loop.

What Am I missing ?

Regards
santosh
Will Deacon July 13, 2012, 12:08 p.m. UTC | #2
On Fri, Jul 13, 2012 at 01:04:27PM +0100, Shilimkar, Santosh wrote:
> On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote:
> > I was anticipating that the platform would set the initial loops_per_jiffy
> > value if it requires udelays before loop calibration and the default of 4k
> > is wildly off.
> >
> > If people need loops_per_jiffy to be updated at the same time as lpj_fine,
> > I can post that as a separate patch (below) as Russell has merged v2 of these
> > patches into his delay branch. That said, I'd certainly like to know if this
> > is actually a real problem (and can't be solved by choosing a compromise value
> > as the initial loops_per_jiffy). I think Shinya was doing some tests so
> > I'll wait to see how those went.
> >
> I just tried the suggested change + two patches.
> I haven't gone through the complete discussion but quick test of your patches
> on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like
> you are not updating the  'loops_per_jiffy' in the calibration loop.

What is your bogomips reading and what frequency is your architected timer
ticking at? loops_per_jiffy is updated by calibrate_delay in
init/calibrate.c so we shouldn't need to worry about that (the diff I posted
in my last email is just for udelay calls between switching to the timer and
doing the calibration).

> What Am I missing ?

Well, your bogomips number will probably be *much* lower than before, so
don't be surprised by that.

Will
Santosh Shilimkar July 13, 2012, 12:14 p.m. UTC | #3
On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 13, 2012 at 01:04:27PM +0100, Shilimkar, Santosh wrote:
>> On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I was anticipating that the platform would set the initial loops_per_jiffy
>> > value if it requires udelays before loop calibration and the default of 4k
>> > is wildly off.
>> >
>> > If people need loops_per_jiffy to be updated at the same time as lpj_fine,
>> > I can post that as a separate patch (below) as Russell has merged v2 of these
>> > patches into his delay branch. That said, I'd certainly like to know if this
>> > is actually a real problem (and can't be solved by choosing a compromise value
>> > as the initial loops_per_jiffy). I think Shinya was doing some tests so
>> > I'll wait to see how those went.
>> >
>> I just tried the suggested change + two patches.
>> I haven't gone through the complete discussion but quick test of your patches
>> on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like
>> you are not updating the  'loops_per_jiffy' in the calibration loop.
>
> What is your bogomips reading and what frequency is your architected timer
> ticking at? loops_per_jiffy is updated by calibrate_delay in
> init/calibrate.c so we shouldn't need to worry about that (the diff I posted
> in my last email is just for udelay calls between switching to the timer and
> doing the calibration).
>
>> What Am I missing ?
>
> Well, your bogomips number will probably be *much* lower than before, so
> don't be surprised by that.
>
Indeed !!
This is what I observed. Can we not keep that behavior old way. BogoMIPS and
CPU clock is often very easy to co-related and helpful for CPUFREQ stats.

Regards
Santosh
Will Deacon July 13, 2012, 12:23 p.m. UTC | #4
On Fri, Jul 13, 2012 at 01:14:59PM +0100, Shilimkar, Santosh wrote:
> On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Well, your bogomips number will probably be *much* lower than before, so
> > don't be surprised by that.
> >
> Indeed !!
> This is what I observed. Can we not keep that behavior old way. BogoMIPS and
> CPU clock is often very easy to co-related and helpful for CPUFREQ stats.

I defer to David Miller, who had a similar reaction when he did this for
sparc:

http://www.spinics.net/lists/sparclinux/msg08550.html

Will
Santosh Shilimkar July 13, 2012, 12:28 p.m. UTC | #5
On Fri, Jul 13, 2012 at 5:53 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Jul 13, 2012 at 01:14:59PM +0100, Shilimkar, Santosh wrote:
>> On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > Well, your bogomips number will probably be *much* lower than before, so
>> > don't be surprised by that.
>> >
>> Indeed !!
>> This is what I observed. Can we not keep that behavior old way. BogoMIPS and
>> CPU clock is often very easy to co-related and helpful for CPUFREQ stats.
>
> I defer to David Miller, who had a similar reaction when he did this for
> sparc:
>
> http://www.spinics.net/lists/sparclinux/msg08550.html
>
Fair enough. Anybody who sees this BOGOMIPS change will
get similar reaction. I am fine with it now after knowing the
reason.
Thought was just to avoid similar question in future which you
might have to answer :-)

Regards
Santosh
Shinya Kuribayashi July 17, 2012, 3:10 a.m. UTC | #6
Will, Stephen and Santosh,

On 7/13/2012 8:13 PM, Will Deacon wrote:
> I was anticipating that the platform would set the initial loops_per_jiffy
> value if it requires udelays before loop calibration and the default of 4k
> is wildly off.

I overlooked two different lpj setups were involved at hand.

First one was, the initial loops_per_jiffy value of 4k was too small for
almost all processors running Linux today, so I set up loops_per_jiffy
_early_, calculated from the CPU clock speed.  I didn't mentioned this
before, sorry for confusion.

So my initial loops_per_jiffy is not 4k at this point.  It's optimized
for loop-based delay with the CPU running at 1.2GHz (much bigger than
default 4k).

And later, init_current_timer_delay() got processed.  Actual udelay()
behavior switched from loop-based delay to timer-based one immediately,
while my loops_per_jiffy was not changed/updated to appropriate value.

This is why my udelay()s, used after init_current_timer_delay(), were
taking considerable long time to expire.   Note that my initial tests
for Will's patchset was done using a loadable module dedicated for
udelay tests, that was prepared for 2.6.35/3.0 kernels beforehand.

And this time, I confirmed that updating loops_per_jiffy at the same
time as lpj_fine, works perfectly as expected for me.

> If people need loops_per_jiffy to be updated at the same time as lpj_fine,
> I can post that as a separate patch (below) as Russell has merged v2 of these
> patches into his delay branch. That said, I'd certainly like to know if this
> is actually a real problem (and can't be solved by choosing a compromise value
> as the initial loops_per_jiffy). I think Shinya was doing some tests so
> I'll wait to see how those went.

From my observations:

(1) loops_per_jiffy can easily be calculated from the CPU clock speed.
If your platform is capable of detecting CPU frequency at run-time,
settingi up loops_per_jiffy _early_ can allow you early use of udelay()s.

Or even if you don't need udelay() early, setting up lpj_fine (or having
calibrate_delay_is_known()) allows you to skip calibrate_delay() later.
This is useful and can be applied to both UP and SMP systems.

(2) For SMP platforms, if you need ealy use of udelay(), you have to
update loops_per_jiffy at the same time as init_current_timer_delay().
It could be done in init_current_timer_delay(), or platforms can take
care of that, that need udelay() available early.  Either one should be
fine with me.
--
Shinya Kuribayashi
Renesas Electronics
Santosh Shilimkar July 17, 2012, 6:11 a.m. UTC | #7
On Tue, Jul 17, 2012 at 8:40 AM, Shinya Kuribayashi
<shinya.kuribayashi.px@renesas.com> wrote:
> Will, Stephen and Santosh,
>
> On 7/13/2012 8:13 PM, Will Deacon wrote:
>> I was anticipating that the platform would set the initial loops_per_jiffy
>> value if it requires udelays before loop calibration and the default of 4k
>> is wildly off.
>
> I overlooked two different lpj setups were involved at hand.
>
> First one was, the initial loops_per_jiffy value of 4k was too small for
> almost all processors running Linux today, so I set up loops_per_jiffy
> _early_, calculated from the CPU clock speed.  I didn't mentioned this
> before, sorry for confusion.
>
> So my initial loops_per_jiffy is not 4k at this point.  It's optimized
> for loop-based delay with the CPU running at 1.2GHz (much bigger than
> default 4k).
>
> And later, init_current_timer_delay() got processed.  Actual udelay()
> behavior switched from loop-based delay to timer-based one immediately,
> while my loops_per_jiffy was not changed/updated to appropriate value.
>
> This is why my udelay()s, used after init_current_timer_delay(), were
> taking considerable long time to expire.   Note that my initial tests
> for Will's patchset was done using a loadable module dedicated for
> udelay tests, that was prepared for 2.6.35/3.0 kernels beforehand.
>
> And this time, I confirmed that updating loops_per_jiffy at the same
> time as lpj_fine, works perfectly as expected for me.
>
>> If people need loops_per_jiffy to be updated at the same time as lpj_fine,
>> I can post that as a separate patch (below) as Russell has merged v2 of these
>> patches into his delay branch. That said, I'd certainly like to know if this
>> is actually a real problem (and can't be solved by choosing a compromise value
>> as the initial loops_per_jiffy). I think Shinya was doing some tests so
>> I'll wait to see how those went.
>
> From my observations:
>
> (1) loops_per_jiffy can easily be calculated from the CPU clock speed.
> If your platform is capable of detecting CPU frequency at run-time,
> settingi up loops_per_jiffy _early_ can allow you early use of udelay()s.
>
> Or even if you don't need udelay() early, setting up lpj_fine (or having
> calibrate_delay_is_known()) allows you to skip calibrate_delay() later.
> This is useful and can be applied to both UP and SMP systems.
>
> (2) For SMP platforms, if you need ealy use of udelay(), you have to
> update loops_per_jiffy at the same time as init_current_timer_delay().
> It could be done in init_current_timer_delay(), or platforms can take
> care of that, that need udelay() available early.  Either one should be
> fine with me.

Thanks for the detailed explanation. CPU clock detection is indeed the
nit way to skip the calibration overhead and this was one of the comment
when I tried to push the skipping of calibration for secondary CPUs.

Looks like you have a working patch for the clock detection. Will
you able to post that patch so that this long pending calibration
for secondary CPUs gets optimized.

Regards
Santosh
diff mbox

Patch

diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index d6dacc6..395d5fb 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -59,6 +59,7 @@  void __init init_current_timer_delay(unsigned long freq)
 {
        pr_info("Switching to timer-based delay loop\n");
        lpj_fine                        = freq / HZ;
+       loops_per_jiffy                 = lpj_fine;
        arm_delay_ops.delay             = __timer_delay;
        arm_delay_ops.const_udelay      = __timer_const_udelay;
        arm_delay_ops.udelay            = __timer_udelay;