Message ID | 20120713111337.GH18079@mudshark.cambridge.arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote: > On Fri, Jul 13, 2012 at 11:48:25AM +0100, Shilimkar, Santosh wrote: >> Wiil, > > Hi Saantosh :), > >> On Fri, Jul 13, 2012 at 2:27 PM, Will Deacon <will.deacon@arm.com> wrote: >> > That should work, providing you can get a sensible initial estimate for >> > loops_per_jiffy. >> > >> Do you have an updated version of the patch ? > > I was anticipating that the platform would set the initial loops_per_jiffy > value if it requires udelays before loop calibration and the default of 4k > is wildly off. > > If people need loops_per_jiffy to be updated at the same time as lpj_fine, > I can post that as a separate patch (below) as Russell has merged v2 of these > patches into his delay branch. That said, I'd certainly like to know if this > is actually a real problem (and can't be solved by choosing a compromise value > as the initial loops_per_jiffy). I think Shinya was doing some tests so > I'll wait to see how those went. > I just tried the suggested change + two patches. I haven't gone through the complete discussion but quick test of your patches on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like you are not updating the 'loops_per_jiffy' in the calibration loop. What Am I missing ? Regards santosh
On Fri, Jul 13, 2012 at 01:04:27PM +0100, Shilimkar, Santosh wrote: > On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote: > > I was anticipating that the platform would set the initial loops_per_jiffy > > value if it requires udelays before loop calibration and the default of 4k > > is wildly off. > > > > If people need loops_per_jiffy to be updated at the same time as lpj_fine, > > I can post that as a separate patch (below) as Russell has merged v2 of these > > patches into his delay branch. That said, I'd certainly like to know if this > > is actually a real problem (and can't be solved by choosing a compromise value > > as the initial loops_per_jiffy). I think Shinya was doing some tests so > > I'll wait to see how those went. > > > I just tried the suggested change + two patches. > I haven't gone through the complete discussion but quick test of your patches > on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like > you are not updating the 'loops_per_jiffy' in the calibration loop. What is your bogomips reading and what frequency is your architected timer ticking at? loops_per_jiffy is updated by calibrate_delay in init/calibrate.c so we shouldn't need to worry about that (the diff I posted in my last email is just for udelay calls between switching to the timer and doing the calibration). > What Am I missing ? Well, your bogomips number will probably be *much* lower than before, so don't be surprised by that. Will
On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote: > On Fri, Jul 13, 2012 at 01:04:27PM +0100, Shilimkar, Santosh wrote: >> On Fri, Jul 13, 2012 at 4:43 PM, Will Deacon <will.deacon@arm.com> wrote: >> > I was anticipating that the platform would set the initial loops_per_jiffy >> > value if it requires udelays before loop calibration and the default of 4k >> > is wildly off. >> > >> > If people need loops_per_jiffy to be updated at the same time as lpj_fine, >> > I can post that as a separate patch (below) as Russell has merged v2 of these >> > patches into his delay branch. That said, I'd certainly like to know if this >> > is actually a real problem (and can't be solved by choosing a compromise value >> > as the initial loops_per_jiffy). I think Shinya was doing some tests so >> > I'll wait to see how those went. >> > >> I just tried the suggested change + two patches. >> I haven't gone through the complete discussion but quick test of your patches >> on OMAP5, shows me, the BOGOMIPS is completely wrong. Looks like >> you are not updating the 'loops_per_jiffy' in the calibration loop. > > What is your bogomips reading and what frequency is your architected timer > ticking at? loops_per_jiffy is updated by calibrate_delay in > init/calibrate.c so we shouldn't need to worry about that (the diff I posted > in my last email is just for udelay calls between switching to the timer and > doing the calibration). > >> What Am I missing ? > > Well, your bogomips number will probably be *much* lower than before, so > don't be surprised by that. > Indeed !! This is what I observed. Can we not keep that behavior old way. BogoMIPS and CPU clock is often very easy to co-related and helpful for CPUFREQ stats. Regards Santosh
On Fri, Jul 13, 2012 at 01:14:59PM +0100, Shilimkar, Santosh wrote: > On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote: > > Well, your bogomips number will probably be *much* lower than before, so > > don't be surprised by that. > > > Indeed !! > This is what I observed. Can we not keep that behavior old way. BogoMIPS and > CPU clock is often very easy to co-related and helpful for CPUFREQ stats. I defer to David Miller, who had a similar reaction when he did this for sparc: http://www.spinics.net/lists/sparclinux/msg08550.html Will
On Fri, Jul 13, 2012 at 5:53 PM, Will Deacon <will.deacon@arm.com> wrote: > On Fri, Jul 13, 2012 at 01:14:59PM +0100, Shilimkar, Santosh wrote: >> On Fri, Jul 13, 2012 at 5:38 PM, Will Deacon <will.deacon@arm.com> wrote: >> > Well, your bogomips number will probably be *much* lower than before, so >> > don't be surprised by that. >> > >> Indeed !! >> This is what I observed. Can we not keep that behavior old way. BogoMIPS and >> CPU clock is often very easy to co-related and helpful for CPUFREQ stats. > > I defer to David Miller, who had a similar reaction when he did this for > sparc: > > http://www.spinics.net/lists/sparclinux/msg08550.html > Fair enough. Anybody who sees this BOGOMIPS change will get similar reaction. I am fine with it now after knowing the reason. Thought was just to avoid similar question in future which you might have to answer :-) Regards Santosh
Will, Stephen and Santosh, On 7/13/2012 8:13 PM, Will Deacon wrote: > I was anticipating that the platform would set the initial loops_per_jiffy > value if it requires udelays before loop calibration and the default of 4k > is wildly off. I overlooked two different lpj setups were involved at hand. First one was, the initial loops_per_jiffy value of 4k was too small for almost all processors running Linux today, so I set up loops_per_jiffy _early_, calculated from the CPU clock speed. I didn't mentioned this before, sorry for confusion. So my initial loops_per_jiffy is not 4k at this point. It's optimized for loop-based delay with the CPU running at 1.2GHz (much bigger than default 4k). And later, init_current_timer_delay() got processed. Actual udelay() behavior switched from loop-based delay to timer-based one immediately, while my loops_per_jiffy was not changed/updated to appropriate value. This is why my udelay()s, used after init_current_timer_delay(), were taking considerable long time to expire. Note that my initial tests for Will's patchset was done using a loadable module dedicated for udelay tests, that was prepared for 2.6.35/3.0 kernels beforehand. And this time, I confirmed that updating loops_per_jiffy at the same time as lpj_fine, works perfectly as expected for me. > If people need loops_per_jiffy to be updated at the same time as lpj_fine, > I can post that as a separate patch (below) as Russell has merged v2 of these > patches into his delay branch. That said, I'd certainly like to know if this > is actually a real problem (and can't be solved by choosing a compromise value > as the initial loops_per_jiffy). I think Shinya was doing some tests so > I'll wait to see how those went. From my observations: (1) loops_per_jiffy can easily be calculated from the CPU clock speed. If your platform is capable of detecting CPU frequency at run-time, settingi up loops_per_jiffy _early_ can allow you early use of udelay()s. Or even if you don't need udelay() early, setting up lpj_fine (or having calibrate_delay_is_known()) allows you to skip calibrate_delay() later. This is useful and can be applied to both UP and SMP systems. (2) For SMP platforms, if you need ealy use of udelay(), you have to update loops_per_jiffy at the same time as init_current_timer_delay(). It could be done in init_current_timer_delay(), or platforms can take care of that, that need udelay() available early. Either one should be fine with me. -- Shinya Kuribayashi Renesas Electronics
On Tue, Jul 17, 2012 at 8:40 AM, Shinya Kuribayashi <shinya.kuribayashi.px@renesas.com> wrote: > Will, Stephen and Santosh, > > On 7/13/2012 8:13 PM, Will Deacon wrote: >> I was anticipating that the platform would set the initial loops_per_jiffy >> value if it requires udelays before loop calibration and the default of 4k >> is wildly off. > > I overlooked two different lpj setups were involved at hand. > > First one was, the initial loops_per_jiffy value of 4k was too small for > almost all processors running Linux today, so I set up loops_per_jiffy > _early_, calculated from the CPU clock speed. I didn't mentioned this > before, sorry for confusion. > > So my initial loops_per_jiffy is not 4k at this point. It's optimized > for loop-based delay with the CPU running at 1.2GHz (much bigger than > default 4k). > > And later, init_current_timer_delay() got processed. Actual udelay() > behavior switched from loop-based delay to timer-based one immediately, > while my loops_per_jiffy was not changed/updated to appropriate value. > > This is why my udelay()s, used after init_current_timer_delay(), were > taking considerable long time to expire. Note that my initial tests > for Will's patchset was done using a loadable module dedicated for > udelay tests, that was prepared for 2.6.35/3.0 kernels beforehand. > > And this time, I confirmed that updating loops_per_jiffy at the same > time as lpj_fine, works perfectly as expected for me. > >> If people need loops_per_jiffy to be updated at the same time as lpj_fine, >> I can post that as a separate patch (below) as Russell has merged v2 of these >> patches into his delay branch. That said, I'd certainly like to know if this >> is actually a real problem (and can't be solved by choosing a compromise value >> as the initial loops_per_jiffy). I think Shinya was doing some tests so >> I'll wait to see how those went. > > From my observations: > > (1) loops_per_jiffy can easily be calculated from the CPU clock speed. > If your platform is capable of detecting CPU frequency at run-time, > settingi up loops_per_jiffy _early_ can allow you early use of udelay()s. > > Or even if you don't need udelay() early, setting up lpj_fine (or having > calibrate_delay_is_known()) allows you to skip calibrate_delay() later. > This is useful and can be applied to both UP and SMP systems. > > (2) For SMP platforms, if you need ealy use of udelay(), you have to > update loops_per_jiffy at the same time as init_current_timer_delay(). > It could be done in init_current_timer_delay(), or platforms can take > care of that, that need udelay() available early. Either one should be > fine with me. Thanks for the detailed explanation. CPU clock detection is indeed the nit way to skip the calibration overhead and this was one of the comment when I tried to push the skipping of calibration for secondary CPUs. Looks like you have a working patch for the clock detection. Will you able to post that patch so that this long pending calibration for secondary CPUs gets optimized. Regards Santosh
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c index d6dacc6..395d5fb 100644 --- a/arch/arm/lib/delay.c +++ b/arch/arm/lib/delay.c @@ -59,6 +59,7 @@ void __init init_current_timer_delay(unsigned long freq) { pr_info("Switching to timer-based delay loop\n"); lpj_fine = freq / HZ; + loops_per_jiffy = lpj_fine; arm_delay_ops.delay = __timer_delay; arm_delay_ops.const_udelay = __timer_const_udelay; arm_delay_ops.udelay = __timer_udelay;