Message ID | 20160916054917.16930-1-briannorris@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Hi Brian,

On 16/09/16 06:49, Brian Norris wrote:
> Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
> for suspend timekeeping"), this driver assumes that the ARM architected
> timer keeps running in suspend. This is not the case for some ARM SoCs,
> depending on the HW state used for system suspend. Let's not assume that
> all SoCs support this, and instead only support this if the device tree
> explicitly tells us it's "always on". In all other cases, just fall back
> to the RTC. This should be relatively harmless.

I'm afraid you're confusing two things:
- the counter, which *must* carry on counting no matter what, as
  (quoting the ARM ARM) "The system counter must be implemented in an
  always-on power domain"
- the timer, which is allowed to be powered off, and can be tagged with
  the "always-on" property to indicate that it is guaranteed to stay up
  (which in practice only exists in virtual machines and never on real HW).

If your counter does stop counting when suspended, then this is starting
to either feel like a HW bug, or someone is killing the clock that feeds
this counter when entering suspend.

If this is the former, then we need a separate quirk to indicate the
non-standard behaviour. If it is the latter, don't do it! ;-)

>
> It seems fair to key the system-suspend behavior off the same property
> used for C3STOP, since if the timer doesn't keep context for CPU sleep,
> it likely doesn't for system sleep either.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
>  drivers/clocksource/arm_arch_timer.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 57700541f951..e28677a34f02 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -490,7 +490,7 @@ static struct clocksource clocksource_counter = {
>  	.rating = 400,
>  	.read = arch_counter_read,
>  	.mask = CLOCKSOURCE_MASK(56),
> -	.flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
> +	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
>  };
>
>  static struct cyclecounter cyclecounter = {
> @@ -526,6 +526,8 @@ static void __init arch_counter_register(unsigned type)
>  		clocksource_counter.name = "arch_mem_counter";
>  	}
>
> +	if (!arch_timer_c3stop)
> +		clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
>  	start_count = arch_timer_read_counter();
>  	clocksource_register_hz(&clocksource_counter, arch_timer_rate);
>  	cyclecounter.mult = clocksource_counter.mult;
>

Given the above, I don't think this patch is acceptable.

Thanks,

	M.
On 16/09/2016 10:06, Marc Zyngier wrote:
> Hi Brian,
>
> On 16/09/16 06:49, Brian Norris wrote:
>> Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
>> for suspend timekeeping"), this driver assumes that the ARM architected
>> timer keeps running in suspend. This is not the case for some ARM SoCs,
>> depending on the HW state used for system suspend. Let's not assume that
>> all SoCs support this, and instead only support this if the device tree
>> explicitly tells us it's "always on". In all other cases, just fall back
>> to the RTC. This should be relatively harmless.
>
> I'm afraid you're confusing two things:
> - the counter, which *must* carry on counting no matter what, as
>   (quoting the ARM ARM) "The system counter must be implemented in an
>   always-on power domain"
> - the timer, which is allowed to be powered off, and can be tagged with
>   the "always-on" property to indicate that it is guaranteed to stay up
>   (which in practice only exists in virtual machines and never on real HW).
>
> If your counter does stop counting when suspended, then this is starting
> to either feel like a HW bug, or someone is killing the clock that feeds
> this counter when entering suspend.
>
> If this is the former, then we need a separate quirk to indicate the
> non-standard behaviour. If it is the latter, don't do it! ;-)

+1

-- Daniel

<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
On Fri, Sep 16, 2016 at 09:06:55AM +0100, Marc Zyngier wrote:
> Hi Brian,

Hi Marc,

Thanks for the quick response.

> On 16/09/16 06:49, Brian Norris wrote:
> > Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
> > for suspend timekeeping"), this driver assumes that the ARM architected
> > timer keeps running in suspend. This is not the case for some ARM SoCs,
> > depending on the HW state used for system suspend. Let's not assume that
> > all SoCs support this, and instead only support this if the device tree
> > explicitly tells us it's "always on". In all other cases, just fall back
> > to the RTC. This should be relatively harmless.
>
> I'm afraid you're confusing two things:
> - the counter, which *must* carry on counting no matter what, as
>   (quoting the ARM ARM) "The system counter must be implemented in an
>   always-on power domain"
> - the timer, which is allowed to be powered off, and can be tagged with
>   the "always-on" property to indicate that it is guaranteed to stay up
>   (which in practice only exists in virtual machines and never on real HW).

Indeed, sorry for that confusion, and thanks for the explanations.

> If your counter does stop counting when suspended, then this is starting
> to either feel like a HW bug, or someone is killing the clock that feeds
> this counter when entering suspend.
>
> If this is the former, then we need a separate quirk to indicate the
> non-standard behaviour. If it is the latter, don't do it! ;-)

It's beginning to seem more like a HW quirk which yields nonstandard
behavior. AIUI, this SoC normally runs the counter off its 24 MHz clock,
but for low power modes, this "always-on" domain switches over to a 32
KHz alternative clock. Unfortunately, the counter doesn't actually tick
when run this way. I'm trying to confirm with the chip designers
(Rockchip, RK3399) about the nature of the quirk, but I think we'll need
a separate DT flag for this behavior.

Brian
On 20/09/16 00:14, Brian Norris wrote:
> On Fri, Sep 16, 2016 at 09:06:55AM +0100, Marc Zyngier wrote:
>> Hi Brian,
>
> Hi Marc,
>
> Thanks for the quick response.
>
>> On 16/09/16 06:49, Brian Norris wrote:
>>> Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
>>> for suspend timekeeping"), this driver assumes that the ARM architected
>>> timer keeps running in suspend. This is not the case for some ARM SoCs,
>>> depending on the HW state used for system suspend. Let's not assume that
>>> all SoCs support this, and instead only support this if the device tree
>>> explicitly tells us it's "always on". In all other cases, just fall back
>>> to the RTC. This should be relatively harmless.
>>
>> I'm afraid you're confusing two things:
>> - the counter, which *must* carry on counting no matter what, as
>>   (quoting the ARM ARM) "The system counter must be implemented in an
>>   always-on power domain"
>> - the timer, which is allowed to be powered off, and can be tagged with
>>   the "always-on" property to indicate that it is guaranteed to stay up
>>   (which in practice only exists in virtual machines and never on real HW).
>
> Indeed, sorry for that confusion, and thanks for the explanations.
>
>> If your counter does stop counting when suspended, then this is starting
>> to either feel like a HW bug, or someone is killing the clock that feeds
>> this counter when entering suspend.
>>
>> If this is the former, then we need a separate quirk to indicate the
>> non-standard behaviour. If it is the latter, don't do it! ;-)
>
> It's beginning to seem more like a HW quirk which yields nonstandard
> behavior. AIUI, this SoC normally runs the counter off its 24 MHz clock,
> but for low power modes, this "always-on" domain switches over to a 32
> KHz alternative clock. Unfortunately, the counter doesn't actually tick
> when run this way. I'm trying to confirm with the chip designers
> (Rockchip, RK3399) about the nature of the quirk, but I think we'll need
> a separate DT flag for this behavior.

The counter is allowed to be clocked at a different rate, as long as it
is incremented by the frequency ratio on each tick of the new frequency.
In your case, the counter should increment by 750 on each tick of the
32kHz clock. If the rk3399 implementation doesn't do this, then this is
a bug, and we need a quirk to work around it.

Note that such a quirk will have some other impacts, such as the
gettimeofday implementation in the VDSO (which relies on the counter
making forward progress). There could be other issues in the timer
subsystem as well... This doesn't look like a pleasant thing to fix.

Thanks,

	M.
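
[Editorial illustration, not part of the original thread.] The figure of 750 in the reply above is just the ratio of the two nominal rates mentioned in the thread: 24,000,000 / 32,000 = 750. The C sketch below spells that arithmetic out. The function names are hypothetical, and it assumes an exact 32,000 Hz slow clock; a 32.768 kHz crystal would not give an integer ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal rates taken from the thread; treated here as exact (an assumption). */
#define FAST_HZ 24000000ULL	/* normal counter clock */
#define SLOW_HZ 32000ULL	/* low-power clock */

/*
 * Hypothetical helper: how much the counter has to advance per slow-clock
 * tick so that it still appears to count at fast_hz.
 */
static uint64_t increment_per_slow_tick(uint64_t fast_hz, uint64_t slow_hz)
{
	assert(slow_hz != 0);
	return fast_hz / slow_hz;
}

/* Counter value after `seconds` spent running from the slow clock. */
static uint64_t counter_after_slow_period(uint64_t start, uint64_t seconds)
{
	uint64_t slow_ticks = seconds * SLOW_HZ;

	return start + slow_ticks * increment_per_slow_tick(FAST_HZ, SLOW_HZ);
}
```

With this scaling, one second on the slow clock advances the counter by 32,000 * 750 = 24,000,000, the same as one second on the fast clock, which is the behaviour the reply describes as required.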
Hi Marc,

Thanks again for the help. I was checking with Rockchip on the details.

On Tue, Sep 20, 2016 at 08:47:07AM +0100, Marc Zyngier wrote:
> The counter is allowed to be clocked at a different rate, as long as it
> is incremented by the frequency ratio on each tick of the new frequency.
> In your case, the counter should increment by 750 on each tick of the
> 32kHz clock. If the rk3399 implementation doesn't do this, then this is
> a bug, and we need a quirk to work around it.

I had hope that we could find a switch that would do the above for
rk3399, since other parts of the system (e.g., the PMU itself) support
switching from the 24MHz to 32KHz clock, but Rockchip confirmed that it
is indeed a HW quirk that the arch timer's counter does not support
clocking out ticks based on the 32KHz clock. So I'm planning to send a
v2 that adds a "arm,no-tick-in-suspend" property.

<Begin side note>
rk3288 (ARMv7 system widely used for our Chromebooks) has the same
issue, except the kernel we're using for production (based on v3.14)
doesn't have the following commit, which stopped utilizing the RTC:

  commit 0fa88cb4b82b5cf7429bc1cef9db006ca035754e
  Author: Xunlei Pang <pang.xunlei@linaro.org>
  Date:   Wed Apr 1 20:34:38 2015 -0700

      time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource

And any mainline testing on rk3288 doesn't see the problem, because
mainline doesn't support its lowest-power sleep modes well enough (see
ROCKCHIP_ARM_OFF_LOGIC_DEEP in arch/arm/mach-rockchip/pm.c).
<End side note>

> Note that such a quirk will have some other impacts, such as the
> gettimeofday implementation in the VDSO (which relies on the counter
> making forward progress). There could be other issues in the timer
> subsystem as well... This doesn't look like a pleasant thing to fix.

How sure are you of these problems? I'm a bit new to the kernel
timekeeping subsystem, but doesn't this kind of code already have to
handle time adjustments like this when reprogramming the system time
(settimeofday())? And might we be covered for the suspend/resume case
when we allow the kernel to fall back to the RTC instead, which adjusts
the sleep delta with timekeeping_inject_sleeptime64()? And (weaker
evidence here) we haven't seen problems on rk3288 so far, at least
without the above referenced rtc commit 0fa88cb4b82. But admittedly
there are some differences between arch/{arm,arm64}/.

Regards,
Brian
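
[Editorial illustration, not part of the original thread.] The RTC fallback Brian describes can be pictured with a small user-space model: the counter freezes across suspend, and on resume the elapsed time is taken from two RTC readings and added to the wall clock. This is only a sketch of the idea. `struct model_timekeeper`, `model_update()` and `model_inject_sleeptime()` are hypothetical stand-ins, not the kernel's data structures or its `timekeeping_inject_sleeptime64()` implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_LL 1000000000LL

/* Hypothetical, heavily simplified timekeeper state. */
struct model_timekeeper {
	int64_t  wall_ns;	/* modelled wall-clock time in nanoseconds */
	uint64_t last_cycles;	/* counter value at the last update */
	uint32_t rate_hz;	/* counter frequency */
};

/* Advance the wall clock by however far the counter has moved. */
static void model_update(struct model_timekeeper *tk, uint64_t now_cycles)
{
	uint64_t delta = now_cycles - tk->last_cycles;
	uint64_t secs = delta / tk->rate_hz;
	uint64_t rem = delta % tk->rate_hz;

	/* Split into whole seconds and remainder to avoid 64-bit overflow. */
	tk->wall_ns += (int64_t)(secs * 1000000000ULL +
				 rem * 1000000000ULL / tk->rate_hz);
	tk->last_cycles = now_cycles;
}

/*
 * Model of the resume path when the counter did not tick in suspend:
 * the sleep duration comes from RTC readings (second resolution here).
 */
static void model_inject_sleeptime(struct model_timekeeper *tk,
				   int64_t rtc_suspend_s, int64_t rtc_resume_s)
{
	assert(rtc_resume_s >= rtc_suspend_s);
	tk->wall_ns += (rtc_resume_s - rtc_suspend_s) * NSEC_PER_SEC_LL;
}
```

In this model, a 60-second suspend during which the counter stays frozen still moves `wall_ns` forward by 60 seconds, at the cost of the RTC's coarser resolution. Whether every consumer of the counter (such as the VDSO) copes equally well is the open question raised in the thread.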
On Tue, 27 Sep 2016 18:23:11 -0700
Brian Norris <briannorris@chromium.org> wrote:

Hi Brian,

> Hi Marc,
>
> Thanks again for the help. I was checking with Rockchip on the details.
>
> On Tue, Sep 20, 2016 at 08:47:07AM +0100, Marc Zyngier wrote:
> > The counter is allowed to be clocked at a different rate, as long as it
> > is incremented by the frequency ratio on each tick of the new frequency.
> > In your case, the counter should increment by 750 on each tick of the
> > 32kHz clock. If the rk3399 implementation doesn't do this, then this is
> > a bug, and we need a quirk to work around it.
>
> I had hope that we could find a switch that would do the above for
> rk3399, since other parts of the system (e.g., the PMU itself) support
> switching from the 24MHz to 32KHz clock, but Rockchip confirmed that it
> is indeed a HW quirk that the arch timer's counter does not support
> clocking out ticks based on the 32KHz clock. So I'm planning to send a
> v2 that adds a "arm,no-tick-in-suspend" property.

Fair enough.

>
> <Begin side note>
> rk3288 (ARMv7 system widely used for our Chromebooks) has the same
> issue, except the kernel we're using for production (based on v3.14)
> doesn't have the following commit, which stopped utilizing the RTC:
>
>   commit 0fa88cb4b82b5cf7429bc1cef9db006ca035754e
>   Author: Xunlei Pang <pang.xunlei@linaro.org>
>   Date:   Wed Apr 1 20:34:38 2015 -0700
>
>       time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource
>
> And any mainline testing on rk3288 doesn't see the problem, because
> mainline doesn't support its lowest-power sleep modes well enough (see
> ROCKCHIP_ARM_OFF_LOGIC_DEEP in arch/arm/mach-rockchip/pm.c).

Arghh... So even my favourite Chromebook (from which I'm typing this
email) is affected? Not very nice...

> <End side note>
>
> > Note that such a quirk will have some other impacts, such as the
> > gettimeofday implementation in the VDSO (which relies on the counter
> > making forward progress). There could be other issues in the timer
> > subsystem as well... This doesn't look like a pleasant thing to fix.
>
> How sure are you of these problems? I'm a bit new to the kernel
> timekeeping subsystem, but doesn't this kind of code already have to
> handle time adjustments like this when reprogramming the system time
> (settimeofday())? And might we be covered for the suspend/resume case
> when we allow the kernel to fall back to the RTC instead, which adjusts
> the sleep delta with timekeeping_inject_sleeptime64()? And (weaker
> evidence here) we haven't seen problems on rk3288 so far, at least
> without the above referenced rtc commit 0fa88cb4b82. But admittedly
> there are some differences between arch/{arm,arm64}/.

The 32bit port only gained a VDSO recently (3.14 doesn't have it), and
mainline doesn't switch the counter off, as you noted above.

As for the 64bit kernel, it would be interesting to verify that on
resume, the VDSO does return the right (corrected) value, and not
something stale.

Thanks,

	M.
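
[Editorial illustration, not part of the original thread.] One rough way to run the check suggested above from user space is to read the same clock twice, once through libc's `clock_gettime()` (which normally goes through the VDSO where one exists) and once through the raw system call, then compare the two after a suspend/resume cycle. The sketch below is an assumption about how such a test could look, not something proposed in the thread. The helper names are hypothetical, and it assumes a Linux system where `SYS_clock_gettime` uses the same `struct timespec` layout as libc (true on 64-bit; not guaranteed on 32-bit with 64-bit `time_t`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: returns (raw syscall time - libc time) in ns for
 * the given clock. A large value after resume would suggest that one of
 * the two paths is returning a stale time.
 */
static int64_t libc_vs_syscall_delta_ns(clockid_t clk)
{
	struct timespec via_libc, via_syscall;
	int rc;

	rc = clock_gettime(clk, &via_libc);	/* usually served by the VDSO */
	assert(rc == 0);
	rc = (int)syscall(SYS_clock_gettime, clk, &via_syscall);	/* forced kernel entry */
	assert(rc == 0);
	(void)rc;

	return ts_to_ns(&via_syscall) - ts_to_ns(&via_libc);
}
```

Calling `libc_vs_syscall_delta_ns(CLOCK_REALTIME)` before suspend and again shortly after resume should give a small delta both times if the VDSO data page was updated correctly. Note that a libc built without VDSO support would make both reads take the syscall path, in which case the comparison proves nothing.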
Hi Marc,

On Thu, Sep 29, 2016 at 05:08:47PM +0100, Marc Zyngier wrote:
> On Tue, 27 Sep 2016 18:23:11 -0700
> Brian Norris <briannorris@chromium.org> wrote:
> > On Tue, Sep 20, 2016 at 08:47:07AM +0100, Marc Zyngier wrote:
> > <Begin side note>
> > rk3288 (ARMv7 system widely used for our Chromebooks) has the same
> > issue, except the kernel we're using for production (based on v3.14)
> > doesn't have the following commit, which stopped utilizing the RTC:
> >
> >   commit 0fa88cb4b82b5cf7429bc1cef9db006ca035754e
> >   Author: Xunlei Pang <pang.xunlei@linaro.org>
> >   Date:   Wed Apr 1 20:34:38 2015 -0700
> >
> >       time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource
> >
> > And any mainline testing on rk3288 doesn't see the problem, because
> > mainline doesn't support its lowest-power sleep modes well enough (see
> > ROCKCHIP_ARM_OFF_LOGIC_DEEP in arch/arm/mach-rockchip/pm.c).
>
> Arghh... So even my favourite Chromebook (from which I'm typing this
> email) is affected? Not very nice...

Yep. But if you're running mainline, you just get to have high S3 power
consumption instead!

> > <End side note>

> As for the 64bit kernel, it would be interesting to verify that on
> resume, the VDSO does return the right (corrected) value, and not
> something stale.

It would be interesting, except all my current user spaces are built for
32-bit, so it's not too easy for me to test. Perhaps I could pull in
this [1]. (On the bright side, this means that VDSO can't possibly be
breaking on my systems!)

Brian

[1] http://www.spinics.net/lists/arm-kernel/msg530185.html
On 10/04, Brian Norris wrote:
> Hi Marc,
>
> On Thu, Sep 29, 2016 at 05:08:47PM +0100, Marc Zyngier wrote:
> > On Tue, 27 Sep 2016 18:23:11 -0700
> > Brian Norris <briannorris@chromium.org> wrote:
> > > On Tue, Sep 20, 2016 at 08:47:07AM +0100, Marc Zyngier wrote:
> > > <Begin side note>
> > > rk3288 (ARMv7 system widely used for our Chromebooks) has the same
> > > issue, except the kernel we're using for production (based on v3.14)
> > > doesn't have the following commit, which stopped utilizing the RTC:
> > >
> > >   commit 0fa88cb4b82b5cf7429bc1cef9db006ca035754e
> > >   Author: Xunlei Pang <pang.xunlei@linaro.org>
> > >   Date:   Wed Apr 1 20:34:38 2015 -0700
> > >
> > >       time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource
> > >
> > > And any mainline testing on rk3288 doesn't see the problem, because
> > > mainline doesn't support its lowest-power sleep modes well enough (see
> > > ROCKCHIP_ARM_OFF_LOGIC_DEEP in arch/arm/mach-rockchip/pm.c).
> >
> > Arghh... So even my favourite Chromebook (from which I'm typing this
> > email) is affected? Not very nice...
>
> Yep. But if you're running mainline, you just get to have high S3 power
> consumption instead!

Just curious, do we enter this state during cpuidle as well? Or
is it only across suspend that the clock stops working?
On Tue, Oct 18, 2016 at 06:24:41PM -0700, Stephen Boyd wrote:
> Just curious, do we enter this state during cpuidle as well? Or
> is it only across suspend that the clock stops working?

I believe we do not on either rk3288 or rk3399. We'd have to be powering
off almost the entire system before we'd be able to gate the 24 MHz
oscillator, AIUI.

Brian
On 10/18/2016 06:36 PM, Brian Norris wrote:
> I believe we do not on either rk3288 or rk3399. We'd have to be powering
> off almost the entire system before we'd be able to gate the 24 MHz
> oscillator, AIUI.
>

Great! That avoids a major headache.
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 57700541f951..e28677a34f02 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -490,7 +490,7 @@ static struct clocksource clocksource_counter = {
 	.rating = 400,
 	.read = arch_counter_read,
 	.mask = CLOCKSOURCE_MASK(56),
-	.flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static struct cyclecounter cyclecounter = {
@@ -526,6 +526,8 @@ static void __init arch_counter_register(unsigned type)
 		clocksource_counter.name = "arch_mem_counter";
 	}
 
+	if (!arch_timer_c3stop)
+		clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 	start_count = arch_timer_read_counter();
 	clocksource_register_hz(&clocksource_counter, arch_timer_rate);
 	cyclecounter.mult = clocksource_counter.mult;
Since commit 4fbcdc813fb9 ("clocksource: arm_arch_timer: Use clocksource
for suspend timekeeping"), this driver assumes that the ARM architected
timer keeps running in suspend. This is not the case for some ARM SoCs,
depending on the HW state used for system suspend. Let's not assume that
all SoCs support this, and instead only support this if the device tree
explicitly tells us it's "always on". In all other cases, just fall back
to the RTC. This should be relatively harmless.

It seems fair to key the system-suspend behavior off the same property
used for C3STOP, since if the timer doesn't keep context for CPU sleep,
it likely doesn't for system sleep either.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 drivers/clocksource/arm_arch_timer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
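
[Editorial illustration, not part of the original thread.] The review rejects keying the flag off "always-on", and the follow-up plan in the thread is a dedicated "arm,no-tick-in-suspend" property. The decision that plan implies can be modelled in a few lines of plain C. This is a user-space sketch under stated assumptions, not the v2 patch. The `MODEL_*` flag values are invented stand-ins for the kernel's `CLOCK_SOURCE_*` flags, and the boolean parameter stands in for however the driver would read the property from the device tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for illustration only; the kernel defines its own. */
#define MODEL_CLOCK_SOURCE_IS_CONTINUOUS	0x01u
#define MODEL_CLOCK_SOURCE_SUSPEND_NONSTOP	0x02u

/*
 * Hypothetical helper: compute the clocksource flags for the counter.
 * The architecture requires the counter to keep counting, so
 * SUSPEND_NONSTOP is the default; only a platform that explicitly
 * declares the quirk opts out and falls back to the RTC.
 */
static unsigned int model_counter_flags(bool no_tick_in_suspend)
{
	unsigned int flags = MODEL_CLOCK_SOURCE_IS_CONTINUOUS;

	if (!no_tick_in_suspend)
		flags |= MODEL_CLOCK_SOURCE_SUSPEND_NONSTOP;

	/* The counter is continuous while running, quirk or not. */
	assert(flags & MODEL_CLOCK_SOURCE_IS_CONTINUOUS);
	return flags;
}
```

The difference from the posted diff is the default: here compliant hardware keeps the nonstop flag without needing any property, and only the quirky SoC has to say so.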