Message ID | D24E577EA1E37B4788A3BFBC418F14A70760C04A@wetsrvex01.loepfe.com (mailing list archive) |
---|---|
State | New, archived |
>From: Vellemans, Noel [mailto:Noel.Vellemans@visionBMS.com]
>Sent: Thursday, 5 October 2017 16:19
>Hello,
>
Hi,

not sure if I can help on this, but as I did some testing myself I thought I should throw in my results as well.

>DryIce , SRTC not working on imx53. ( kernel 4.x) ( same hardware running
>older kernel versions.. means , rtc is working)
>
Is it only the kernel you are changing? I am asking because I had the impression that hwclock behaves differently on Debian stretch (util-linux 2.29.2) and jessie (util-linux 2.25.2). I say impression because on jessie it seemed I would always get a response from hwclock, but on stretch never. When I did more systematic testing, it looked like hwclock -r always fails right after boot, but if I wait some minutes, all calls succeed.

>...
>QUICK analyses ( could be wrong) ?
>It seems that hwclock is reading the current-timestamp 3 times and if not
>changed in those 3 read cycles… it sets up an read-interrupt-abort able time
>reader that should return as soon as the irq fires… but this seems to be
>missing !
>
I am seeing a lot of interrupts with kernel 4.14-rc4 (jessie and stretch), but they seem to be unhandled:

root@CX9020:~# uname -a
Linux CX9020 4.14.0-rc4+ #151 PREEMPT Wed Oct 11 10:40:34 CEST 2017 armv7l GNU/Linux
root@CX9020:~# hwclock -D -r
hwclock from util-linux 2.29.2
Using the /dev interface to the clock.
Last drift adjustment done at 1490885082 seconds after 1969
Last calibration done at 1490885082 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
[   26.795437] irq 40: nobody cared (try booting with the "irqpoll" option)
[   26.802696] handlers:
[   26.805031] [<c06029e8>] dryice_irq
[   26.808584] Disabling IRQ #40
select() to /dev/rtc to wait for clock tick timed out...synchronization failed
root@CX9020:~# cat /proc/interrupts
           CPU0
 17:       4276      tzic   1 Edge      mmc0
 18:         52      tzic   2 Edge      mmc1
 22:          0      tzic   6 Edge      sdma
 30:        176      tzic  14 Edge      53f80200.usb
 40:     100000      tzic  24 Edge      53fa4000.srtc
 48:        268      tzic  32 Edge      53fc0000.serial
 55:       2288      tzic  39 Edge      i.MX Timer Tick
 74:          0      tzic  58 Edge      53f98000.wdog
 79:        278      tzic  63 Edge      63fc4000.i2c
 93:          0      tzic  77 Edge      arm-pmu
103:        662      tzic  87 Edge      63fec000.ethernet
145:          0  gpio-mxc   1 Edge      50004000.esdhc cd
148:          0  gpio-mxc   4 Edge      50008000.esdhc cd
368:        674       IPU  23 Edge      imx_drm
369:          0       IPU  28 Edge      imx_drm
Err:          0

I added some tracing to dryice_irq() and saw that most of the time (if not all the time) dsr == DSR_MCO /* monotonic clock overflow */, with dier varying between 0x110, 0x10 and even 0x0.
I don't know what the right thing to do is to recover from DSR_MCO. "return IRQ_HANDLED" will stop the "nobody cared" message, but hwclock still times out.

And for completeness:

root@CX9020:~# dmesg | grep srtc
[    0.299043] imxdi_rtc 53fa4000.srtc: Unlocked unit detected
[    0.299539] imxdi_rtc 53fa4000.srtc: security violation interrupt not available.
[    0.299757] rtc rtc0: 53fa4000.srtc: dev (253:0)
[    0.299778] imxdi_rtc 53fa4000.srtc: rtc core: registered 53fa4000.srtc as rtc0
[    0.436785] imxdi_rtc 53fa4000.srtc: setting system clock to 2017-10-12 06:38:08 UTC (1507790288)
[  445.486624] imxdi_rtc 53fa4000.srtc: Write-wait timeout val = 0x59df0f8e reg = 0x00000008
[ 3025.076612] imxdi_rtc 53fa4000.srtc: Write-wait timeout val = 0x59df19a2 reg = 0x00000008
[ 3082.316612] imxdi_rtc 53fa4000.srtc: Write-wait timeout val = 0x59df19db reg = 0x00000008

Novice question: Is hwclock still required these days? To me it looks like the kernel is synchronizing with the RTC on its own. Maybe some kernel config is incompatible with hwclock?

Regards,
Patrick

Beckhoff Automation GmbH & Co. KG | Managing Director: Dipl. Phys. Hans Beckhoff
Registered office: Verl, Germany | Register court: Guetersloh HRA 7075
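
The /proc/interrupts dump in this message can be checked mechanically for a runaway interrupt line. The following is only a minimal sketch, assuming the single-CPU column layout shown in the message; the function name parse_interrupts and the shortened SAMPLE text are illustrative and not part of any existing tool.

```python
def parse_interrupts(text):
    """Return {irq_number: (count, description)} for a single-CPU dump."""
    counts = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the "CPU0" header (no colon) and the "Err:" line (non-numeric).
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue
        counts[int(head)] = (int(fields[0]), " ".join(fields[1:]))
    return counts


# A few lines copied from the dump in the message (assumed layout).
SAMPLE = """\
           CPU0
 17:       4276      tzic   1 Edge      mmc0
 40:     100000      tzic  24 Edge      53fa4000.srtc
 55:       2288      tzic  39 Edge      i.MX Timer Tick
Err:          0
"""

irqs = parse_interrupts(SAMPLE)
busiest = max(irqs, key=lambda n: irqs[n][0])
print(busiest, irqs[busiest])
```

On the sample, the busiest line is IRQ 40 (53fa4000.srtc). Its count of exactly 100000 would be consistent with the kernel giving up on the line after the "nobody cared" report, although that reading is an interpretation and not something the log states.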
On Thu, Oct 12, 2017 at 07:50:41AM +0000, Patrick Brünn wrote:
> Novice question: Is hwclock still required these days? For me it looks
> like the kernel is synchronizing with rtc on its own. Maybe some kernel
> config is incompatible with hwclock?

It depends on your application. If you can live with the kernel's idea of time being wrong by up to a second or two, then you can rely on the kernel's time setting.

Please realise that the kernel has always set the time from the RTC, even on x86 where hwclock has been used.

hwclock, however, has some advanced features and advantages that are beneficial if you're after accuracy.

1) hwclock will try to read the RTC as close to a second-change as possible, so that the time read from the RTC is as close to the second boundary as possible.

2) hwclock can measure and correct the RTC time for its own drift if hwclock has been allowed to capture and process the offset.

What this means is that hwclock has the capability to precisely set the kernel's time at boot, way more accurately than the kernel does. The kernel's time setting is focused on speed, not accuracy.

So, if your userspace application is to monitor something using a precise timestamp, and you are NTP synchronising (or other method) the time on the system, then you need the kernel's idea of time to be set much more precisely to avoid NTP making big corrections over the following three to six hours. This happens because NTP will slew the clock for a few seconds of difference, which makes storing and reloading the PPM value useless, and can also mean that in such a monitoring application, the results are unreliable until NTP has re-stabilised.

Here's an example of an application where this may matter: an average speed camera system. You have two cameras over a section of road, each with their own processing, which are NTP synchronised. Each reads the numberplate of passing vehicles using ANPR technology, and timestamps the passing of the vehicle using its local clock. The distance is known, so it's easy to calculate the vehicle speed. If the vehicle speed is over the limit, the driver is fined.

Consider what the implications are if one of the systems rebooted and then had incorrect time (up to two seconds wrong) for up to six hours after - a two second error is about a 3% error in recorded speed. Would you want to be sent a speeding fine from such a system?

(We have the first non-motorway road in Surrey, UK to have average speed cameras installed down its entire length because of "piston heads" who think the speed limit is 60mph rather than the sign-posted 40mph.)

Another, probably more relevant application is a stratum-1 NTP server synchronised via PPS to a GPS. I wonder how many people are aware that if you reboot such a setup relying on the kernel's time setting only, the time sent to clients will be wrong while NTP slews the local clock. I've seen these effects locally, where rebooting exactly such a system results in slaved systems which have no other source of time also making big corrections.
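
The "about 3%" figure for the speed camera example can be checked with a little arithmetic. This is a minimal sketch under stated assumptions: the 0.75 mile section length is hypothetical (the message gives no distance), the vehicle is assumed to drive at the 40 mph limit, and the function name is made up for illustration.

```python
def measured_speed_mph(distance_miles, true_speed_mph, clock_error_s):
    """Speed computed by the cameras when the two clocks disagree.

    clock_error_s is added to the true transit time; a negative value
    means the second camera's clock is behind the first one.
    """
    true_time_s = distance_miles / true_speed_mph * 3600.0
    return distance_miles / (true_time_s + clock_error_s) * 3600.0


# Hypothetical section: 0.75 miles at 40 mph takes 67.5 s to drive.
DISTANCE_MILES = 0.75
TRUE_SPEED_MPH = 40.0

speed = measured_speed_mph(DISTANCE_MILES, TRUE_SPEED_MPH, -2.0)
relative_error = (speed - TRUE_SPEED_MPH) / TRUE_SPEED_MPH
print(f"measured {speed:.2f} mph, relative error {relative_error:.1%}")
```

With these assumed numbers a two second clock error shifts the computed speed by roughly 3%. Shorter sections make the same clock error proportionally worse, longer ones make it smaller.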
On 12/10/2017 at 07:50:41 +0000, Patrick Brünn wrote:
> I am seeing a lot of interrupts with kernel 4.14-rc4 (jessie and stretch), but they seem to be unhandled:
> root@CX9020:~# uname -a
> Linux CX9020 4.14.0-rc4+ #151 PREEMPT Wed Oct 11 10:40:34 CEST 2017 armv7l GNU/Linux
> root@CX9020:~# hwclock -D -r
> hwclock from util-linux 2.29.2
> Using the /dev interface to the clock.
> Last drift adjustment done at 1490885082 seconds after 1969
> Last calibration done at 1490885082 seconds after 1969
> Hardware clock is on UTC time
> Assuming hardware clock is kept in UTC time.
> Waiting for clock tick...
> [   26.795437] irq 40: nobody cared (try booting with the "irqpoll" option)
> [   26.802696] handlers:
> [   26.805031] [<c06029e8>] dryice_irq
> [   26.808584] Disabling IRQ #40
> select() to /dev/rtc to wait for clock tick timed out...synchronization failed
> root@CX9020:~# cat /proc/interrupts
>            CPU0
>  17:       4276      tzic   1 Edge      mmc0
>  18:         52      tzic   2 Edge      mmc1
>  22:          0      tzic   6 Edge      sdma
>  30:        176      tzic  14 Edge      53f80200.usb
>  40:     100000      tzic  24 Edge      53fa4000.srtc
>  48:        268      tzic  32 Edge      53fc0000.serial
>  55:       2288      tzic  39 Edge      i.MX Timer Tick
>  74:          0      tzic  58 Edge      53f98000.wdog
>  79:        278      tzic  63 Edge      63fc4000.i2c
>  93:          0      tzic  77 Edge      arm-pmu
> 103:        662      tzic  87 Edge      63fec000.ethernet
> 145:          0  gpio-mxc   1 Edge      50004000.esdhc cd
> 148:          0  gpio-mxc   4 Edge      50008000.esdhc cd
> 368:        674       IPU  23 Edge      imx_drm
> 369:          0       IPU  28 Edge      imx_drm
> Err:          0
>
> I added some tracing to dryice_irq() and saw that most of the time (if not all the time) dsr == DSR_MCO /* monotonic clock overflow */ with dier vary between 0x110, 0x10 and even 0x0.
> I don't know what's the right thing to do, to recover from DSR_MCO. " return IRQ_HANDLED" will stop the nobody cared message but hwclock still times out.
>

hwclock times out because the alarm is not working properly. Can you check whether rtctest is working?
>From: Alexandre Belloni [mailto:alexandre.belloni@free-electrons.com]
>Sent: Thursday, 9 November 2017 04:03
>
>hwclock times out because the alarm is not working properly. Can you
>check whether rtctest is working?
>
You mean "tools/testing/selftests/timers/rtctest", right?

I just ran it and it's stuck at "Counting 5 update (1/sec) interrupts from reading /dev/rtc0:"

The kernel log shows a new message each time I restart rtctest:
[ 1032.349562] imxdi_rtc 53fa4000.srtc: Write-wait timeout val = 0x5a042418 reg = 0x00000008

Regards,
Patrick
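
The step where rtctest hangs (and where hwclock reports "select() to /dev/rtc to wait for clock tick timed out") boils down to enabling update interrupts and waiting for the device to become readable. The following is a rough sketch of that wait, not the code of either tool. It assumes Linux, read access to /dev/rtc0, and the RTC_UIE_ON / RTC_UIE_OFF request numbers as they are encoded on ARM and x86 (other architectures may encode them differently); the helper names are made up.

```python
import fcntl
import os
import select

# Request numbers from <linux/rtc.h>, as encoded on ARM and x86 (assumption).
RTC_UIE_ON = 0x7003   # _IO('p', 0x03): enable update interrupts
RTC_UIE_OFF = 0x7004  # _IO('p', 0x04): disable update interrupts


def wait_readable(fd, timeout_s):
    """Return True if fd becomes readable within timeout_s seconds."""
    ready, _, _ = select.select([fd], [], [], timeout_s)
    return bool(ready)


def wait_for_tick(path="/dev/rtc0", timeout_s=5.0):
    """Enable update interrupts and wait for one tick.

    Returns True if a tick arrived, False if the wait timed out.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, RTC_UIE_ON)
        try:
            if not wait_readable(fd, timeout_s):
                return False
            os.read(fd, 8)  # consume the interrupt record
            return True
        finally:
            fcntl.ioctl(fd, RTC_UIE_OFF)
    finally:
        os.close(fd)
```

Calling wait_for_tick() on a board with a working RTC should return True within about a second. On the system described in this thread one would expect it to return False after the timeout, matching the hwclock and rtctest behaviour reported above.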
+Cc Fabio

On 09/11/2017 at 09:59:02 +0000, Patrick Brünn wrote:
> >From: Alexandre Belloni [mailto:alexandre.belloni@free-electrons.com]
> >Sent: Thursday, 9 November 2017 04:03
> >
> >hwclock times out because the alarm is not working properly. Can you
> >check whether rtctest is working?
> >
> You mean "tools/testing/selftests/timers/rtctest", right?
>
> I just run it and it's stuck at "Counting 5 update (1/sec) interrupts from reading /dev/rtc0:"
>

So alarms or interrupts definitely don't work.

> Kernel log shows a new message for each try I restart rtctest.
> [ 1032.349562] imxdi_rtc 53fa4000.srtc: Write-wait timeout val = 0x5a042418 reg = 0x00000008"
>

That would explain it, because failing to write DCAMR means that the alarm is not set to the proper value. But that doesn't explain all the spurious interrupts you are seeing.

Fabio, is there any recent change in the platform code that would make reading/writing the RTC registers fail?
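
The val field in the "Write-wait timeout" lines can be compared against the boot-time clock to see what was being written. This is a small sketch, assuming val is a count of seconds since the Unix epoch; that interpretation is inferred from the dmesg line "setting system clock to 2017-10-12 06:38:08 UTC (1507790288)" earlier in the thread and is not confirmed by the driver source here. The helper name is made up.

```python
from datetime import datetime, timezone

# From "setting system clock to 2017-10-12 06:38:08 UTC (1507790288)",
# which was logged at an uptime of about 0.44 s.
BOOT_EPOCH = 1507790288

# (uptime in seconds, val) pairs copied from the earlier dmesg output.
TIMEOUTS = [
    (445.486624, "0x59df0f8e"),
    (3025.076612, "0x59df19a2"),
    (3082.316612, "0x59df19db"),
]


def decode_val(hex_val):
    """Interpret val as seconds since the epoch (an assumption)."""
    return datetime.fromtimestamp(int(hex_val, 16), tz=timezone.utc)


for uptime, val in TIMEOUTS:
    since_boot = int(val, 16) - BOOT_EPOCH
    print(f"uptime {uptime:9.2f} s  val {val} = {decode_val(val)}  "
          f"({since_boot} s after the boot-time clock value)")
```

Under that assumption each value lies within about a second of the uptime at which the message was logged, which would fit an alarm being programmed for the next second, as the DCAMR explanation above suggests.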
diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi
index 2e516f4..8bf0d89 100644
--- a/arch/arm/boot/dts/imx53.dtsi
+++ b/arch/arm/boot/dts/imx53.dtsi
@@ -433,6 +433,15 @@
 				clock-names = "ipg", "per";
 			};
 
+			srtc: srtc@53fa4000 {
+				compatible = "fsl,imx53-rtc", "fsl,imx25-rtc";
+				reg = <0x53fa4000 0x4000>;
+				interrupts = <24>;
+				interrupt-parent = <&tzic>;
+				clocks = <&clks IMX5_CLK_SRTC_GATE>;
+				clock-names = "ipg";
+			};
+

Best Regards
Noel