diff mbox

Weird kernel dumps seen on 437-gp on linux-next

Message ID 5706804B.8020608@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Grygorii Strashko April 7, 2016, 3:44 p.m. UTC
On 04/07/2016 06:24 PM, Nishanth Menon wrote:
> On 04/07/2016 10:17 AM, Franklin S Cooper Jr. wrote:
>> Hi All,
>>
>> While testing patches to fix NFS not working on linux-next I ran into to
>> a bunch of kernel dumps when booting my 437-gp evm via NFS.
>>
>> Bootlog: http://pastebin.ubuntu.com/15670741/
>> Another Bootlog: http://pastebin.ubuntu.com/15671302/
>>
>> I'm pretty sure it isn't related to the NFS issue. I dropped the local
>> NFS fixes and reverted the patches that originally caused the NFS
>> problem and I'm still able to see these kernel dumps. Now this doesn't
>> occur all the time. Originally I was able to see this issue 1 out of 5
>> times but recently it takes over 10 times for this problem to occur.
>>
>> Also I am unable to reproduce this issue on my AM335x GP evm. I've
>> rebooted the board 10 times and haven't seen any kernel dumps.
>>
> https://github.com/nmenon/kernel-test-logs/blob/next-20160406/multi_v7_defconfig/am43xx-gpevm.txt
> looked good to me at least.. but then it was reported as intermittent..
> 
> Could be
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/381174.html
> 
> 

The guilty patch is 55ee701 "arm: omap2: board-generic: use omap4_local_timer_init for AM437x"
which changes clocksource from gptimer to 32k_counter.

You see this "INFO: rcu_sched self-detected stall on CPU", because sometime during boot
wrong value is read from 32k_counter, which cause invalid jiffies calculation.

Personally, I think that it could be errata:
am437x: "Advisory 22 Watchdog Timers: Delay Needed to Read Some WDTimer Registers After Wakeup"
ompa4460: i103 "1.1 Delay needed to read some GP timer, WD timer and sync timer registers after wakeup"


also, you can check it with below diff:
[4294938600] ============= cycle_now19816771 delta4294959581 tkr->cycle_last19824486 jiffies4294938600 12E6143
[4294938600] ============= cycle_now19817092 delta4294959902 tkr->cycle_last19824486 jiffies4294938600 12E6284
[4294938600] ============= cycle_now19817413 delta4294960223 tkr->cycle_last19824486 jiffies4294938600 12E63C5
^^^ good jiffies
[13078482] BUG: ================================== now19817734 cycle_last4314784734 real_tk19824486
^^^ bad jiffies <-- 7-8 second of kernel boot
[13078482] INFO: rcu_preempt self-detected stall on CPU
diff mbox

Patch

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284..0858077 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1053,10 +1053,9 @@  static size_t print_time(u64 ts, char *buf)
        rem_nsec = do_div(ts, 1000000000);
 
        if (!buf)
-               return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
+               return snprintf(NULL, 0, "[%5lu] ", (unsigned long)jiffies);
 
-       return sprintf(buf, "[%5lu.%06lu] ",
-                      (unsigned long)ts, rem_nsec / 1000);
+       return sprintf(buf, "[%lu] ", (unsigned long)jiffies);
 }

with above change you could see smth. like this in log: