Message ID | CACVXFVPjo2dQpYPpqZS8GC8UPNzB5HLURpYm6-g1FOBbSUjDyA@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, May 17, 2013 at 10:16 AM, Ming Lei <ming.lei@canonical.com> wrote: > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index f219703..89bc3a4 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -225,7 +225,7 @@ void __show_regs(struct pt_regs *regs) > unsigned long flags; > char buf[64]; > > - show_regs_print_info(KERN_DEFAULT); > + //show_regs_print_info(KERN_DEFAULT); Forget to mention, __show_regs() isn't run under both cases. Thanks, -- Ming Lei
On Fri, May 17, 2013 at 03:16:40AM +0100, Ming Lei wrote: > Hi, Hello, > The commit a43cb95d5(dump_stack: unify debug information printed by show_regs()) > caused ARM perf regression, then 'perf top' outputs mistakenly, see > [1]. The correct > output should be [2], which can be got after reverting the commit or > doing it partly > by the one line change below: > > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index f219703..89bc3a4 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -225,7 +225,7 @@ void __show_regs(struct pt_regs *regs) > unsigned long flags; > char buf[64]; > > - show_regs_print_info(KERN_DEFAULT); > + //show_regs_print_info(KERN_DEFAULT); > > > Looks a bit weird, anyone can give a hint? So it's still the morning and I haven't had my coffee yet, but I'm really struggling to see what you're getting at. Why does this have anything to do with perf? > [1], 'perf top' mistaken output > Samples: 17K of event 'cpu-clock', Event count (approx.): 3516532661 > 97.51% [smsc95xx] [k] 0x013645b8 > 0.21% libc-2.15.so [.] strstr > 0.14% libc-2.15.so [.] strchr > 0.12% libc-2.15.so [.] strcmp [...] > [2], 'perf top' correct output > Samples: 46K of event 'cpu-clock', Event count (approx.): 937128704 > 96.44% [kernel] [k] cpuidle_enter_state > 0.19% libc-2.15.so [.] strstr > 0.16% [kernel] [k] kallsyms_expand_symbol.clone.0 > 0.13% [kernel] [k] _raw_spin_unlock_irq [...] Are you saying that the profile you're seeing is radically different, rather than there being some formatting error that I can't spot? If so, that sounds really strange and I can't see how the patch you mention is to blame... If we want to persue this, I guess other obvious questions are: which kernel are you running? Does this affect multiple architectures (your diff only changes ARM)? What's the workload which you are profiling? Will
On Fri, May 17, 2013 at 4:55 PM, Will Deacon <will.deacon@arm.com> wrote: > > So it's still the morning and I haven't had my coffee yet, but I'm really > struggling to see what you're getting at. Why does this have anything to do > with perf? I don't know, and I just report it out, :-) I found the problem days ago, and until yesterday I had one more hours to git-bisect it, but the result is very frustrated. > >> [1], 'perf top' mistaken output >> Samples: 17K of event 'cpu-clock', Event count (approx.): 3516532661 >> 97.51% [smsc95xx] [k] 0x013645b8 >> 0.21% libc-2.15.so [.] strstr >> 0.14% libc-2.15.so [.] strchr >> 0.12% libc-2.15.so [.] strcmp > > [...] > >> [2], 'perf top' correct output >> Samples: 46K of event 'cpu-clock', Event count (approx.): 937128704 >> 96.44% [kernel] [k] cpuidle_enter_state >> 0.19% libc-2.15.so [.] strstr >> 0.16% [kernel] [k] kallsyms_expand_symbol.clone.0 >> 0.13% [kernel] [k] _raw_spin_unlock_irq > > [...] > > Are you saying that the profile you're seeing is radically different, > rather than there being some formatting error that I can't spot? If so, that Yes. > sounds really strange and I can't see how the patch you mention is to > blame... > > If we want to persue this, I guess other obvious questions are: which kernel > are you running? Does this affect multiple architectures (your diff only Either 3.10-rc1 or today's -next tree(3.10.0-rc1-next-20130516). > changes ARM)? What's the workload which you are profiling? I only tested it on Pandaboard, and no real workload, so you can see cpuidle_enter_state is the top frequent symbol. Or could anyone else try to verify the problem on their own environment? Thanks, -- Ming Lei
On Fri, May 17, 2013 at 10:27:05AM +0100, Ming Lei wrote: > On Fri, May 17, 2013 at 4:55 PM, Will Deacon <will.deacon@arm.com> wrote: > > > > So it's still the morning and I haven't had my coffee yet, but I'm really > > struggling to see what you're getting at. Why does this have anything to do > > with perf? > > I don't know, and I just report it out, :-) > > I found the problem days ago, and until yesterday I had one more hours to > git-bisect it, but the result is very frustrated. > > > > >> [1], 'perf top' mistaken output > >> Samples: 17K of event 'cpu-clock', Event count (approx.): 3516532661 > >> 97.51% [smsc95xx] [k] 0x013645b8 > >> 0.21% libc-2.15.so [.] strstr > >> 0.14% libc-2.15.so [.] strchr > >> 0.12% libc-2.15.so [.] strcmp > > > > [...] > > > >> [2], 'perf top' correct output > >> Samples: 46K of event 'cpu-clock', Event count (approx.): 937128704 > >> 96.44% [kernel] [k] cpuidle_enter_state > >> 0.19% libc-2.15.so [.] strstr > >> 0.16% [kernel] [k] kallsyms_expand_symbol.clone.0 > >> 0.13% [kernel] [k] _raw_spin_unlock_irq > > > > [...] > > > > Are you saying that the profile you're seeing is radically different, > > rather than there being some formatting error that I can't spot? If so, that > > Yes. > > > sounds really strange and I can't see how the patch you mention is to > > blame... > > > > If we want to persue this, I guess other obvious questions are: which kernel > > are you running? Does this affect multiple architectures (your diff only > > Either 3.10-rc1 or today's -next tree(3.10.0-rc1-next-20130516). > > > changes ARM)? What's the workload which you are profiling? > > I only tested it on Pandaboard, and no real workload, so you can see > cpuidle_enter_state is the top frequent symbol. It's probably easier if you choose a workload, otherwise it's difficult to see what is `correct' and what is broken. For example, your broken output seems to be in the smsc95xx driver, so assumedly there's a bunch of networking going on whereas your other output is in cpuidle_enter_state. We need an apples-for-apples comparison if you think there's a bug in the kernel. Can you try profiling hackbench or something? > Or could anyone else try to verify the problem on their own environment? How are you running perf top? Will
On Fri, May 17, 2013 at 5:36 PM, Will Deacon <will.deacon@arm.com> wrote: > > It's probably easier if you choose a workload, otherwise it's difficult to > see what is `correct' and what is broken. For example, your broken output > seems to be in the smsc95xx driver, so assumedly there's a bunch of > networking going on whereas your other output is in cpuidle_enter_state. If all modules are removed, the result will become OK. And if I only insert one module, the top sample will fall inside the only module, but I am sure nothing is working on the symbols of the module. > > We need an apples-for-apples comparison if you think there's a bug in the > kernel. Can you try profiling hackbench or something? > >> Or could anyone else try to verify the problem on their own environment? > > How are you running perf top? sudo perf top Thanks -- Ming Lei
On Fri, May 17, 2013 at 10:48:23AM +0100, Ming Lei wrote: > On Fri, May 17, 2013 at 5:36 PM, Will Deacon <will.deacon@arm.com> wrote: > > > > It's probably easier if you choose a workload, otherwise it's difficult to > > see what is `correct' and what is broken. For example, your broken output > > seems to be in the smsc95xx driver, so assumedly there's a bunch of > > networking going on whereas your other output is in cpuidle_enter_state. > > If all modules are removed, the result will become OK. And if I only insert one > module, the top sample will fall inside the only module, but I am sure > nothing is working on the symbols of the module. Can you check your dmesg please? I suppose the smsc95xx driver might be screaming about something and somehow the changes to the reg printing have made it get stuck. > > > > We need an apples-for-apples comparison if you think there's a bug in the > > kernel. Can you try profiling hackbench or something? > > > >> Or could anyone else try to verify the problem on their own environment? > > > > How are you running perf top? > > sudo perf top ... and did hackbench make any difference (i.e. is the profile more stable)? Will
On Fri, May 17, 2013 at 5:54 PM, Will Deacon <will.deacon@arm.com> wrote: > On Fri, May 17, 2013 at 10:48:23AM +0100, Ming Lei wrote: >> On Fri, May 17, 2013 at 5:36 PM, Will Deacon <will.deacon@arm.com> wrote: >> > >> > It's probably easier if you choose a workload, otherwise it's difficult to >> > see what is `correct' and what is broken. For example, your broken output >> > seems to be in the smsc95xx driver, so assumedly there's a bunch of >> > networking going on whereas your other output is in cpuidle_enter_state. >> >> If all modules are removed, the result will become OK. And if I only insert one >> module, the top sample will fall inside the only module, but I am sure >> nothing is working on the symbols of the module. > > Can you check your dmesg please? I suppose the smsc95xx driver might be > screaming about something and somehow the changes to the reg printing have > made it get stuck. If I insert only one useless module(such as, spi-altera.ko) on Pandaboard, the top sample will fall inside the module of spi-altera, so it is nothing to do with smsc95xx, also you can see the sampled top IP is an invalid address. > >> > >> > We need an apples-for-apples comparison if you think there's a bug in the >> > kernel. Can you try profiling hackbench or something? >> > >> >> Or could anyone else try to verify the problem on their own environment? >> > >> > How are you running perf top? >> >> sudo perf top > > ... and did hackbench make any difference (i.e. is the profile more stable)? When hackbench is started, there is no much change compared with idle, only another sample( 2.49% [unknown] [.] 0xb6f618a0) comes and becomes top 2. Thanks, -- Ming Lei
On Fri, May 17, 2013 at 11:07:37AM +0100, Ming Lei wrote: > On Fri, May 17, 2013 at 5:54 PM, Will Deacon <will.deacon@arm.com> wrote: > > On Fri, May 17, 2013 at 10:48:23AM +0100, Ming Lei wrote: > >> On Fri, May 17, 2013 at 5:36 PM, Will Deacon <will.deacon@arm.com> wrote: > >> > > >> > It's probably easier if you choose a workload, otherwise it's difficult to > >> > see what is `correct' and what is broken. For example, your broken output > >> > seems to be in the smsc95xx driver, so assumedly there's a bunch of > >> > networking going on whereas your other output is in cpuidle_enter_state. > >> > >> If all modules are removed, the result will become OK. And if I only insert one > >> module, the top sample will fall inside the only module, but I am sure > >> nothing is working on the symbols of the module. Ok, I tried to reproduce this on my TC2 with 3.10-rc1 and perf top looks fine. I also tried loading your spi-altera module and it made no difference. I'm using a fairly old perf tool (reports its version as 3.5.5). Will
On Fri, May 17, 2013 at 11:59 PM, Will Deacon <will.deacon@arm.com> wrote: > > Ok, I tried to reproduce this on my TC2 with 3.10-rc1 and perf top looks > fine. I also tried loading your spi-altera module and it made no difference. > > I'm using a fairly old perf tool (reports its version as 3.5.5). It isn't related with perf tool. Finally, looks the problem disappeared after upgrading gcc to "gcc version 4.7.3 (Sourcery CodeBench Lite 2013.05-24)" from "gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50)", so it should be caused by compiler. Thanks, -- Ming Lei
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index f219703..89bc3a4 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -225,7 +225,7 @@ void __show_regs(struct pt_regs *regs) unsigned long flags; char buf[64]; - show_regs_print_info(KERN_DEFAULT); + //show_regs_print_info(KERN_DEFAULT); Looks a bit weird, anyone can give a hint?