Message ID | 1347450610-4342-1-git-send-email-rogerq@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Sep 12, 2012 at 02:50:10PM +0300, Roger Quadros wrote: > gets rid of below messages with CONFIG_DEBUG_PREEMPT enabled > > [ 28.832916] debug_smp_processor_id: 18 callbacks suppressed > [ 28.832946] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/1763 > [ 28.841491] caller is pwrdm_set_next_pwrst+0x54/0x120 > > Tested with perf on OMAP4 Panda. NAK. Using a different function which doesn't have the warning isn't a subsitute for fixing the problem properly. What you're doing is papering over the bug, making the warning go away without properly understanding the problem. The warning is there because something is being done wrong. What that is is exactly what is being said in the warning message. You're getting a CPU number in a context where preemptions can occur - and therefore the CPU that you're running on can change. Think about this sequence: - cpu = smp_processor_id(); /* returns 0 */ - you are preempted - you resume on CPU 1 - trace_clock_disable(clk->name, 0, 0); If trace_clock_disable() uses the CPU number to access per-CPU data without locking, that's going to cause corruption. Please, if you're going to fix a warning, analyse it properly first and don't just search for a function which appears to give you the same functionality but without the warning message.
On 09/12/12 05:05, Russell King - ARM Linux wrote: > NAK. Using a different function which doesn't have the warning isn't a > subsitute for fixing the problem properly. What you're doing is papering > over the bug, making the warning go away without properly understanding > the problem. > > The warning is there because something is being done wrong. What that is > is exactly what is being said in the warning message. You're getting a > CPU number in a context where preemptions can occur - and therefore the > CPU that you're running on can change. > > Think about this sequence: > > - cpu = smp_processor_id(); /* returns 0 */ > - you are preempted > - you resume on CPU 1 > - trace_clock_disable(clk->name, 0, 0); > > If trace_clock_disable() uses the CPU number to access per-CPU data > without locking, that's going to cause corruption. > > Please, if you're going to fix a warning, analyse it properly first and > don't just search for a function which appears to give you the same > functionality but without the warning message. Is anyone actually using the CPU field in this tracepoint? I don't see any usecase for it except for the case where you have a percpu clock, but even then I imagine the name of the clock would be different depending on which CPU it corresponds to. So why can't we just remove the CPU field from the tracepoint?
On Wed, Sep 12, 2012 at 11:44:29AM -0700, Stephen Boyd wrote: > On 09/12/12 05:05, Russell King - ARM Linux wrote: > > NAK. Using a different function which doesn't have the warning isn't a > > subsitute for fixing the problem properly. What you're doing is papering > > over the bug, making the warning go away without properly understanding > > the problem. > > > > The warning is there because something is being done wrong. What that is > > is exactly what is being said in the warning message. You're getting a > > CPU number in a context where preemptions can occur - and therefore the > > CPU that you're running on can change. > > > > Think about this sequence: > > > > - cpu = smp_processor_id(); /* returns 0 */ > > - you are preempted > > - you resume on CPU 1 > > - trace_clock_disable(clk->name, 0, 0); > > > > If trace_clock_disable() uses the CPU number to access per-CPU data > > without locking, that's going to cause corruption. > > > > Please, if you're going to fix a warning, analyse it properly first and > > don't just search for a function which appears to give you the same > > functionality but without the warning message. > > Is anyone actually using the CPU field in this tracepoint? No idea, but for the point I'm raising, that's rather irrelevant... we need to nip these things in the bud before they become more common through allowing the misunderstanding to propagate. > I don't see > any usecase for it except for the case where you have a percpu clock, Which actually makes it all the more important to get the right CPU number, and to have atomicity between reading the CPU number and reading the percpu clock... without that the results may well be meaningless. If the use of a CPU number can be eliminated, then it should be. If not, the proper fix for this is to use interfaces which prevent pre-emption occuring during the critical region - so get_cpu() and put_cpu().
diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c index ea3f565..2765acd 100644 --- a/arch/arm/mach-omap2/clock.c +++ b/arch/arm/mach-omap2/clock.c @@ -285,7 +285,7 @@ void omap2_clk_disable(struct clk *clk) pr_debug("clock: %s: disabling in hardware\n", clk->name); if (clk->ops && clk->ops->disable) { - trace_clock_disable(clk->name, 0, smp_processor_id()); + trace_clock_disable(clk->name, 0, raw_smp_processor_id()); clk->ops->disable(clk); } @@ -339,7 +339,7 @@ int omap2_clk_enable(struct clk *clk) } if (clk->ops && clk->ops->enable) { - trace_clock_enable(clk->name, 1, smp_processor_id()); + trace_clock_enable(clk->name, 1, raw_smp_processor_id()); ret = clk->ops->enable(clk); if (ret) { WARN(1, "clock: %s: could not enable: %d\n", @@ -380,7 +380,7 @@ int omap2_clk_set_rate(struct clk *clk, unsigned long rate) /* dpll_ck, core_ck, virt_prcm_set; plus all clksel clocks */ if (clk->set_rate) { - trace_clock_set_rate(clk->name, rate, smp_processor_id()); + trace_clock_set_rate(clk->name, rate, raw_smp_processor_id()); ret = clk->set_rate(clk, rate); } diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c index 05bd8f0..d384068 100644 --- a/arch/arm/mach-omap2/pm34xx.c +++ b/arch/arm/mach-omap2/pm34xx.c @@ -346,18 +346,21 @@ void omap_sram_idle(void) static void omap3_pm_idle(void) { + int cpu; + local_fiq_disable(); if (omap_irq_pending()) goto out; - trace_power_start(POWER_CSTATE, 1, smp_processor_id()); - trace_cpu_idle(1, smp_processor_id()); + cpu = raw_smp_processor_id(); + trace_power_start(POWER_CSTATE, 1, cpu); + trace_cpu_idle(1, cpu); omap_sram_idle(); - trace_power_end(smp_processor_id()); - trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id()); + trace_power_end(cpu); + trace_cpu_idle(PWR_EVENT_EXIT, cpu); out: local_fiq_enable(); diff --git a/arch/arm/mach-omap2/powerdomain.c b/arch/arm/mach-omap2/powerdomain.c index 69b36e1..f1d0d69 100644 --- a/arch/arm/mach-omap2/powerdomain.c +++ b/arch/arm/mach-omap2/powerdomain.c @@ -169,7 +169,7 @@ static int _pwrdm_state_switch(struct powerdomain *pwrdm, int flag) ((state & OMAP_POWERSTATE_MASK) << 8) | ((prev & OMAP_POWERSTATE_MASK) << 0)); trace_power_domain_target(pwrdm->name, trace_state, - smp_processor_id()); + raw_smp_processor_id()); } break; default: @@ -491,7 +491,7 @@ int pwrdm_set_next_pwrst(struct powerdomain *pwrdm, u8 pwrst) if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) { /* Trace the pwrdm desired target state */ trace_power_domain_target(pwrdm->name, pwrst, - smp_processor_id()); + raw_smp_processor_id()); /* Program the pwrdm desired target state */ ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst); }
gets rid of below messages with CONFIG_DEBUG_PREEMPT enabled [ 28.832916] debug_smp_processor_id: 18 callbacks suppressed [ 28.832946] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/1763 [ 28.841491] caller is pwrdm_set_next_pwrst+0x54/0x120 Tested with perf on OMAP4 Panda. Signed-off-by: Roger Quadros <rogerq@ti.com> --- arch/arm/mach-omap2/clock.c | 6 +++--- arch/arm/mach-omap2/pm34xx.c | 11 +++++++---- arch/arm/mach-omap2/powerdomain.c | 4 ++-- 3 files changed, 12 insertions(+), 9 deletions(-)