Message ID | 1545159239-30628-3-git-send-email-vgupta@synopsys.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ARC show_regs fixes | expand |
On Tue 18-12-18 10:53:59, Vineet Gupta wrote: > signal handling core calls ARCH show_regs() with preemption disabled > which causes __might_sleep functions such as mmput leading to lockdep > splat. Workaround by re-enabling preemption temporarily. > > This may not be as bad as it sounds since the preemption disabling > itself was introduced for a supressing smp_processor_id() warning in x86 > code by commit 3a9f84d354ce ("signals, debug: fix BUG: using > smp_processor_id() in preemptible code in print_fatal_signal()") The commit you are referring to here sounds dubious in itself. We do not want to stick a preempt_disable just to silence a warning. show_regs is called from preemptible context at several places (e.g. __warn). Maybe this was not the case in 2009 when the change was introduced but this seems like a relict from the past. So can we fix the actual problem rather than build on top of it instead? Or maybe I am just missing something here.
On 12/20/18 5:04 AM, Michal Hocko wrote: > On Tue 18-12-18 10:53:59, Vineet Gupta wrote: >> signal handling core calls ARCH show_regs() with preemption disabled >> which causes __might_sleep functions such as mmput leading to lockdep >> splat. Workaround by re-enabling preemption temporarily. >> >> This may not be as bad as it sounds since the preemption disabling >> itself was introduced for a supressing smp_processor_id() warning in x86 >> code by commit 3a9f84d354ce ("signals, debug: fix BUG: using >> smp_processor_id() in preemptible code in print_fatal_signal()") > The commit you are referring to here sounds dubious in itself. Indeed that was my thought as well, but it did introduce the preemption disabling logic aroung core calling show_regs() ! > We do not > want to stick a preempt_disable just to silence a warning. I presume you are referring to original commit, not my anti-change in ARC code, which is actually re-enabling it. > show_regs is > called from preemptible context at several places (e.g. __warn). Right, but do we have other reports which show this, perhaps not too many distros have CONFIG__PREEMPT enabled ? > Maybe > this was not the case in 2009 when the change was introduced but this > seems like a relict from the past. So can we fix the actual problem > rather than build on top of it instead? The best/correct fix is to remove the preempt diabling in core code, but that affects every arch out there and will likely trip dormant land mines, needed localized fixes like I'm dealing with now. -Vineet
On Thu 20-12-18 18:45:48, Vineet Gupta wrote: > On 12/20/18 5:04 AM, Michal Hocko wrote: > > On Tue 18-12-18 10:53:59, Vineet Gupta wrote: > >> signal handling core calls ARCH show_regs() with preemption disabled > >> which causes __might_sleep functions such as mmput leading to lockdep > >> splat. Workaround by re-enabling preemption temporarily. > >> > >> This may not be as bad as it sounds since the preemption disabling > >> itself was introduced for a supressing smp_processor_id() warning in x86 > >> code by commit 3a9f84d354ce ("signals, debug: fix BUG: using > >> smp_processor_id() in preemptible code in print_fatal_signal()") > > The commit you are referring to here sounds dubious in itself. > > Indeed that was my thought as well, but it did introduce the preemption disabling > logic aroung core calling show_regs() ! > > > We do not > > want to stick a preempt_disable just to silence a warning. > > I presume you are referring to original commit, not my anti-change in ARC code, > which is actually re-enabling it. Yes, but you are building on a broken concept I believe. What implications does re-enabling really have? Now you could reschedule and you can move to another CPU. Is this really safe? I believe that yes because the preemption disabling is simply bogus. Which doesn't sound like a proper justification, does it? > > show_regs is > > called from preemptible context at several places (e.g. __warn). > > Right, but do we have other reports which show this, perhaps not too many distros > have CONFIG__PREEMPT enabled ? I do not follow. If there is some path to require show_regs to run with preemption disabled while others don't then something is clearly wrong. > > Maybe > > this was not the case in 2009 when the change was introduced but this > > seems like a relict from the past. So can we fix the actual problem > > rather than build on top of it instead? > > The best/correct fix is to remove the preempt diabling in core code, but that > affects every arch out there and will likely trip dormant land mines, needed > localized fixes like I'm dealing with now. Yes, the fix might be more involved but I would much rather prefer a correct code which builds on solid assumptions.
On Fri 21-12-18 14:04:04, Michal Hocko wrote: [...] > Yes, but you are building on a broken concept I believe. What > implications does re-enabling really have? Now you could reschedule and > you can move to another CPU. Is this really safe? I believe that yes > because the preemption disabling is simply bogus. Which doesn't sound > like a proper justification, does it? Well, thinking about it a bit more. What is the result of calling preempt_enable outside of preempt_disabled section? E.g. __warn which doesn't disable preemption AFAICS.
On 12/21/18 5:04 AM, Michal Hocko wrote: >> I presume you are referring to original commit, not my anti-change in ARC code, >> which is actually re-enabling it. > > Yes, but you are building on a broken concept I believe. Not sure where this is heading. Broken concept was introduced by disabling preemption around show_regs() to silence x86 smp_processor_id() splat in 2009. > What > implications does re-enabling really have ? Now you could reschedule and> you can move to another CPU. Is this really safe? From initial testing, none so far. show_regs() is simply pretty printing the passed pt_regs and decoding the current task, which agreed could move to a different CPU (likely will due to console/printk calls), but I don't see how that could mess up its mm or othe rinternal plumbing which it prints. > I believe that yes > because the preemption disabling is simply bogus. Which doesn't sound > like a proper justification, does it? [snip] > I do not follow. If there is some path to require show_regs to run with > preemption disabled while others don't then something is clearly wrong. [snip] > Yes, the fix might be more involved but I would much rather prefer a > correct code which builds on solid assumptions. Right so the first step is reverting the disabled semantics for ARC and do some heavy testing to make sure any fallouts are addressed etc. And if that works, then propagate this change to core itself. Low risk strategy IMO - agree ? Thx, -Vineet
On Fri 21-12-18 09:55:34, Vineet Gupta wrote: > On 12/21/18 5:04 AM, Michal Hocko wrote: [...] > > Yes, the fix might be more involved but I would much rather prefer a > > correct code which builds on solid assumptions. > > Right so the first step is reverting the disabled semantics for ARC and do some > heavy testing to make sure any fallouts are addressed etc. And if that works, then > propagate this change to core itself. Low risk strategy IMO - agree ? Yeah, I would simply remove the preempt_disable and see what falls out. smp_processor_id could be converted to the raw version etc...
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c index 2885bec71fb8..c650d3de13e1 100644 --- a/arch/arc/kernel/troubleshoot.c +++ b/arch/arc/kernel/troubleshoot.c @@ -177,6 +177,12 @@ void show_regs(struct pt_regs *regs) struct task_struct *tsk = current; struct callee_regs *cregs; + /* + * generic code calls us with preemption disabled, but some calls + * here could sleep, so re-enable to avoid lockdep splat + */ + preempt_enable(); + print_task_path_n_nm(tsk); show_regs_print_info(KERN_INFO); @@ -219,6 +225,8 @@ void show_regs(struct pt_regs *regs) cregs = (struct callee_regs *)current->thread.callee_reg; if (cregs) show_callee_regs(cregs); + + preempt_disable(); } void show_kernel_fault_diag(const char *str, struct pt_regs *regs,
signal handling core calls ARCH show_regs() with preemption disabled which causes __might_sleep functions such as mmput leading to lockdep splat. Workaround by re-enabling preemption temporarily. This may not be as bad as it sounds since the preemption disabling itself was introduced for a supressing smp_processor_id() warning in x86 code by commit 3a9f84d354ce ("signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal()") Signed-off-by: Vineet Gupta <vgupta@synopsys.com> --- arch/arc/kernel/troubleshoot.c | 8 ++++++++ 1 file changed, 8 insertions(+)