
Bug in _und_usr on dual-core ARM?

Message ID 20110621091529.GA22868@n2100.arm.linux.org.uk
State New, archived

Commit Message

Russell King - ARM Linux June 21, 2011, 9:15 a.m. UTC
On Tue, Jun 21, 2011 at 04:31:19PM +0800, TAO HU wrote:
> Hi, All
> 
> We hit an issue on our OMAP4 SMP system.
> It looks like __und_usr(), which was triggered by a user-space
> exception, took a page fault, which led to a might_sleep() failure.

Could you check whether this patch prevents the warning, please?

Comments

TAO HU June 21, 2011, 9:37 a.m. UTC | #1
Hi, Russell

Wouldn't your patch lead to an oops?
We're actually trying to avoid an oops.

Russell King - ARM Linux June 21, 2011, 9:58 a.m. UTC | #2
On Tue, Jun 21, 2011 at 05:37:31PM +0800, TAO HU wrote:
> Hi, Russell
> 
> Wouldn't your patch lead to an oops?

The might_sleep() warning occurs because it's trying to process the page
fault. We shouldn't be trying to do that with IRQs disabled; instead, we
should try to fix up the fault from kernel space by invoking the fixup
for the ldrt instructions in __und_usr.

This may throw the system into a loop against your process (which you
should still be able to kill), which would suggest that there are dirty
I-cache lines: we seem to be executing code in a non-present page
(at 0xafd0ce5c).

Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bc0e1d8..d52b940 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -289,7 +289,7 @@  do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	 * If we're in an interrupt or have no user
 	 * context, we must not take the fault..
 	 */
-	if (in_atomic() || !mm)
+	if (in_atomic() || irqs_disabled() || !mm)
 		goto no_context;
 
 	/*