From patchwork Fri Nov 24 15:56:50 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Russell King (Oracle)"
X-Patchwork-Id: 10074247
Date: Fri, 24 Nov 2017 15:56:50 +0000
From: Russell King - ARM Linux
To: Alex Shi
Subject: Re: do page fault in atomic bug on arm
Message-ID: <20171124155649.GT31757@n2100.armlinux.org.uk>
References: <20171121132001.GH31757@n2100.armlinux.org.uk>
 <64cbcda0-d040-4872-4a6b-7cd18375b4aa@linaro.org>
Content-Disposition: inline
In-Reply-To: <64cbcda0-d040-4872-4a6b-7cd18375b4aa@linaro.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "private-kwg@linaro.org", linux-arm-kernel@lists.infradead.org,
 lts-dev@lists.linaro.org

On Fri, Nov 24, 2017 at 11:09:30PM +0800, Alex Shi wrote:
>
> >> [   53.302718] softirqs last enabled at (11474): [] __do_softirq+0x280/0x5ac
> >> [   53.310494] softirqs last disabled at (11433): [] irq_exit+0xf4/0x158
> >> [   53.317837] CPU: 0 PID: 1691 Comm: ftracetest Not tainted 4.9.55-dirty #1
> >> [   53.324652] Hardware name: Generic DRA74X (Flattened Device Tree)
> >> [   53.330857] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >> [   53.338644] [] (show_stack) from [] (dump_stack+0xa4/0xd0)
> >> [   53.345908] [] (dump_stack) from [] (___might_sleep+0x1ac/0x2a0)
> >> [   53.353694] [] (___might_sleep) from [] (do_page_fault+0x25c/0x428)
> >> [   53.361739] [] (do_page_fault) from [] (do_PrefetchAbort+0x38/0x9c)
> >> [   53.369780] [] (do_PrefetchAbort) from [] (__pabt_svc+0x68/0xa0)
> >> [   53.377557] Exception stack(0xec6fbfa8 to 0xec6fbff0)
> >> [   53.382629] bfa0:                   00000001 00000001 ffffffff 00000000 0010ac68 00000007
> >> [   53.390845] bfc0: 00000001 0000003f 00000009 0000000c fffffffa be9d27a4 000e31fc ec6fbff8
> >> [   53.399055] bfe0: b6e6d49c b6e6d49c 40070093 ffffffff
> >> [   53.404137] [] (__pabt_svc) from [] (0xb6e6d49c)
> >
> > It also doesn't help that the backtrace stops at this point, and it looks
> > very strange:
> >
> > 1. the value of PC looks like it's outside of the module space.
> > 2. the CPSR indicates that the CPU was in SVC mode in the parent context
> >    with IRQs disabled.
> > 3. We're right at the top of the kernel stack, which suggests no further
> >    stack frames above this.
> >
> > We should never be in SVC mode without further stack frames on the kernel
> > stack.
> >
> > We don't seem to have overflowed the kernel stack, as the thread info
> > seems correct - and it would also be unlikely that the saved SP value
> > would end in ff8 in the exception stack frame.
>
> Hi Russell,
>
> Sorry for response late!
> Is this SP was stained by sth? As my understand, SP should be times of
> 32bits. But why stack print out correct with a incorrect SP?

There's nothing wrong with SP.

Looking deeper at this, I think that the kernel stack somehow got
corrupted earlier:

irq event stamp: 12924
hardirqs last enabled at (12923): [] no_work_pending+0x4/0x30
hardirqs last disabled at (12924): [] __pabt_svc+0x60/0xa0

The hard IRQ disable is as a result of taking a prefetch abort in SVC
mode. The saved context agrees with that:

                        R0       R1       R2       R3       R4       R5
bfa0:                   00000001 00000001 ffffffff 00000000 0010ac68 00000007
      R6       R7       R8       R9       R10      FP       IP       SP
bfc0: 00000001 0000003f 00000009 0000000c fffffffa be9d27a4 000e31fc ec6fbff8
      LR       PC       PSR
bfe0: b6e6d49c b6e6d49c 40070093 ffffffff

The PSR lower 5 bits are 0x13, which is SVC mode. Bit 7 set means IRQs
disabled. The PC address was 0xb6e6d49c.

The last record we have of interrupts being enabled was in
no_work_pending, which is the exit path to usermode - but if we were
returning to usermode:

(a) how did we get into SVC mode instead
(b) why are interrupts disabled
(c) why was mm->mmap_sem still held

Can you try the following patch to try and catch the problem earlier?
I haven't tested it myself, and adding code may move things around in
the kernel and make this bug disappear.

diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index d523cd8439a3..ff577177b286 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -299,6 +299,8 @@
 	@ ARM mode restore
 	mov	r2, sp
 	ldr	r1, [r2, #\offset + S_PSR]	@ get calling cpsr
+	tst	r1, #0xcf
+	bne	oops
 	ldr	lr, [r2, #\offset + S_PC]!	@ get pc
 	msr	spsr_cxsf, r1			@ save in spsr_svc
 #if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_32v6K)
@@ -314,6 +316,15 @@
 						@ after ldm {}^
 	add	sp, sp, #\offset + PT_REGS_SIZE
 	movs	pc, lr				@ return & move spsr_svc into cpsr
+oops:	.word	0xe7f001f2
+	.pushsection .rodata.str, "aMS", %progbits, 1
+2:	.asciz	"Returning to usermode but unexpected PSR bits set?"
+	.popsection
+	.pushsection __bug_table, "aw"
+	.align	2
+	.word	oops, 2b
+	.hword	\@
+	.popsection
 #elif defined(CONFIG_CPU_V7M)
 	@ V7M restore.
 	@ Note that we don't need to do clrex here as clearing the local
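
For reference, the PSR reading in the message above (mode bits 0x13, bit 7 set) and the mask the patch tests with "tst r1, #0xcf" can be checked with a small userspace C sketch. This is only an illustration, not kernel code: the helper names are hypothetical, and the bit positions are the standard ARM CPSR layout (M[4:0] in bits 0-4, F in bit 6, I in bit 7).

```c
#include <stdint.h>

/* Standard ARM CPSR fields (AArch32). */
#define PSR_MODE_MASK 0x1fu /* M[4:0]: processor mode */
#define PSR_MODE_USR  0x10u /* user mode */
#define PSR_MODE_SVC  0x13u /* supervisor mode */
#define PSR_F_BIT     0x40u /* bit 6: FIQs masked */
#define PSR_I_BIT     0x80u /* bit 7: IRQs masked */

/* Hypothetical helpers, named for this sketch only. */
static inline uint32_t psr_mode(uint32_t psr)
{
	return psr & PSR_MODE_MASK;
}

static inline int psr_irqs_masked(uint32_t psr)
{
	return (psr & PSR_I_BIT) != 0;
}

static inline int psr_fiqs_masked(uint32_t psr)
{
	return (psr & PSR_F_BIT) != 0;
}

/*
 * Same mask as the patch's "tst r1, #0xcf": I, F and M[3:0].
 * A PSR for a plain return to user mode (mode 0x10, IRQs and FIQs
 * unmasked) has none of these bits set, so a nonzero result means
 * the saved PSR is not what a usermode return should carry.
 */
static inline int psr_unexpected_for_user(uint32_t psr)
{
	return (psr & 0xcfu) != 0;
}
```

With the saved PSR from the dump, 0x40070093, psr_mode() gives 0x13 (SVC), psr_irqs_masked() is true, and psr_unexpected_for_user() is true, which is the condition the patch is meant to trap on.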