From patchwork Wed Mar 28 16:30:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Morse
X-Patchwork-Id: 10313555
Subject: Re: [PATCH 02/11] ACPI / APEI: Generalise the estatus queue's
 add/remove and notify code
To: Borislav Petkov
References: <20180215185606.26736-1-james.morse@arm.com>
 <20180215185606.26736-3-james.morse@arm.com>
 <20180301150144.GA4215@pd.tnic>
 <87sh9jbrgc.fsf@e105922-lin.cambridge.arm.com>
 <20180301223529.GA28811@pd.tnic>
 <5AA02C26.10803@arm.com>
 <20180308104408.GB21166@pd.tnic>
 <5AAFC939.3010309@arm.com>
 <20180327172510.GB32184@pd.tnic>
From: James Morse
Date: Wed, 28 Mar 2018 17:30:55 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.6.0
In-Reply-To: <20180327172510.GB32184@pd.tnic>
Content-Language: en-US
Cc: Rafael Wysocki, Tony Luck, Xie XiuQi, linux-mm@kvack.org,
 Marc Zyngier, Catalin Marinas, Punit Agrawal, Will Deacon,
 Tyler Baicar, Dongjiu Geng, linux-acpi@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, Naoya Horiguchi,
 kvmarm@lists.cs.columbia.edu, Christoffer Dall, Len Brown

Hi Borislav,

On 27/03/18 18:25, Borislav Petkov wrote:
> On Mon, Mar 19, 2018 at 02:29:13PM +0000, James Morse wrote:
>> I don't think the die_lock really helps here, do we really want to wait for a
>> remote CPU to finish printing an OOPs about user-space's bad memory accesses,
>> before we bring the machine down due to this system-wide fatal RAS error? The
>> presence of firmware-first means we know this error, and any other oops are
>> unrelated.
>
> Hmm, now that you put it this way...

>> I'd like to leave this under the x86-ifdef for now. For arm64 it would be an
>> APEI specific arch hook to stop the arch code from printing some messages,
>
> ... I'm thinking we should ignore the whole serializing of oopses and
> really dump that hw error ASAP. If it really is a fatal error, our main
> and only goal is to get it out as fast as possible so that it has the
> highest chance to appear on some screen or logging facility and thus the
> system can be serviced successfully.
>
> And the other oopses have lower prio.
>
> Hmmm?

Yes, I agree. With firmware-first we know that errors the firmware takes first,
then notifies by NMI causing us to panic() must be a higher priority than
another oops.

I'll add a patch[0] to v3 making this argument and removing the #ifdef'd
oops_begin().


Thanks,

James


[0]
-----------------%<-----------------
ACPI / APEI: don't wait to serialise with oops messages when panic()ing

oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before we can
panic(), our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
generic solution for safe printk in NMI"), it has been safe to call
printk() from this context. Messages are batched in a per-cpu buffer
and printed via irq-work, or a call back from panic().

Signed-off-by: James Morse
-----------------%<-----------------
Acked-by: Borislav Petkov

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 22f6ea5b9ad5..f348e6540960 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -34,7 +34,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -736,9 +735,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 	sev = ghes_severity(ghes->estatus->error_severity);
 	if (sev >= GHES_SEV_PANIC) {
-#ifdef CONFIG_X86
-		oops_begin();
-#endif
 		ghes_print_queued_estatus();
 		__ghes_panic(ghes);
 	}
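
For anyone reading without the tree to hand, the control flow that results from the patch can be roughly sketched in user-space C. This is an illustration only, not the kernel code: every sketch_* name is hypothetical, the severity ordering is an assumption modelled on the `sev >= GHES_SEV_PANIC` test in the diff, and the "panic" here just bumps a counter. The point it tries to show is that a panic-severity notification goes straight to print-and-panic, whether or not some other CPU is holding the die-lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severities; only the ordering matters for this sketch. */
enum sketch_sev {
	SKETCH_SEV_NO,
	SKETCH_SEV_CORRECTED,
	SKETCH_SEV_RECOVERABLE,
	SKETCH_SEV_PANIC,
};

/* Hypothetical bookkeeping so the behaviour can be observed in a test. */
struct sketch_state {
	bool die_lock_held_elsewhere;	/* another CPU is mid-oops */
	int printed;			/* times the queued estatus was printed */
	int panicked;			/* times we "panicked" */
};

/* Stand-in for ghes_print_queued_estatus(); it only counts calls. */
static void sketch_print_queued_estatus(struct sketch_state *s)
{
	s->printed++;
}

/* Stand-in for __ghes_panic(); it only counts calls and returns. */
static void sketch_panic(struct sketch_state *s)
{
	s->panicked++;
}

/*
 * Models the patched severity check: no oops_begin()-style wait, so
 * die_lock_held_elsewhere is deliberately never consulted.
 * Returns -1 if the error was fatal, 0 otherwise.
 */
static int sketch_nmi_notify_one(struct sketch_state *s, enum sketch_sev sev)
{
	assert(s != NULL);

	if (sev >= SKETCH_SEV_PANIC) {
		sketch_print_queued_estatus(s);
		sketch_panic(s);
		return -1;
	}

	return 0;
}
```

Calling sketch_nmi_notify_one() with SKETCH_SEV_PANIC while die_lock_held_elsewhere is set still prints and panics once; a lower severity leaves both counters untouched.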