From patchwork Tue Apr 3 17:08:27 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Alex G."
X-Patchwork-Id: 10321587
From: Alexandru Gagniuc
To: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, bp@alien8.de,
 tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
 shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
 linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
 austin_bolen@dell.com, shyam_iyer@dell.com, Alexandru Gagniuc
Subject: [RFC PATCH 1/4] acpi: apei: Return severity of GHES messages after handling
Date: Tue, 3 Apr 2018 12:08:27 -0500
Message-Id: <20180403170830.29282-2-mr.nuke.me@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180403170830.29282-1-mr.nuke.me@gmail.com>
References: <20180403170830.29282-1-mr.nuke.me@gmail.com>
Sender: linux-acpi-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-acpi@vger.kernel.org

The current policy is to simply panic() on GHES fatal errors. However,
errors reported as fatal can often be corrected, e.g. "Fatal" PCIe errors
can be corrected via AER. When these errors are corrected, it doesn't make
sense to panic().

Update ghes_do_proc() to return the severity of the worst error, while
marking handled errors as corrected.

Signed-off-by: Alexandru Gagniuc
---
 drivers/acpi/apei/ghes.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..25cf77a18e0a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -383,7 +383,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
 
-static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
 {
 #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	unsigned long pfn;
@@ -411,7 +411,10 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 
 	if (flags != -1)
 		memory_failure_queue(pfn, flags);
+
+	return true;
 #endif
+	return false;
 }
 
 /*
@@ -428,7 +431,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
  * GHES_SEV_PANIC does not make it to this handling since the kernel must
  *     panic.
  */
-static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
+static bool ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
@@ -456,20 +459,33 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 				  (struct aer_capability_regs *)
 				  pcie_err->aer_info);
 	}
+
+	return true;
 #endif
+	return false;
 }
 
-static void ghes_do_proc(struct ghes *ghes,
+/*
+ * Handle GHES messages, and return the highest encountered severity.
+ * Errors which are handled are considered to be CORRECTED. The severity is
+ * taken from each GHES error data entry, not the error status block.
+ * An error is considered corrected if it can be dispatched to an appropriate
+ * handler. However, simply logging an error is not enough to "correct" it.
+ */
+static int ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
+	int sev, sec_sev, corrected_sev;
 	struct acpi_hest_generic_data *gdata;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	bool handled;
 
+	corrected_sev = GHES_SEV_NO;
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
+		handled = false;
 		sec_type = (guid_t *)gdata->section_type;
 		sec_sev = ghes_severity(gdata->error_severity);
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
@@ -484,10 +500,10 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
-			ghes_handle_memory_failure(gdata, sev);
+			handled = ghes_handle_memory_failure(gdata, sev);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
-			ghes_handle_aer(gdata);
+			handled = ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
@@ -500,7 +516,14 @@ static void ghes_do_proc(struct ghes *ghes,
 					       sec_sev, err,
 					       gdata->error_data_length);
 		}
+
+		if (sec_sev >= GHES_SEV_RECOVERABLE && handled)
+			sec_sev = GHES_SEV_CORRECTED;
+
+		corrected_sev = max(corrected_sev, sec_sev);
 	}
+
+	return corrected_sev;
 }
 
 static void __ghes_print_estatus(const char *pfx,
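
The aggregation that ghes_do_proc() performs in this patch can be tried outside the kernel tree with a small standalone sketch. This is an illustration only, not kernel code: the SEV_* values are stand-ins that assume the GHES severities are ordered from least to most severe, and struct section and worst_severity() are hypothetical names introduced here to model one error data entry and the loop over entries.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in severity levels (assumption: ordered least to most severe,
 * mirroring how the patch compares against GHES_SEV_RECOVERABLE).
 */
enum sev {
	SEV_NO = 0,
	SEV_CORRECTED = 1,
	SEV_RECOVERABLE = 2,
	SEV_PANIC = 3,
};

/* Hypothetical model of one error data entry. */
struct section {
	int sev;	/* severity reported for this entry */
	bool handled;	/* true if a handler accepted the entry */
};

/*
 * Return the worst severity across all entries. An entry that is
 * recoverable or worse, and that was handled, is downgraded to
 * SEV_CORRECTED before it is folded into the result.
 */
static int worst_severity(const struct section *secs, size_t n)
{
	int worst = SEV_NO;
	size_t i;

	for (i = 0; i < n; i++) {
		int s = secs[i].sev;

		if (s >= SEV_RECOVERABLE && secs[i].handled)
			s = SEV_CORRECTED;
		if (s > worst)
			worst = s;
	}

	return worst;
}
```

Under these assumptions, a handled SEV_PANIC entry next to an unhandled SEV_CORRECTED entry yields SEV_CORRECTED, while a single unhandled SEV_PANIC entry still yields SEV_PANIC regardless of what else was handled.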