From patchwork Tue Oct 9 18:33:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tony Luck X-Patchwork-Id: 10633133 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8378916B1 for ; Tue, 9 Oct 2018 18:35:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7BAA629561 for ; Tue, 9 Oct 2018 18:35:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7A083295AD; Tue, 9 Oct 2018 18:35:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA89F29561 for ; Tue, 9 Oct 2018 18:35:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726562AbeJJBxi (ORCPT ); Tue, 9 Oct 2018 21:53:38 -0400 Received: from mga18.intel.com ([134.134.136.126]:64078 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726434AbeJJBxi (ORCPT ); Tue, 9 Oct 2018 21:53:38 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Oct 2018 11:35:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,361,1534834800"; d="scan'208";a="97900982" Received: from agluck-desk.sc.intel.com ([10.3.52.160]) by orsmga001.jf.intel.com with ESMTP; 09 Oct 2018 11:33:58 -0700 From: Tony Luck To: "Rafael J. 
Wysocki" Cc: Tony Luck , Borislav Petkov , Qiuxu Zhuo , Aristeu Rozanski , Mauro Carvalho Chehab , linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org Subject: [PATCH 1/3] ACPI / adxl: Address translation interface using ACPI DSM Date: Tue, 9 Oct 2018 11:33:53 -0700 Message-Id: <20181009183355.20597-2-tony.luck@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181009183355.20597-1-tony.luck@intel.com> References: <20181009182932.GA20408@agluck-desk> <20181009183355.20597-1-tony.luck@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Some new servers provide an interface so that the OS can ask the BIOS to translate a system physical address to a memory address (socket, memory controller, channel, rank, dimm, etc.). This is useful for EDAC drivers that want to take the address of an error reported in a machine check bank and let the user know which DIMM may need to be replaced. Specification for this interface is available at: https://cdrdv2.intel.com/v1/dl/getContent/603354 [Based on earlier code by Qiuxu Zhuo ] Signed-off-by: Tony Luck --- Comments addressed since last version: Rafael: Added URL for specification (to commit & header comment) Fixed NULL error return from error case of adxl_dsm() Boris: Added sanity check on number of address components (also added check that number of values returned by decode operation matches number of strings) g/pr_debug/s//pr_info/ Added "." 
at end of sentences in comments Make return from adxl_get_component_names() immutable drivers/acpi/Kconfig | 10 ++ drivers/acpi/Makefile | 3 + drivers/acpi/acpi_adxl.c | 199 +++++++++++++++++++++++++++++++++++++++ include/linux/adxl.h | 25 +++++ 4 files changed, 237 insertions(+) create mode 100644 drivers/acpi/acpi_adxl.c create mode 100644 include/linux/adxl.h diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index dd1eea90f67f..327c93b51cb7 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -498,6 +498,16 @@ config ACPI_EXTLOG driver adds support for that functionality with corresponding tracepoint which carries that information to userspace. +config ACPI_ADXL + bool "Physical address to DIMM address translation" + def_bool n + help + Enable interface that calls into BIOS using a DSM (device + specific method) to convert system physical addresses + to DIMM (socket, channel, rank, dimm, etc.). + Only available on some servers. + Used by newer EDAC drivers. + menuconfig PMIC_OPREGION bool "PMIC (Power Management Integrated Circuit) operation region support" help diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 6d59aa109a91..edc039313cd6 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -61,6 +61,9 @@ acpi-$(CONFIG_ACPI_LPIT) += acpi_lpit.o acpi-$(CONFIG_ACPI_GENERIC_GSI) += irq.o acpi-$(CONFIG_ACPI_WATCHDOG) += acpi_watchdog.o +# Address translation +acpi-$(CONFIG_ACPI_ADXL) += acpi_adxl.o + # These are (potentially) separate modules # IPMI may be used by other drivers, so it has to initialise before them diff --git a/drivers/acpi/acpi_adxl.c b/drivers/acpi/acpi_adxl.c new file mode 100644 index 000000000000..a58bd8ec396e --- /dev/null +++ b/drivers/acpi/acpi_adxl.c @@ -0,0 +1,199 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Address translation interface via ACPI DSM. 
+ * Copyright (C) 2018 Intel Corporation + * + * Specification for this interface is available at: + * + * https://cdrdv2.intel.com/v1/dl/getContent/603354 + */ + +#ifdef CONFIG_ACPI_ADXL +#include +#include + +#define ADXL_REVISION 0x1 +#define ADXL_IDX_GET_ADDR_PARAMS 0x1 +#define ADXL_IDX_FORWARD_TRANSLATE 0x2 +#define ACPI_ADXL_PATH "\\_SB.ADXL" + +/* + * The specification doesn't provide a limit on how many + * components are in a memory address. But since we allocate + * memory based on the number the BIOS tells us, we should + * defend against insane values. + */ +#define ADXL_MAX_COMPONENTS 500 + +#undef pr_fmt +#define pr_fmt(fmt) "ADXL: " fmt + +static acpi_handle handle; +static union acpi_object *params; +static const guid_t adxl_guid = + GUID_INIT(0xAA3C050A, 0x7EA4, 0x4C1F, + 0xAF, 0xDA, 0x12, 0x67, 0xDF, 0xD3, 0xD4, 0x8D); + +static int adxl_count; +static char **adxl_component_names; + +static union acpi_object *adxl_dsm(int cmd, union acpi_object argv[]) +{ + union acpi_object *obj, *o; + + obj = acpi_evaluate_dsm_typed(handle, &adxl_guid, ADXL_REVISION, + cmd, argv, ACPI_TYPE_PACKAGE); + if (!obj) { + pr_info("Empty obj\n"); + return NULL; + } + + if (obj->package.count != 2) { + pr_info("Bad pkg count %d\n", obj->package.count); + goto err; + } + + o = obj->package.elements; + if (o->type != ACPI_TYPE_INTEGER) { + pr_info("Bad 1st element type %d\n", o->type); + goto err; + } + if (o->integer.value) { + pr_info("Bad ret val %llu\n", o->integer.value); + goto err; + } + + o = obj->package.elements + 1; + if (o->type != ACPI_TYPE_PACKAGE) { + pr_info("Bad 2nd element type %d\n", o->type); + goto err; + } + return obj; + +err: + ACPI_FREE(obj); + return NULL; +} + +/** + * adxl_get_component_names - get list of memory component names + * Returns NULL terminated list of string names + * + * Give the caller a pointer to the list of memory component names + * e.g. { "SystemAddress", "ProcessorSocketId", "ChannelId", ... 
NULL } + * Caller should count how many strings in order to allocate a buffer + * for the return from adxl_decode(). + */ +const char * const *adxl_get_component_names(void) +{ + return (const char * const *)adxl_component_names; +} +EXPORT_SYMBOL_GPL(adxl_get_component_names); + +/** + * adxl_decode - ask BIOS to decode a system address to memory address + * @addr: the address to decode + * @component_values: pointer to array of values for each component + * Returns 0 on success, negative error code otherwise + * + * The index of each value returned in the array matches the index of + * each component name returned by adxl_get_component_names(). + * Components that are not defined for this address translation (e.g. + * mirror channel number for a non-mirrored address) are set to ~0ull. + */ +int adxl_decode(u64 addr, u64 component_values[]) +{ + union acpi_object argv4[2], *results, *r; + int i, cnt; + + if (!adxl_component_names) + return -EOPNOTSUPP; + + argv4[0].type = ACPI_TYPE_PACKAGE; + argv4[0].package.count = 1; + argv4[0].package.elements = &argv4[1]; + argv4[1].integer.type = ACPI_TYPE_INTEGER; + argv4[1].integer.value = addr; + + results = adxl_dsm(ADXL_IDX_FORWARD_TRANSLATE, argv4); + if (!results) + return -EINVAL; + + r = results->package.elements + 1; + cnt = r->package.count; + if (cnt != adxl_count) { + ACPI_FREE(results); + return -EINVAL; + } + r = r->package.elements; + + for (i = 0; i < cnt; i++) + component_values[i] = r[i].integer.value; + + ACPI_FREE(results); + + return 0; +} +EXPORT_SYMBOL_GPL(adxl_decode); + +static bool adxl_detect(void) +{ + char *path = ACPI_ADXL_PATH; + union acpi_object *p; + acpi_status status; + int i; + + status = acpi_get_handle(NULL, path, &handle); + if (ACPI_FAILURE(status)) { + pr_info("No ACPI handle for path %s\n", path); + return false; + } + + if (!acpi_has_method(handle, "_DSM")) { + pr_info("No DSM method\n"); + return false; + } + + if (!acpi_check_dsm(handle, &adxl_guid, ADXL_REVISION, + 
ADXL_IDX_GET_ADDR_PARAMS | + ADXL_IDX_FORWARD_TRANSLATE)) { + pr_info("No ADXL DSM methods\n"); + return false; + } + + params = adxl_dsm(ADXL_IDX_GET_ADDR_PARAMS, NULL); + if (!params) { + pr_info("Failed to get params\n"); + return false; + } + + p = params->package.elements + 1; + adxl_count = p->package.count; + if (adxl_count > ADXL_MAX_COMPONENTS) { + pr_info("Insane number of address component names %d\n", adxl_count); + ACPI_FREE(params); + return false; + } + p = p->package.elements; + + adxl_component_names = kcalloc(adxl_count + 1, sizeof(char *), GFP_KERNEL); + if (!adxl_component_names) { + ACPI_FREE(params); + return false; + } + + for (i = 0; i < adxl_count; i++) + adxl_component_names[i] = p[i].string.pointer; + + return true; +} + +static int __init adxl_init(void) +{ + if (!adxl_detect()) + return -ENODEV; + return 0; +} +subsys_initcall(adxl_init); + +#endif /* CONFIG_ACPI_ADXL */ diff --git a/include/linux/adxl.h b/include/linux/adxl.h new file mode 100644 index 000000000000..6023704e5d0b --- /dev/null +++ b/include/linux/adxl.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Address translation interface via ACPI DSM. 
+ * Copyright (C) 2018 Intel Corporation + */ + +#ifndef _LINUX_ADXL_H +#define _LINUX_ADXL_H + +#ifdef CONFIG_ACPI_ADXL +const char * const *adxl_get_component_names(void); +int adxl_decode(u64 addr, u64 component_values[]); +#else +static inline const char * const *adxl_get_component_names(void) +{ + return NULL; +} + +static inline int adxl_decode(u64 addr, u64 component_values[]) +{ + return -EOPNOTSUPP; +} +#endif + +#endif /* _LINUX_ADXL_H */
From patchwork Tue Oct 9 18:33:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tony Luck X-Patchwork-Id: 10633129 From: Tony Luck To: "Rafael J. Wysocki" Cc: Qiuxu Zhuo, Tony Luck, Borislav Petkov, Aristeu Rozanski, Mauro Carvalho Chehab, linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org Subject: [PATCH 2/3] EDAC, skx_edac: Clean up debugfs Date: Tue, 9 Oct 2018 11:33:54 -0700 Message-Id: <20181009183355.20597-3-tony.luck@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181009183355.20597-1-tony.luck@intel.com> References: <20181009182932.GA20408@agluck-desk> <20181009183355.20597-1-tony.luck@intel.com>
From: Qiuxu Zhuo 1) The skx_edac debugfs node is '/sys/kernel/debug/skx_edac_test'. Move it under the EDAC debugfs root node '/sys/kernel/debug/edac/'. 2) Use skx_mce_check_error() instead of skx_decode() for debugfs, so that the decoded result is shown via the EDAC core, which prints it in a more readable format such as: CPU_SrcID#[0-9]_MC#[0-9]_Chan#[0-9]_DIMM#[0-9]. Signed-off-by: Qiuxu Zhuo Signed-off-by: Tony Luck
--- drivers/edac/skx_edac.c | 91 ++++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 47 deletions(-) diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c index fae095162c01..a710169abdbc 100644 --- a/drivers/edac/skx_edac.c +++ b/drivers/edac/skx_edac.c @@ -894,53 +894,6 @@ static bool skx_decode(struct decoded_addr *res) skx_rir_decode(res) && skx_mad_decode(res); } -#ifdef CONFIG_EDAC_DEBUG -/* - * Debug feature. Make /sys/kernel/debug/skx_edac_test/addr.
- * Write an address to this file to exercise the address decode - * logic in this driver. - */ -static struct dentry *skx_test; -static u64 skx_fake_addr; - -static int debugfs_u64_set(void *data, u64 val) -{ - struct decoded_addr res; - - res.addr = val; - skx_decode(&res); - - return 0; -} - -DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); - -static struct dentry *mydebugfs_create(const char *name, umode_t mode, - struct dentry *parent, u64 *value) -{ - return debugfs_create_file(name, mode, parent, value, &fops_u64_wo); -} - -static void setup_skx_debug(void) -{ - skx_test = debugfs_create_dir("skx_edac_test", NULL); - mydebugfs_create("addr", S_IWUSR, skx_test, &skx_fake_addr); -} - -static void teardown_skx_debug(void) -{ - debugfs_remove_recursive(skx_test); -} -#else -static void setup_skx_debug(void) -{ -} - -static void teardown_skx_debug(void) -{ -} -#endif /*CONFIG_EDAC_DEBUG*/ - static void skx_mce_output_error(struct mem_ctl_info *mci, const struct mce *m, struct decoded_addr *res) @@ -1072,6 +1025,50 @@ static struct notifier_block skx_mce_dec = { .priority = MCE_PRIO_EDAC, }; +#ifdef CONFIG_EDAC_DEBUG +/* + * Debug feature. + * Exercise the address decode logic by writing an address to + * /sys/kernel/debug/edac/skx_edac_test/addr. 
+ */ +static struct dentry *skx_test; +static u64 skx_fake_addr; + +static int debugfs_u64_set(void *data, u64 val) +{ + struct mce m; + + m.mcgstatus = 0; + /* ADDRV + MemRd + Unknown channel */ + m.status = MCI_STATUS_ADDRV + 0x90; + /* One corrected error */ + m.status |= 1ULL << MCI_STATUS_CEC_SHIFT; + m.addr = val; + m.socketid = 0; + skx_mce_check_error(NULL, 0, &m); + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(fops_u64_wo, NULL, debugfs_u64_set, "%llu\n"); + +static void setup_skx_debug(void) +{ + skx_test = edac_debugfs_create_dir("skx_edac_test"); + if (!skx_test) + return; + edac_debugfs_create_file("addr", 0200, skx_test, + &skx_fake_addr, &fops_u64_wo); +} + +static void teardown_skx_debug(void) +{ + debugfs_remove_recursive(skx_test); +} +#else +static void setup_skx_debug(void) {} +static void teardown_skx_debug(void) {} +#endif /*CONFIG_EDAC_DEBUG*/ + static void skx_remove(void) { int i, j;
From patchwork Tue Oct 9 18:33:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tony Luck X-Patchwork-Id: 10633127 From: Tony Luck To: "Rafael J. Wysocki" Cc: Qiuxu Zhuo, Tony Luck, Borislav Petkov, Aristeu Rozanski, Mauro Carvalho Chehab, linux-edac@vger.kernel.org, linux-acpi@vger.kernel.org Subject: [PATCH 3/3] EDAC, skx_edac: Add address translation for non-volatile DIMMs Date: Tue, 9 Oct 2018 11:33:55 -0700 Message-Id: <20181009183355.20597-4-tony.luck@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181009183355.20597-1-tony.luck@intel.com> References: <20181009182932.GA20408@agluck-desk> <20181009183355.20597-1-tony.luck@intel.com>
From: Qiuxu Zhuo The current skx_edac driver doesn't support address translation for non-volatile DIMMs. The ACPI ADXL DSM method supports address translation for both volatile and non-volatile DIMMs. So switch skx_edac to use the wrapped ACPI DSM methods if they are supported and non-volatile DIMMs are populated on the system.
Signed-off-by: Qiuxu Zhuo Signed-off-by: Tony Luck --- drivers/edac/Kconfig | 1 + drivers/edac/skx_edac.c | 177 +++++++++++++++++++++++++++++++++++++--- 2 files changed, 165 insertions(+), 13 deletions(-) diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 57304b2e989f..ffd349c12479 100644 --- a/drivers/edac/Kconfig +++ b/drivers/edac/Kconfig @@ -234,6 +234,7 @@ config EDAC_SKX depends on PCI && X86_64 && X86_MCE_INTEL && PCI_MMCONFIG depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_SKX can't be y select DMI + select ACPI_ADXL help Support for error detection and correction the Intel Skylake server Integrated Memory Controllers. If your diff --git a/drivers/edac/skx_edac.c b/drivers/edac/skx_edac.c index a710169abdbc..2989ebad8970 100644 --- a/drivers/edac/skx_edac.c +++ b/drivers/edac/skx_edac.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -35,6 +36,7 @@ #include "edac_module.h" #define EDAC_MOD_STR "skx_edac" +#define MSG_SIZE 1024 /* * Debug macros @@ -54,6 +56,17 @@ static LIST_HEAD(skx_edac_list); static u64 skx_tolm, skx_tohm; +static int nvdimm_count; +static char component_names[][32] = { + "ProcessorSocketId", + "MemoryControllerId", + "ChannelId", + "DimmSlotId", +}; + +static int component_indices[ARRAY_SIZE(component_names)]; +static int adxl_component_count; +static const char * const *adxl_component_names; #define NUM_IMC 2 /* memory controllers per socket */ #define NUM_CHANNELS 3 /* channels per memory controller */ @@ -102,11 +115,11 @@ struct decoded_addr { u64 addr; int socket; int imc; - int channel; + u64 channel; u64 chan_addr; int sktways; int chanways; - int dimm; + u64 dimm; int rank; int channel_rank; u64 rank_address; @@ -393,6 +406,8 @@ static int get_nvdimm_info(struct dimm_info *dimm, struct skx_imc *imc, u16 flags; u64 size = 0; + nvdimm_count++; + dev_handle = ACPI_NFIT_BUILD_DEVICE_HANDLE(dimmno, chan, imc->lmc, imc->src_id, 0); @@ -682,7 +697,7 @@ static bool 
skx_sad_decode(struct decoded_addr *res) res->imc = GET_BITFIELD(d->mcroute, lchan * 3, lchan * 3 + 2); res->channel = GET_BITFIELD(d->mcroute, lchan * 2 + 18, lchan * 2 + 19); - edac_dbg(2, "%llx: socket=%d imc=%d channel=%d\n", + edac_dbg(2, "%llx: socket=%d imc=%d channel=%llu\n", res->addr, res->socket, res->imc, res->channel); return true; } @@ -818,7 +833,7 @@ static bool skx_rir_decode(struct decoded_addr *res) res->dimm = chan_rank / 4; res->rank = chan_rank % 4; - edac_dbg(2, "%llx: dimm=%d rank=%d chan_rank=%d rank_addr=%llx\n", + edac_dbg(2, "%llx: dimm=%llu rank=%d chan_rank=%d rank_addr=%llx\n", res->addr, res->dimm, res->rank, res->channel_rank, res->rank_address); return true; @@ -894,12 +909,55 @@ static bool skx_decode(struct decoded_addr *res) skx_rir_decode(res) && skx_mad_decode(res); } +static bool skx_dsm_decode(u64 addr, char *msg, int msglen, + u64 *sock, u64 *imc, + u64 *chan, u64 *dimm) + +{ + u64 *values; + int i, len = 0; + + if (addr >= skx_tohm || (addr >= skx_tolm && addr < BIT_ULL(32))) { + edac_dbg(0, "Address %llx out of range\n", addr); + return false; + } + + values = kcalloc(adxl_component_count, sizeof(*values), GFP_KERNEL); + if (!values) { + edac_dbg(0, "Out of memory\n"); + return false; + } + + if (adxl_decode(addr, values)) { + edac_dbg(0, "Failed to decode 0x%llx\n", addr); + kfree(values); + return false; + } + + *sock = values[component_indices[0]]; + *imc = values[component_indices[1]]; + *chan = values[component_indices[2]]; + *dimm = values[component_indices[3]]; + + for (i = 0; i < adxl_component_count; i++) { + len += snprintf(msg + len, msglen - len, " %s:0x%llx", + adxl_component_names[i], values[i]); + + if (msglen - len <= 0) + break; + } + + kfree(values); + return true; +} + static void skx_mce_output_error(struct mem_ctl_info *mci, const struct mce *m, - struct decoded_addr *res) + struct decoded_addr *res, + char *msg) { enum hw_event_mc_err_type tp_event; - char *type, *optype, msg[256]; + char *type, *optype; 
bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0); bool overflow = GET_BITFIELD(m->status, 62, 62); bool uncorrected_error = GET_BITFIELD(m->status, 61, 61); @@ -960,13 +1018,11 @@ static void skx_mce_output_error(struct mem_ctl_info *mci, } } - snprintf(msg, sizeof(msg), - "%s%s err_code:%04x:%04x socket:%d imc:%d rank:%d bg:%d ba:%d row:%x col:%x", + snprintf(msg, MSG_SIZE, + "%s%s err_code:%04x:%04x %s", overflow ? " OVERFLOW" : "", (uncorrected_error && recoverable) ? " recoverable" : "", - mscod, errcode, - res->socket, res->imc, res->rank, - res->bank_group, res->bank_address, res->row, res->column); + mscod, errcode, msg + MSG_SIZE); edac_dbg(0, "%s\n", msg); @@ -977,13 +1033,33 @@ static void skx_mce_output_error(struct mem_ctl_info *mci, optype, msg); } +static struct mem_ctl_info *get_mci(u64 src_id, u64 lmc) +{ + struct skx_dev *d; + + if (lmc > NUM_IMC - 1) { + skx_printk(KERN_ERR, "Bad mc# 0x%llx\n", lmc); + return NULL; + } + + list_for_each_entry(d, &skx_edac_list, list) { + if (d->imc[0].src_id == src_id) + return d->imc[lmc].mci; + } + + skx_printk(KERN_ERR, "No mci for src_id %llx lmc %llx\n", src_id, lmc); + + return NULL; +} + static int skx_mce_check_error(struct notifier_block *nb, unsigned long val, void *data) { struct mce *mce = (struct mce *)data; struct decoded_addr res; struct mem_ctl_info *mci; - char *type; + u64 socket, memctrl; + char *type, *msg; if (edac_get_report_status() == EDAC_REPORTING_DISABLED) return NOTIFY_DONE; @@ -992,11 +1068,35 @@ static int skx_mce_check_error(struct notifier_block *nb, unsigned long val, if ((mce->status & 0xefff) >> 7 != 1 || !(mce->status & MCI_STATUS_ADDRV)) return NOTIFY_DONE; + msg = kzalloc(MSG_SIZE * 2, GFP_KERNEL); + if (!msg) + return NOTIFY_DONE; + + if (adxl_component_count) + goto dsm_decoding; + res.addr = mce->addr; if (!skx_decode(&res)) return NOTIFY_DONE; mci = res.dev->imc[res.imc].mci; + snprintf(msg + MSG_SIZE, MSG_SIZE, + "socket:%d imc:%d chan:%llu dimm:%llu rank:%d bg:%d ba:%d 
row:%x col:%x", + res.socket, res.imc, res.channel, res.dimm, res.rank, + res.bank_group, res.bank_address, res.row, res.column); + + goto decoded; + +dsm_decoding: + if (!skx_dsm_decode(mce->addr, msg + MSG_SIZE, MSG_SIZE, + &socket, &memctrl, &res.channel, &res.dimm)) + goto out; + + mci = get_mci(socket, memctrl); + if (!mci) + goto out; + +decoded: if (mce->mcgstatus & MCG_STATUS_MCIP) type = "Exception"; else @@ -1015,8 +1115,10 @@ static int skx_mce_check_error(struct notifier_block *nb, unsigned long val, "%u APIC %x\n", mce->cpuvendor, mce->cpuid, mce->time, mce->socketid, mce->apicid); - skx_mce_output_error(mci, mce, &res); + skx_mce_output_error(mci, mce, &res, msg); +out: + kfree(msg); return NOTIFY_DONE; } @@ -1090,6 +1192,44 @@ static void skx_remove(void) } } +static void __init skx_dsm_get(void) +{ + const char * const *names; + int i, j, n; + + names = adxl_get_component_names(); + if (!names) { + skx_printk(KERN_NOTICE, "No firmware support for address translation."); + skx_printk(KERN_CONT, " Only decoding DDR4 address!\n"); + return; + } + + n = ARRAY_SIZE(component_names); + for (i = 0; i < n; i++) { + for (j = 0; names[j]; j++) { + if (!strcmp(component_names[i], names[j])) { + component_indices[i] = j; + break; + } + } + + if (!names[j]) + goto err; + } + + adxl_component_names = names; + while (*names++) + adxl_component_count++; + + return; +err: + skx_printk(KERN_ERR, "'%s' is not matched from DSM parameters: ", + component_names[i]); + for (j = 0; names[j]; j++) + skx_printk(KERN_CONT, "%s ", names[j]); + skx_printk(KERN_CONT, "\n"); +} + /* * skx_init: * make sure we are running on the correct cpu model @@ -1154,6 +1294,9 @@ static int __init skx_init(void) } } + if (nvdimm_count) + skx_dsm_get(); + /* Ensure that the OPSTATE is set correctly for POLL or NMI */ opstate_init(); @@ -1180,6 +1323,14 @@ module_exit(skx_exit); module_param(edac_op_state, int, 0444); MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll,1=NMI"); 
+module_param_string(sock, component_names[0], sizeof(component_names[0]), 0644); +MODULE_PARM_DESC(sock, "String to get socket ID"); +module_param_string(imc, component_names[1], sizeof(component_names[1]), 0644); +MODULE_PARM_DESC(imc, "String to get IMC ID"); +module_param_string(chan, component_names[2], sizeof(component_names[2]), 0644); +MODULE_PARM_DESC(chan, "String to get channel ID"); +module_param_string(dimm, component_names[3], sizeof(component_names[3]), 0644); +MODULE_PARM_DESC(dimm, "String to get DIMM ID"); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Tony Luck");
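
Reviewer note: the component-name lookup done by skx_dsm_get() and the bounded message building done by skx_dsm_decode() may be easier to follow in a small user-space sketch. This is not kernel code and not part of the series. mock_names is a made-up stand-in for what adxl_get_component_names() might return, and find_component() and format_components() are hypothetical helpers introduced only for illustration. The point it tries to show is that each snprintf() call is given only the space that remains in the buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumption: a made-up, NULL-terminated name list standing in for the
 * one adxl_get_component_names() would return on real firmware.
 */
static const char * const mock_names[] = {
	"SystemAddress", "ProcessorSocketId", "MemoryControllerId",
	"ChannelId", "DimmSlotId", NULL,
};

/* Return the index of 'want' in a NULL-terminated list, or -1 if absent. */
static int find_component(const char * const *names, const char *want)
{
	int i;

	for (i = 0; names[i]; i++)
		if (!strcmp(names[i], want))
			return i;
	return -1;
}

/*
 * Append " name:0xvalue" for each component without writing past msglen.
 * Returns the number of characters stored, not counting the final NUL.
 */
static int format_components(char *msg, size_t msglen,
			     const char * const *names,
			     const uint64_t *values)
{
	size_t len = 0;
	int i, n;

	assert(msglen > 0);
	msg[0] = '\0';

	for (i = 0; names[i]; i++) {
		/* Pass the remaining space, not the total buffer size. */
		n = snprintf(msg + len, msglen - len, " %s:0x%" PRIx64,
			     names[i], values[i]);
		if (n < 0)
			break;
		if ((size_t)n >= msglen - len) {
			/* Output was truncated; the buffer is now full. */
			len = msglen - 1;
			break;
		}
		len += (size_t)n;
	}
	return (int)len;
}
```

A caller would resolve the indices once at init time, for example find_component(mock_names, "ChannelId"), and then use them to pick values out of the array filled by the decode call, which is the pattern the patch follows with component_indices[].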