From patchwork Fri Jul 22 23:33:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 12926976
From: Tony Luck
To: linux-edac@vger.kernel.org
Cc: Qiuxu Zhuo, Tony Luck, Aristeu Rozanski, Borislav Petkov,
 Mauro Carvalho Chehab, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
Subject: [PATCH 1/4] EDAC/skx_common: Add ChipSelect ADXL component
Date: Fri, 22 Jul 2022 16:33:35 -0700
Message-Id: <20220722233338.341567-2-tony.luck@intel.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220722233338.341567-1-tony.luck@intel.com>
References: <20220722233338.341567-1-tony.luck@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Qiuxu Zhuo

Each pseudo channel of HBM has its own retry_rd_err_log registers.
The bit 0 of ChipSelect ADXL component encodes the pseudo channel
number of HBM memory. So add ChipSelect ADXL component to get HBM
pseudo channel number.

Signed-off-by: Qiuxu Zhuo
Signed-off-by: Tony Luck
---
 drivers/edac/skx_common.h | 4 ++++
 drivers/edac/skx_common.c | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h
index 03ac067a80b9..70ec4f41911b 100644
--- a/drivers/edac/skx_common.h
+++ b/drivers/edac/skx_common.h
@@ -108,16 +108,19 @@ enum {
 	INDEX_MEMCTRL,
 	INDEX_CHANNEL,
 	INDEX_DIMM,
+	INDEX_CS,
 	INDEX_NM_FIRST,
 	INDEX_NM_MEMCTRL = INDEX_NM_FIRST,
 	INDEX_NM_CHANNEL,
 	INDEX_NM_DIMM,
+	INDEX_NM_CS,
 	INDEX_MAX
 };

 #define BIT_NM_MEMCTRL	BIT_ULL(INDEX_NM_MEMCTRL)
 #define BIT_NM_CHANNEL	BIT_ULL(INDEX_NM_CHANNEL)
 #define BIT_NM_DIMM	BIT_ULL(INDEX_NM_DIMM)
+#define BIT_NM_CS	BIT_ULL(INDEX_NM_CS)

 struct decoded_addr {
 	struct skx_dev *dev;
@@ -129,6 +132,7 @@ struct decoded_addr {
 	int	sktways;
 	int	chanways;
 	int	dimm;
+	int	cs;
 	int	rank;
 	int	channel_rank;
 	u64	rank_address;
diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
index 19c17c5198c5..ee074fb507d8 100644
--- a/drivers/edac/skx_common.c
+++ b/drivers/edac/skx_common.c
@@ -27,9 +27,11 @@ static const char * const component_names[] = {
 	[INDEX_MEMCTRL]		= "MemoryControllerId",
 	[INDEX_CHANNEL]		= "ChannelId",
 	[INDEX_DIMM]		= "DimmSlotId",
+	[INDEX_CS]		= "ChipSelect",
 	[INDEX_NM_MEMCTRL]	= "NmMemoryControllerId",
 	[INDEX_NM_CHANNEL]	= "NmChannelId",
 	[INDEX_NM_DIMM]		= "NmDimmSlotId",
+	[INDEX_NM_CS]		= "NmChipSelect",
 };

 static int component_indices[ARRAY_SIZE(component_names)];
@@ -139,10 +141,13 @@ static bool skx_adxl_decode(struct decoded_addr *res, bool error_in_1st_level_me
 				(int)adxl_values[component_indices[INDEX_NM_CHANNEL]] : -1;
 		res->dimm = (adxl_nm_bitmap & BIT_NM_DIMM) ?
 				(int)adxl_values[component_indices[INDEX_NM_DIMM]] : -1;
+		res->cs = (adxl_nm_bitmap & BIT_NM_CS) ?
+				(int)adxl_values[component_indices[INDEX_NM_CS]] : -1;
 	} else {
 		res->imc = (int)adxl_values[component_indices[INDEX_MEMCTRL]];
 		res->channel = (int)adxl_values[component_indices[INDEX_CHANNEL]];
 		res->dimm = (int)adxl_values[component_indices[INDEX_DIMM]];
+		res->cs = (int)adxl_values[component_indices[INDEX_CS]];
 	}

 	if (res->imc > NUM_IMC - 1 || res->imc < 0) {

From patchwork Fri Jul 22 23:33:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 12926977
From: Tony Luck
To: linux-edac@vger.kernel.org
Cc: Qiuxu Zhuo, Tony Luck, Aristeu Rozanski, Borislav Petkov,
 Mauro Carvalho Chehab, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
Subject: [PATCH 2/4] EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
Date: Fri, 22 Jul 2022 16:33:36 -0700
Message-Id: <20220722233338.341567-3-tony.luck@intel.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220722233338.341567-1-tony.luck@intel.com>
References: <20220722233338.341567-1-tony.luck@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Qiuxu Zhuo

An HBM memory channel is divided into two pseudo channels.
Each pseudo channel has its own retry_rd_err_log registers.
Retrieve and print retry_rd_err_log registers of the HBM pseudo
channel if the memory error is from HBM.

Signed-off-by: Qiuxu Zhuo
Signed-off-by: Tony Luck
---
 drivers/edac/skx_common.h |  4 ++
 drivers/edac/i10nm_base.c | 84 +++++++++++++++++++++++++++++++--------
 2 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h
index 70ec4f41911b..dbf8e458ad2b 100644
--- a/drivers/edac/skx_common.h
+++ b/drivers/edac/skx_common.h
@@ -158,7 +158,11 @@ struct res_config {
 	int sad_all_offset;
 	/* Offsets of retry_rd_err_log registers */
 	u32 *offsets_scrub;
+	u32 *offsets_scrub_hbm0;
+	u32 *offsets_scrub_hbm1;
 	u32 *offsets_demand;
+	u32 *offsets_demand_hbm0;
+	u32 *offsets_demand_hbm1;
 };

 typedef int (*get_dimm_config_f)(struct mem_ctl_info *mci,
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 6cf50ee0b77c..976d8e8a4d1b 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -77,18 +77,20 @@ static int retry_rd_err_log;

 static u32 offsets_scrub_icx[] = {0x22c60, 0x22c54, 0x22c5c, 0x22c58, 0x22c28, 0x20ed8};
 static u32 offsets_scrub_spr[] = {0x22c60, 0x22c54, 0x22f08, 0x22c58, 0x22c28, 0x20ed8};
+static u32 offsets_scrub_spr_hbm0[] = {0x2860, 0x2854, 0x2b08, 0x2858, 0x2828, 0x0ed8};
+static u32 offsets_scrub_spr_hbm1[] = {0x2c60, 0x2c54, 0x2f08, 0x2c58, 0x2c28, 0x0fa8};
 static u32 offsets_demand_icx[] = {0x22e54, 0x22e60, 0x22e64, 0x22e58, 0x22e5c, 0x20ee0};
 static u32 offsets_demand_spr[] = {0x22e54, 0x22e60, 0x22f10, 0x22e58, 0x22e5c, 0x20ee0};
+static u32 offsets_demand_spr_hbm0[] = {0x2a54, 0x2a60, 0x2b10, 0x2a58, 0x2a5c, 0x0ee0};
+static u32 offsets_demand_spr_hbm1[] = {0x2e54, 0x2e60, 0x2f10, 0x2e58, 0x2e5c, 0x0fb0};

-static void __enable_retry_rd_err_log(struct skx_imc *imc, int chan, bool enable)
+static void __enable_retry_rd_err_log(struct skx_imc *imc, int chan, bool enable,
+				      u32 *offsets_scrub, u32 *offsets_demand)
 {
 	u32 s, d;

-	if (!imc->mbase)
-		return;
-
-	s = I10NM_GET_REG32(imc, chan, res_cfg->offsets_scrub[0]);
-	d = I10NM_GET_REG32(imc, chan, res_cfg->offsets_demand[0]);
+	s = I10NM_GET_REG32(imc, chan, offsets_scrub[0]);
+	d = I10NM_GET_REG32(imc, chan, offsets_demand[0]);

 	if (enable) {
 		/* Save default configurations */
@@ -115,21 +117,39 @@ static void __enable_retry_rd_err_log(struct skx_imc *imc, int chan, bool enable
 			d &= ~RETRY_RD_ERR_LOG_EN;
 	}

-	I10NM_SET_REG32(imc, chan, res_cfg->offsets_scrub[0], s);
-	I10NM_SET_REG32(imc, chan, res_cfg->offsets_demand[0], d);
+	I10NM_SET_REG32(imc, chan, offsets_scrub[0], s);
+	I10NM_SET_REG32(imc, chan, offsets_demand[0], d);
 }

 static void enable_retry_rd_err_log(bool enable)
 {
+	struct skx_imc *imc;
 	struct skx_dev *d;
 	int i, j;

 	edac_dbg(2, "\n");

 	list_for_each_entry(d, i10nm_edac_list, list)
-		for (i = 0; i < I10NM_NUM_IMC; i++)
-			for (j = 0; j < I10NM_NUM_CHANNELS; j++)
-				__enable_retry_rd_err_log(&d->imc[i], j, enable);
+		for (i = 0; i < I10NM_NUM_IMC; i++) {
+			imc = &d->imc[i];
+			if (!imc->mbase)
+				continue;
+
+			for (j = 0; j < I10NM_NUM_CHANNELS; j++) {
+				if (imc->hbm_mc) {
+					__enable_retry_rd_err_log(imc, j, enable,
+								  res_cfg->offsets_scrub_hbm0,
+								  res_cfg->offsets_demand_hbm0);
+					__enable_retry_rd_err_log(imc, j, enable,
+								  res_cfg->offsets_scrub_hbm1,
+								  res_cfg->offsets_demand_hbm1);
+				} else {
+					__enable_retry_rd_err_log(imc, j, enable,
+								  res_cfg->offsets_scrub,
+								  res_cfg->offsets_demand);
+				}
+			}
+		}
 }

 static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
@@ -140,12 +160,24 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 	u32 corr0, corr1, corr2, corr3;
 	u64 log2a, log5;
 	u32 *offsets;
-	int n;
+	int n, pch;

 	if (!imc->mbase)
 		return;

-	offsets = scrub_err ? res_cfg->offsets_scrub : res_cfg->offsets_demand;
+	if (imc->hbm_mc) {
+		pch = res->cs & 1;
+
+		if (pch)
+			offsets = scrub_err ? res_cfg->offsets_scrub_hbm1 :
+					      res_cfg->offsets_demand_hbm1;
+		else
+			offsets = scrub_err ? res_cfg->offsets_scrub_hbm0 :
+					      res_cfg->offsets_demand_hbm0;
+	} else {
+		offsets = scrub_err ? res_cfg->offsets_scrub :
+				      res_cfg->offsets_demand;
+	}

 	log0 = I10NM_GET_REG32(imc, res->channel, offsets[0]);
 	log1 = I10NM_GET_REG32(imc, res->channel, offsets[1]);
@@ -163,10 +195,24 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 			     log0, log1, log2, log3, log4, log5);
 	}

-	corr0 = I10NM_GET_REG32(imc, res->channel, 0x22c18);
-	corr1 = I10NM_GET_REG32(imc, res->channel, 0x22c1c);
-	corr2 = I10NM_GET_REG32(imc, res->channel, 0x22c20);
-	corr3 = I10NM_GET_REG32(imc, res->channel, 0x22c24);
+	if (imc->hbm_mc) {
+		if (pch) {
+			corr0 = I10NM_GET_REG32(imc, res->channel, 0x2c18);
+			corr1 = I10NM_GET_REG32(imc, res->channel, 0x2c1c);
+			corr2 = I10NM_GET_REG32(imc, res->channel, 0x2c20);
+			corr3 = I10NM_GET_REG32(imc, res->channel, 0x2c24);
+		} else {
+			corr0 = I10NM_GET_REG32(imc, res->channel, 0x2818);
+			corr1 = I10NM_GET_REG32(imc, res->channel, 0x281c);
+			corr2 = I10NM_GET_REG32(imc, res->channel, 0x2820);
+			corr3 = I10NM_GET_REG32(imc, res->channel, 0x2824);
+		}
+	} else {
+		corr0 = I10NM_GET_REG32(imc, res->channel, 0x22c18);
+		corr1 = I10NM_GET_REG32(imc, res->channel, 0x22c1c);
+		corr2 = I10NM_GET_REG32(imc, res->channel, 0x22c20);
+		corr3 = I10NM_GET_REG32(imc, res->channel, 0x22c24);
+	}

 	if (len - n > 0)
 		snprintf(msg + n, len - n,
@@ -420,7 +466,11 @@ static struct res_config spr_cfg = {
 	.sad_all_devfn		= PCI_DEVFN(10, 0),
 	.sad_all_offset		= 0x300,
 	.offsets_scrub		= offsets_scrub_spr,
+	.offsets_scrub_hbm0	= offsets_scrub_spr_hbm0,
+	.offsets_scrub_hbm1	= offsets_scrub_spr_hbm1,
 	.offsets_demand		= offsets_demand_spr,
+	.offsets_demand_hbm0	= offsets_demand_spr_hbm0,
+	.offsets_demand_hbm1	= offsets_demand_spr_hbm1,
 };

 static const struct x86_cpu_id i10nm_cpuids[] = {

From patchwork Fri Jul 22 23:33:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 12926975
From: Tony Luck
To: linux-edac@vger.kernel.org
Cc: Qiuxu Zhuo, Tony Luck, Aristeu Rozanski, Borislav Petkov,
 Mauro Carvalho Chehab, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
Subject: [PATCH 3/4] EDAC/i10nm: Print an extra register set of retry_rd_err_log
Date: Fri, 22 Jul 2022 16:33:37 -0700
Message-Id: <20220722233338.341567-4-tony.luck@intel.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220722233338.341567-1-tony.luck@intel.com>
References: <20220722233338.341567-1-tony.luck@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Qiuxu Zhuo

Sapphire Rapids server adds an extra register set for logging more
retry_rd_err_log data. So add code to print the extra register set.

Signed-off-by: Qiuxu Zhuo
Signed-off-by: Tony Luck
---
 drivers/edac/skx_common.h |  2 +
 drivers/edac/i10nm_base.c | 81 +++++++++++++++++++++++++++++++++------
 2 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/skx_common.h b/drivers/edac/skx_common.h
index dbf8e458ad2b..e7226e935718 100644
--- a/drivers/edac/skx_common.h
+++ b/drivers/edac/skx_common.h
@@ -82,6 +82,7 @@ struct skx_dev {
 			struct pci_dev *edev;
 			u32 retry_rd_err_log_s;
 			u32 retry_rd_err_log_d;
+			u32 retry_rd_err_log_d2;
 			struct skx_dimm {
 				u8 close_pg;
 				u8 bank_xor_enable;
@@ -161,6 +162,7 @@ struct res_config {
 	u32 *offsets_scrub_hbm0;
 	u32 *offsets_scrub_hbm1;
 	u32 *offsets_demand;
+	u32 *offsets_demand2;
 	u32 *offsets_demand_hbm0;
 	u32 *offsets_demand_hbm1;
 };
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index 976d8e8a4d1b..610ab8cce873 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -81,26 +81,38 @@ static u32 offsets_scrub_spr_hbm0[] = {0x2860, 0x2854, 0x2b08, 0x2858, 0x2828,
 static u32 offsets_scrub_spr_hbm1[] = {0x2c60, 0x2c54, 0x2f08, 0x2c58, 0x2c28, 0x0fa8};
 static u32 offsets_demand_icx[] = {0x22e54, 0x22e60, 0x22e64, 0x22e58, 0x22e5c, 0x20ee0};
 static u32 offsets_demand_spr[] = {0x22e54, 0x22e60, 0x22f10, 0x22e58, 0x22e5c, 0x20ee0};
+static u32 offsets_demand2_spr[] = {0x22c70, 0x22d80, 0x22f18, 0x22d58, 0x22c64, 0x20f10};
 static u32 offsets_demand_spr_hbm0[] = {0x2a54, 0x2a60, 0x2b10, 0x2a58, 0x2a5c, 0x0ee0};
 static u32 offsets_demand_spr_hbm1[] = {0x2e54, 0x2e60, 0x2f10, 0x2e58, 0x2e5c, 0x0fb0};

 static void __enable_retry_rd_err_log(struct skx_imc *imc, int chan, bool enable,
-				      u32 *offsets_scrub, u32 *offsets_demand)
+				      u32 *offsets_scrub, u32 *offsets_demand,
+				      u32 *offsets_demand2)
 {
-	u32 s, d;
+	u32 s, d, d2;

 	s = I10NM_GET_REG32(imc, chan, offsets_scrub[0]);
 	d = I10NM_GET_REG32(imc, chan, offsets_demand[0]);
+	if (offsets_demand2)
+		d2 = I10NM_GET_REG32(imc, chan, offsets_demand2[0]);

 	if (enable) {
 		/* Save default configurations */
 		imc->chan[chan].retry_rd_err_log_s = s;
 		imc->chan[chan].retry_rd_err_log_d = d;
+		if (offsets_demand2)
+			imc->chan[chan].retry_rd_err_log_d2 = d2;

 		s &= ~RETRY_RD_ERR_LOG_NOOVER_UC;
 		s |= RETRY_RD_ERR_LOG_EN;
 		d &= ~RETRY_RD_ERR_LOG_NOOVER_UC;
 		d |= RETRY_RD_ERR_LOG_EN;
+
+		if (offsets_demand2) {
+			d2 &= ~RETRY_RD_ERR_LOG_UC;
+			d2 |= RETRY_RD_ERR_LOG_NOOVER;
+			d2 |= RETRY_RD_ERR_LOG_EN;
+		}
 	} else {
 		/* Restore default configurations */
 		if (imc->chan[chan].retry_rd_err_log_s & RETRY_RD_ERR_LOG_UC)
@@ -115,10 +127,21 @@ static void __enable_retry_rd_err_log(struct skx_imc *imc, int chan, bool enable
 			d |= RETRY_RD_ERR_LOG_NOOVER;
 		if (!(imc->chan[chan].retry_rd_err_log_d & RETRY_RD_ERR_LOG_EN))
 			d &= ~RETRY_RD_ERR_LOG_EN;
+
+		if (offsets_demand2) {
+			if (imc->chan[chan].retry_rd_err_log_d2 & RETRY_RD_ERR_LOG_UC)
+				d2 |= RETRY_RD_ERR_LOG_UC;
+			if (!(imc->chan[chan].retry_rd_err_log_d2 & RETRY_RD_ERR_LOG_NOOVER))
+				d2 &= ~RETRY_RD_ERR_LOG_NOOVER;
+			if (!(imc->chan[chan].retry_rd_err_log_d2 & RETRY_RD_ERR_LOG_EN))
+				d2 &= ~RETRY_RD_ERR_LOG_EN;
+		}
 	}

 	I10NM_SET_REG32(imc, chan, offsets_scrub[0], s);
 	I10NM_SET_REG32(imc, chan, offsets_demand[0], d);
+	if (offsets_demand2)
+		I10NM_SET_REG32(imc, chan, offsets_demand2[0], d2);
 }

 static void enable_retry_rd_err_log(bool enable)
@@ -139,14 +162,17 @@ static void enable_retry_rd_err_log(bool enable)
 				if (imc->hbm_mc) {
 					__enable_retry_rd_err_log(imc, j, enable,
 								  res_cfg->offsets_scrub_hbm0,
-								  res_cfg->offsets_demand_hbm0);
+								  res_cfg->offsets_demand_hbm0,
+								  NULL);
 					__enable_retry_rd_err_log(imc, j, enable,
 								  res_cfg->offsets_scrub_hbm1,
-								  res_cfg->offsets_demand_hbm1);
+								  res_cfg->offsets_demand_hbm1,
+								  NULL);
 				} else {
 					__enable_retry_rd_err_log(imc, j, enable,
 								  res_cfg->offsets_scrub,
-								  res_cfg->offsets_demand);
+								  res_cfg->offsets_demand,
+								  res_cfg->offsets_demand2);
 				}
 			}
 		}
@@ -158,7 +184,10 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 	struct skx_imc *imc = &res->dev->imc[res->imc];
 	u32 log0, log1, log2, log3, log4;
 	u32 corr0, corr1, corr2, corr3;
+	u32 lxg0, lxg1, lxg3, lxg4;
+	u32 *xffsets = NULL;
 	u64 log2a, log5;
+	u64 lxg2a, lxg5;
 	u32 *offsets;
 	int n, pch;

@@ -175,8 +204,12 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 			offsets = scrub_err ? res_cfg->offsets_scrub_hbm0 :
 					      res_cfg->offsets_demand_hbm0;
 	} else {
-		offsets = scrub_err ? res_cfg->offsets_scrub :
-				      res_cfg->offsets_demand;
+		if (scrub_err) {
+			offsets = res_cfg->offsets_scrub;
+		} else {
+			offsets = res_cfg->offsets_demand;
+			xffsets = res_cfg->offsets_demand2;
+		}
 	}

 	log0 = I10NM_GET_REG32(imc, res->channel, offsets[0]);
@@ -185,10 +218,28 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 	log4 = I10NM_GET_REG32(imc, res->channel, offsets[4]);
 	log5 = I10NM_GET_REG64(imc, res->channel, offsets[5]);

+	if (xffsets) {
+		lxg0 = I10NM_GET_REG32(imc, res->channel, xffsets[0]);
+		lxg1 = I10NM_GET_REG32(imc, res->channel, xffsets[1]);
+		lxg3 = I10NM_GET_REG32(imc, res->channel, xffsets[3]);
+		lxg4 = I10NM_GET_REG32(imc, res->channel, xffsets[4]);
+		lxg5 = I10NM_GET_REG64(imc, res->channel, xffsets[5]);
+	}
+
 	if (res_cfg->type == SPR) {
 		log2a = I10NM_GET_REG64(imc, res->channel, offsets[2]);
-		n = snprintf(msg, len, " retry_rd_err_log[%.8x %.8x %.16llx %.8x %.8x %.16llx]",
+		n = snprintf(msg, len, " retry_rd_err_log[%.8x %.8x %.16llx %.8x %.8x %.16llx",
 			     log0, log1, log2a, log3, log4, log5);
+
+		if (len - n > 0) {
+			if (xffsets) {
+				lxg2a = I10NM_GET_REG64(imc, res->channel, xffsets[2]);
+				n += snprintf(msg + n, len - n, " %.8x %.8x %.16llx %.8x %.8x %.16llx]",
+					     lxg0, lxg1, lxg2a, lxg3, lxg4, lxg5);
+			} else {
+				n += snprintf(msg + n, len - n, "]");
+			}
+		}
 	} else {
 		log2 = I10NM_GET_REG32(imc, res->channel, offsets[2]);
 		n = snprintf(msg, len, " retry_rd_err_log[%.8x %.8x %.8x %.8x %.8x %.16llx]",
@@ -223,9 +274,16 @@ static void show_retry_rd_err_log(struct decoded_addr *res, char *msg,
 			 corr3 & 0xffff, corr3 >> 16);

 	/* Clear status bits */
-	if (retry_rd_err_log == 2 && (log0 & RETRY_RD_ERR_LOG_OVER_UC_V)) {
-		log0 &= ~RETRY_RD_ERR_LOG_OVER_UC_V;
-		I10NM_SET_REG32(imc, res->channel, offsets[0], log0);
+	if (retry_rd_err_log == 2) {
+		if (log0 & RETRY_RD_ERR_LOG_OVER_UC_V) {
+			log0 &= ~RETRY_RD_ERR_LOG_OVER_UC_V;
+			I10NM_SET_REG32(imc, res->channel, offsets[0], log0);
+		}
+
+		if (xffsets && (lxg0 & RETRY_RD_ERR_LOG_OVER_UC_V)) {
+			lxg0 &= ~RETRY_RD_ERR_LOG_OVER_UC_V;
+			I10NM_SET_REG32(imc, res->channel, xffsets[0], lxg0);
+		}
 	}
 }

@@ -469,6 +527,7 @@ static struct res_config spr_cfg = {
 	.offsets_scrub_hbm0	= offsets_scrub_spr_hbm0,
 	.offsets_scrub_hbm1	= offsets_scrub_spr_hbm1,
 	.offsets_demand		= offsets_demand_spr,
+	.offsets_demand2	= offsets_demand2_spr,
 	.offsets_demand_hbm0	= offsets_demand_spr_hbm0,
 	.offsets_demand_hbm1	= offsets_demand_spr_hbm1,
 };

From patchwork Fri Jul 22 23:33:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 12926978
From: Tony Luck
To: linux-edac@vger.kernel.org
Cc: Youquan Song, Tony Luck, Aristeu Rozanski, Borislav Petkov,
 Mauro Carvalho Chehab, Qiuxu Zhuo, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
Subject: [PATCH 4/4] x86/sb_edac: Add row column translation for Broadwell
Date: Fri, 22 Jul 2022 16:33:38 -0700
Message-Id: <20220722233338.341567-5-tony.luck@intel.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220722233338.341567-1-tony.luck@intel.com>
References: <20220722233338.341567-1-tony.luck@intel.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Youquan Song

The sb_edac driver lacks translation for DIMM internal address. Add
memory address translation for row/column/bank/bank_group on Broadwell.

Signed-off-by: Youquan Song
Signed-off-by: Tony Luck
---
 drivers/edac/sb_edac.c | 148 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 138 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9678ab97c7ac..8e39370fdb5c 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -335,6 +335,12 @@ struct sbridge_info {
 struct sbridge_channel {
 	u32 ranks;
 	u32 dimms;
+	struct dimm {
+		u32 rowbits;
+		u32 colbits;
+		u32 bank_xor_enable;
+		u32 amap_fine;
+	} dimm[MAX_DIMMS];
 };

 struct pci_id_descr {
@@ -1603,7 +1609,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
 		banks = 8;

 	for (i = 0; i < channels; i++) {
-		u32 mtr;
+		u32 mtr, amap = 0;

 		int max_dimms_per_channel;

@@ -1615,6 +1621,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
 			max_dimms_per_channel = ARRAY_SIZE(mtr_regs);
 			if (!pvt->pci_tad[i])
 				continue;
+			pci_read_config_dword(pvt->pci_tad[i], 0x8c, &amap);
 		}

 		for (j = 0; j < max_dimms_per_channel; j++) {
@@ -1627,6 +1634,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
 						      mtr_regs[j], &mtr);
 			}
 			edac_dbg(4, "Channel #%d MTR%d = %x\n", i, j, mtr);
+
 			if (IS_DIMM_PRESENT(mtr)) {
 				if (!IS_ECC_ENABLED(pvt->info.mcmtr)) {
 					sbridge_printk(KERN_ERR, "CPU SrcID #%d, Ha #%d, Channel #%d has DIMMs, but ECC is disabled\n",
@@ -1661,6 +1669,11 @@ static int __populate_dimms(struct mem_ctl_info *mci,
 				dimm->dtype = pvt->info.get_width(pvt, mtr);
 				dimm->mtype = mtype;
 				dimm->edac_mode = mode;
+				pvt->channel[i].dimm[j].rowbits = order_base_2(rows);
+				pvt->channel[i].dimm[j].colbits = order_base_2(cols);
+				pvt->channel[i].dimm[j].bank_xor_enable =
+						GET_BITFIELD(pvt->info.mcmtr, 9, 9);
+				pvt->channel[i].dimm[j].amap_fine = GET_BITFIELD(amap, 0, 0);
 				snprintf(dimm->label, sizeof(dimm->label),
 					 "CPU_SrcID#%u_Ha#%u_Chan#%u_DIMM#%u",
 					 pvt->sbridge_dev->source_id, pvt->sbridge_dev->dom, i, j);
@@ -1922,6 +1935,99 @@ static struct mem_ctl_info *get_mci_for_node_id(u8 node_id, u8 ha)
 	return NULL;
 }

+static u8 sb_close_row[] = {
+	15, 16, 17, 18, 20, 21, 22, 28, 10, 11, 12, 13, 29, 30, 31, 32, 33
+};
+
+static u8 sb_close_column[] = {
+	3, 4, 5, 14, 19, 23, 24, 25, 26, 27
+};
+
+static u8 sb_open_row[] = {
+	14, 15, 16, 20, 28, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33
+};
+
+static u8 sb_open_column[] = {
+	3, 4, 5, 6, 7, 8, 9, 10, 11, 12
+};
+
+static u8 sb_open_fine_column[] = {
+	3, 4, 5, 7, 8, 9, 10, 11, 12, 13
+};
+
+static int sb_bits(u64 addr, int nbits, u8 *bits)
+{
+	int i, res = 0;
+
+	for (i = 0; i < nbits; i++)
+		res |= ((addr >> bits[i]) & 1) << i;
+	return res;
+}
+
+static int sb_bank_bits(u64 addr, int b0, int b1, int do_xor, int x0, int x1)
+{
+	int ret = GET_BITFIELD(addr, b0, b0) | (GET_BITFIELD(addr, b1, b1) << 1);
+
+	if (do_xor)
+		ret ^= GET_BITFIELD(addr, x0, x0) | (GET_BITFIELD(addr, x1, x1) << 1);
+
+	return ret;
+}
+
+static bool sb_decode_ddr4(struct mem_ctl_info *mci, int ch, u8 rank,
+			   u64 rank_addr, char *msg)
+{
+	int dimmno = 0;
+	int row, col, bank_address, bank_group;
+	struct sbridge_pvt *pvt;
+	u32 bg0 = 0, rowbits = 0, colbits = 0;
+	u32 amap_fine = 0, bank_xor_enable = 0;
+
+	dimmno = (rank < 12) ? rank / 4 : 2;
+	pvt = mci->pvt_info;
+	amap_fine = pvt->channel[ch].dimm[dimmno].amap_fine;
+	bg0 = amap_fine ? 6 : 13;
+	rowbits = pvt->channel[ch].dimm[dimmno].rowbits;
+	colbits = pvt->channel[ch].dimm[dimmno].colbits;
+	bank_xor_enable = pvt->channel[ch].dimm[dimmno].bank_xor_enable;
+
+	if (pvt->is_lockstep) {
+		pr_warn_once("LockStep row/column decode is not supported yet!\n");
+		msg[0] = '\0';
+		return false;
+	}
+
+	if (pvt->is_close_pg) {
+		row = sb_bits(rank_addr, rowbits, sb_close_row);
+		col = sb_bits(rank_addr, colbits, sb_close_column);
+		col |= 0x400; /* C10 is autoprecharge, always set */
+		bank_address = sb_bank_bits(rank_addr, 8, 9, bank_xor_enable, 22, 28);
+		bank_group = sb_bank_bits(rank_addr, 6, 7, bank_xor_enable, 20, 21);
+	} else {
+		row = sb_bits(rank_addr, rowbits, sb_open_row);
+		if (amap_fine)
+			col = sb_bits(rank_addr, colbits, sb_open_fine_column);
+		else
+			col = sb_bits(rank_addr, colbits, sb_open_column);
+		bank_address = sb_bank_bits(rank_addr, 18, 19, bank_xor_enable, 22, 23);
+		bank_group = sb_bank_bits(rank_addr, bg0, 17, bank_xor_enable, 20, 21);
+	}
+
+	row &= (1u << rowbits) - 1;
+
+	sprintf(msg, "row:0x%x col:0x%x bank_addr:%d bank_group:%d",
+		row, col, bank_address, bank_group);
+	return true;
+}
+
+static bool sb_decode_ddr3(struct mem_ctl_info *mci, int ch, u8 rank,
+			   u64 rank_addr, char *msg)
+{
+	pr_warn_once("DDR3 row/column decode not support yet!\n");
+	msg[0] = '\0';
+	return false;
+}
+
 static int get_memory_error_data(struct mem_ctl_info *mci,
 				 u64 addr,
 				 u8 *socket, u8 *ha,
@@ -1937,12 +2043,13 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	int interleave_mode, shiftup = 0;
 	unsigned int sad_interleave[MAX_INTERLEAVE];
 	u32 reg, dram_rule;
-	u8 ch_way, sck_way, pkg, sad_ha = 0;
+	u8 ch_way, sck_way, pkg, sad_ha = 0, rankid = 0;
 	u32 tad_offset;
 	u32 rir_way;
 	u32 mb, gb;
 	u64 ch_addr, offset, limit = 0, prv = 0;
-
+	u64 rank_addr;
+	enum mem_type mtype;

 	/*
 	 * Step 0) Check if the address is at special memory ranges
@@ -2226,6 +2333,28 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
 	pci_read_config_dword(pvt->pci_tad[base_ch], rir_offset[n_rir][idx], &reg);
 	*rank = RIR_RNK_TGT(pvt->info.type, reg);

+	if (pvt->info.type == BROADWELL) {
+		if (pvt->is_close_pg)
+			shiftup = 6;
+		else
+			shiftup = 13;
+
+		rank_addr = ch_addr >> shiftup;
+		rank_addr /= (1 << rir_way);
+		rank_addr <<= shiftup;
+		rank_addr |= ch_addr & GENMASK_ULL(shiftup - 1, 0);
+		rank_addr -= RIR_OFFSET(pvt->info.type, reg);
+
+		mtype = pvt->info.get_memory_type(pvt);
+		rankid = *rank;
+		if (mtype == MEM_DDR4 || mtype == MEM_RDDR4)
+			sb_decode_ddr4(mci, base_ch, rankid, rank_addr, msg);
+		else
+			sb_decode_ddr3(mci, base_ch, rankid, rank_addr, msg);
+	} else {
+		msg[0] = '\0';
+	}
+
 	edac_dbg(0, "RIR#%d: channel address 0x%08Lx < 0x%08Lx, RIR interleave %d, index %d\n",
 		 n_rir,
 		 ch_addr,
@@ -2950,7 +3079,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	struct mem_ctl_info *new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
 	enum hw_event_mc_err_type tp_event;
-	char *optype, msg[256];
+	char *optype, msg[256], msg_full[512];
 	bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
 	bool overflow = GET_BITFIELD(m->status, 62, 62);
 	bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
@@ -3089,18 +3218,17 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	 */
 	if (!pvt->is_lockstep && !pvt->is_cur_addr_mirrored && !pvt->is_close_pg)
 		channel = first_channel;
-
-	snprintf(msg, sizeof(msg),
-		 "%s%s area:%s err_code:%04x:%04x socket:%d ha:%d channel_mask:%ld rank:%d",
+	snprintf(msg_full, sizeof(msg_full),
+		 "%s%s area:%s err_code:%04x:%04x socket:%d ha:%d channel_mask:%ld rank:%d %s",
 		 overflow ? " OVERFLOW" : "",
 		 (uncorrected_error && recoverable) ? " recoverable" : "",
 		 area_type,
 		 mscod, errcode,
 		 socket, ha,
 		 channel_mask,
-		 rank);
+		 rank, msg);

-	edac_dbg(0, "%s\n", msg);
+	edac_dbg(0, "%s\n", msg_full);

 	/* FIXME: need support for channel mask */

@@ -3111,7 +3239,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	edac_mc_handle_error(tp_event, mci, core_err_cnt,
 			     m->addr >> PAGE_SHIFT, m->addr & ~PAGE_MASK, 0,
 			     channel, dimm, -1,
-			     optype, msg);
+			     optype, msg_full);
 	return;
 err_parsing:
 	edac_mc_handle_error(tp_event, mci, core_err_cnt, 0, 0, 0,