From patchwork Tue Nov 19 05:18:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kroah-Hartman X-Patchwork-Id: 11250851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 112EE6C1 for ; Tue, 19 Nov 2019 05:58:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E5420222ED for ; Tue, 19 Nov 2019 05:58:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1574143106; bh=UfxJx9uniduwgP9FegUeu7x2UFPsH3p8h2KakbrthM8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=x2R0b54sMjE47EB7m5pRGS5NTrw+gdk0l7HMLTGgEtwEJyAOAuG9MpOZ2Vm/b8uFS ZgXtmiSyV/EiFo5GFNhk2aNzghQvWRpUsbkJaEpUOHUfFcWHpkdwbapCba6OUA9WS/ 96IWntrPq834vMnfUnRPJbc5+CQisKvc3ODmY9Ws= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729815AbfKSF6Z (ORCPT ); Tue, 19 Nov 2019 00:58:25 -0500 Received: from mail.kernel.org ([198.145.29.99]:46246 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730917AbfKSFtb (ORCPT ); Tue, 19 Nov 2019 00:49:31 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A31C020721; Tue, 19 Nov 2019 05:49:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1574142570; bh=UfxJx9uniduwgP9FegUeu7x2UFPsH3p8h2KakbrthM8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fWvOkmQzjKpa4lQ9EWRrgw3PnH4m4YUzNL6tC0IiaIGyIMQVQgJj7347nZTvjt5c4 G7/aBC9jQV6KRWfcYQBsmotslRQ7J6jeVJZiorZ0xSZx1h4rQJpq3WFkC0LaRpEqxV Wk4GJz8aONnjFBw3oHBtA3RKzwwevVOMbg/vX0qg= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Qiuxu Zhuo , Aristeu Rozanski , Mauro Carvalho Chehab , linux-edac , Tony Luck , Borislav Petkov , Sasha Levin Subject: [PATCH 4.14 086/239] EDAC, sb_edac: Return early on ADDRV bit and address type test Date: Tue, 19 Nov 2019 06:18:06 +0100 Message-Id: <20191119051318.011363831@linuxfoundation.org> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191119051255.850204959@linuxfoundation.org> References: <20191119051255.850204959@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Qiuxu Zhuo [ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ] Users of the mce_register_decode_chain() are called for every logged error. EDAC drivers should check: 1) Is this a memory error? [bit 7 in status register] 2) Is there a valid address? [bit 58 in status register] 3) Is the address a system address? [bitfield 8:6 in misc register] The sb_edac driver performed test "1" twice. Waited far too long to perform check "2". Didn't do check "3" at all. Fix it by moving the test for valid address from sbridge_mce_output_error() into sbridge_mce_check_error() and add a test for the type immediately after. Delete the redundant check for the type of the error from sbridge_mce_output_error(). Signed-off-by: Qiuxu Zhuo Cc: Aristeu Rozanski Cc: Mauro Carvalho Chehab Cc: Qiuxu Zhuo Cc: linux-edac Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com [ Re-word commit message. ] Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov Signed-off-by: Sasha Levin --- drivers/edac/sb_edac.c | 68 ++++++++++++++++++++++-------------------- 1 file changed, 35 insertions(+), 33 deletions(-) diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c index b0b390a1da154..ddd5990211f8a 100644 --- a/drivers/edac/sb_edac.c +++ b/drivers/edac/sb_edac.c @@ -2915,35 +2915,27 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci, * cccc = channel * If the mask doesn't match, report an error to the parsing logic */ - if (! ((errcode & 0xef80) == 0x80)) { - optype = "Can't parse: it is not a mem"; - } else { - switch (optypenum) { - case 0: - optype = "generic undef request error"; - break; - case 1: - optype = "memory read error"; - break; - case 2: - optype = "memory write error"; - break; - case 3: - optype = "addr/cmd error"; - break; - case 4: - optype = "memory scrubbing error"; - break; - default: - optype = "reserved"; - break; - } + switch (optypenum) { + case 0: + optype = "generic undef request error"; + break; + case 1: + optype = "memory read error"; + break; + case 2: + optype = "memory write error"; + break; + case 3: + optype = "addr/cmd error"; + break; + case 4: + optype = "memory scrubbing error"; + break; + default: + optype = "reserved"; + break; } - /* Only decode errors with an valid address (ADDRV) */ - if (!GET_BITFIELD(m->status, 58, 58)) - return; - if (pvt->info.type == KNIGHTS_LANDING) { if (channel == 14) { edac_dbg(0, "%s%s err_code:%04x:%04x EDRAM bank %d\n", @@ -3049,17 +3041,11 @@ static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val, { struct mce *mce = (struct mce *)data; struct mem_ctl_info *mci; - struct sbridge_pvt *pvt; char *type; if (edac_get_report_status() == EDAC_REPORTING_DISABLED) return NOTIFY_DONE; - mci = get_mci_for_node_id(mce->socketid, IMC0); - if (!mci) - return NOTIFY_DONE; - pvt = mci->pvt_info; - /* * Just let mcelog handle it if the error is * outside the memory controller. A memory error @@ -3069,6 +3055,22 @@ static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val, if ((mce->status & 0xefff) >> 7 != 1) return NOTIFY_DONE; + /* Check ADDRV bit in STATUS */ + if (!GET_BITFIELD(mce->status, 58, 58)) + return NOTIFY_DONE; + + /* Check MISCV bit in STATUS */ + if (!GET_BITFIELD(mce->status, 59, 59)) + return NOTIFY_DONE; + + /* Check address type in MISC (physical address only) */ + if (GET_BITFIELD(mce->misc, 6, 8) != 2) + return NOTIFY_DONE; + + mci = get_mci_for_node_id(mce->socketid, IMC0); + if (!mci) + return NOTIFY_DONE; + if (mce->mcgstatus & MCG_STATUS_MCIP) type = "Exception"; else From patchwork Tue Nov 19 05:20:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kroah-Hartman X-Patchwork-Id: 11250849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 78EDE14E5 for ; Tue, 19 Nov 2019 05:56:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 58CA421850 for ; Tue, 19 Nov 2019 05:56:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1574142981; bh=UupwU4breSByp96+YTz2uZ3ED3NUHaAkfOoof3kWAeI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=Oys11g8dVlMjD3lImkrXAj4bBMfs8V+uCc4GdEtCAh9WyCtgwSjoTeIggm6NfPnXV v08oryteVh0egvz0mvc39bRVVBVY3/1jb+1Ln5mT1M9mu3hb7yekEvYNfp4Y83NO2b iSbGHal7bGKdtoCGXHvVTqRPHjvPlbyh4J761igE= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731387AbfKSFxn (ORCPT ); Tue, 19 Nov 2019 00:53:43 -0500 Received: from mail.kernel.org ([198.145.29.99]:51644 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727888AbfKSFxl (ORCPT ); Tue, 19 Nov 2019 00:53:41 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 451EF21783; Tue, 19 Nov 2019 05:53:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1574142820; bh=UupwU4breSByp96+YTz2uZ3ED3NUHaAkfOoof3kWAeI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tPR8l6KIhQSdak79Qr/RjEVkNj1gw229oW0EJAFYQbuEc3zLfNBpC+8h7qHP7Xzxg 6gAEXypwdJCHH6n4xVv7jFwANYFWvthfATY/FaxHFpmSN2Ipn9eda8IXLMccp5BYqL NdzDkvTYVovAJJs1dxDiSwIHz0IWv7KQ6pD/Hdf8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Justin Ernst , Borislav Petkov , Russ Anderson , Mauro Carvalho Chehab , linux-edac@vger.kernel.org, Sasha Levin Subject: [PATCH 4.14 209/239] EDAC: Raise the maximum number of memory controllers Date: Tue, 19 Nov 2019 06:20:09 +0100 Message-Id: <20191119051337.711593264@linuxfoundation.org> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191119051255.850204959@linuxfoundation.org> References: <20191119051255.850204959@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Justin Ernst [ Upstream commit 6b58859419554fb824e09cfdd73151a195473cbc ] We observe an oops in the skx_edac module during boot: EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0 EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 ... EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0 EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1 Too many memory controllers: 16 EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0 We observe there are two memory controllers per socket, with a limit of 16. Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024). [ bp: This is just a band-aid fix until we've sorted out the whole issue with the bus_type association and handling in EDAC and can get rid of this arbitrary limit. ] Signed-off-by: Justin Ernst Signed-off-by: Borislav Petkov Acked-by: Russ Anderson Cc: Mauro Carvalho Chehab Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180925143449.284634-1-justin.ernst@hpe.com Signed-off-by: Sasha Levin --- include/linux/edac.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/edac.h b/include/linux/edac.h index cd75c173fd00b..90f72336aea66 100644 --- a/include/linux/edac.h +++ b/include/linux/edac.h @@ -17,6 +17,7 @@ #include #include #include +#include #define EDAC_DEVICE_NAME_LEN 31 @@ -667,6 +668,6 @@ struct mem_ctl_info { /* * Maximum number of memory controllers in the coherent fabric. */ -#define EDAC_MAX_MCS 16 +#define EDAC_MAX_MCS 2 * MAX_NUMNODES #endif