From patchwork Sun Dec 29 17:18:34 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11312341
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC92B14B7
	for ; Sun, 29 Dec 2019 18:16:59 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 8AF51207FF
	for ; Sun, 29 Dec 2019 18:16:59 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577643419;
	bh=ABkLrZKoElGRtWFppwkvdptsazTThG4UmudwIkKFSsc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=qLthYWG97SfruGHWCR0VPZkI2nRGtmoXXHTyDbHW5FlBSitptB2KFic8N46NQgaxS
	 Ze0PQDEtbCnqXwB9kcsSZ5eJiDRsX7WbQjtliZARiU3k/963reGOPM9z2VaxwlpCyP
	 f2hEGqjfmXoEOMg7cNj0/qrJjrN5+mjaT8LMoiqg=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1727242AbfL2RXk (ORCPT );
	Sun, 29 Dec 2019 12:23:40 -0500
Received: from mail.kernel.org ([198.145.29.99]:41404 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1727566AbfL2RXh (ORCPT );
	Sun, 29 Dec 2019 12:23:37 -0500
Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 43F1220722;
	Sun, 29 Dec 2019 17:23:36 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577640216;
	bh=ABkLrZKoElGRtWFppwkvdptsazTThG4UmudwIkKFSsc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=ZcZj2dwce1c2CdnjHf0pO4UxwdM8CurNiN2EInWS0zDGuGf7m9B7r6iXbkdr1QaH4
	 HBGpyhAPtpfdZQjKWGfx26mxbhSdvvJzcm9G7kDpTwbD63VsDMRvXpDhQGmwcn/2zB
	 hWbZJSwNDr2ym6ywZssvzXhfJn/z2Vs5pSBzyPKs=
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc:
	Greg Kroah-Hartman,
	stable@vger.kernel.org,
	Benjamin Berg,
	Borislav Petkov,
	Hans de Goede,
	Christian Kellner,
	"H. Peter Anvin",
	Ingo Molnar,
	linux-edac,
	Peter Zijlstra,
	Srinivas Pandruvada,
	Thomas Gleixner,
	Tony Luck,
	x86-ml,
	Sasha Levin
Subject: [PATCH 4.14 066/161] x86/mce: Lower throttling MCE messages priority to warning
Date: Sun, 29 Dec 2019 18:18:34 +0100
Message-Id: <20191229162419.667121768@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191229162355.500086350@linuxfoundation.org>
References: <20191229162355.500086350@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Benjamin Berg

[ Upstream commit 9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7 ]

On modern CPUs it is quite normal that the temperature limits are
reached and the CPU is throttled. In fact, often the thermal design is
not sufficient to cool the CPU at full load and limits can quickly be
reached when a burst in load happens. This will even happen with
technologies like RAPL limiting the long term power consumption of
the package.

Also, these limits are "softer", as Srinivas explains:

"CPU temperature doesn't have to hit max(TjMax) to get these warnings.
OEMs ha[ve] an ability to program a threshold where a thermal interrupt
can be generated. In some systems the offset is 20C+ (Read only value).

In recent systems, there is another offset on top of it which can be
programmed by OS, once some agent can adjust power limits dynamically.
By default this is set to low by the firmware, which I guess the
prime motivation of Benjamin to submit the patch."

So these messages do not usually indicate a hardware issue (e.g.
insufficient cooling). Log them as warnings to avoid confusion about
their severity.

 [ bp: Massage commit message. ]

Signed-off-by: Benjamin Berg
Signed-off-by: Borislav Petkov
Reviewed-by: Hans de Goede
Tested-by: Christian Kellner
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-edac
Cc: Peter Zijlstra
Cc: Srinivas Pandruvada
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
Signed-off-by: Sasha Levin
---
 arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index ee229ceee745..ec6a07b04fdb 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -185,7 +185,7 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);

From patchwork Sun Dec 29 17:19:19 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11312339
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CAA17139A
	for ; Sun, 29 Dec 2019 18:15:30 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id A8830207FD
	for ; Sun, 29 Dec 2019 18:15:30 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577643330;
	bh=UtjqdOVa/KZHKY7sXd8vUgrnNeXDFcZoscPGOtTCDYc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=cgz4Y5DWyBQG0KINuBxMVwJKg4vdd9Ocxb9WGo8abaEBaQAdSinj3sap3zlLBxygn
	 BtJ+Brc8wA58spMjXVzKcaBoUhMt3lTJzsnJYRx+CWWqH1nRoBDgv+xmyKdSPtLDcH
	 tscN/btRWoqYfYkrMOdzd7+Pu4F/+oxmd5pmatZs=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726726AbfL2RZ3 (ORCPT );
	Sun, 29 Dec 2019 12:25:29 -0500
Received: from mail.kernel.org ([198.145.29.99]:45332 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1727474AbfL2RZ2 (ORCPT );
	Sun, 29 Dec 2019 12:25:28 -0500
Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 26FF620409;
	Sun, 29 Dec 2019 17:25:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577640327;
	bh=UtjqdOVa/KZHKY7sXd8vUgrnNeXDFcZoscPGOtTCDYc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=lCPqqlz+t+xZ4SeaVsHDqbotOI6V36x2PrwKRUo4BXFhTOS9yxO5DfytQeRni1mEL
	 XLzPWz36bloEHf4ZJdNwXAoa6aOlECq3+pi/V3nIrgwVgqlph73pNGId/4Gn840PiS
	 CpksaEjBM1lTSAHJyACyrDRhCVseytqtmR5LPe8Q=
From: Greg Kroah-Hartman
To:
	linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman,
	stable@vger.kernel.org,
	James Morse,
	Robert Richter,
	Borislav Petkov,
	Mauro Carvalho Chehab,
	"linux-edac@vger.kernel.org",
	Tony Luck,
	Sasha Levin
Subject: [PATCH 4.14 111/161] EDAC/ghes: Fix grain calculation
Date: Sun, 29 Dec 2019 18:19:19 +0100
Message-Id: <20191229162430.774697006@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191229162355.500086350@linuxfoundation.org>
References: <20191229162355.500086350@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Robert Richter

[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

	e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

This is broken in several ways:

1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

	e->grain = ~mem_err->physical_addr_mask + 1;

The grain_bits are defined as:

	grain = 1 << grain_bits;

Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.

The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.

Suggested-by: James Morse
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin
---
 drivers/edac/ghes_edac.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6f80eb65c26c..acae39278669 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -187,6 +187,7 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	/* Cleans the error report buffer */
 	memset(e, 0, sizeof (*e));
 	e->error_count = 1;
+	e->grain = 1;
 	strcpy(e->label, "unknown label");
 	e->msg = pvt->msg;
 	e->other_detail = pvt->other_detail;
@@ -282,7 +283,7 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 
 	/* Error grain */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
-		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
+		e->grain = ~mem_err->physical_addr_mask + 1;
 
 	/* Memory error location, mapped on e->location */
 	p = e->location;
@@ -389,8 +390,13 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
+	/* Sanity-check driver-supplied grain value. */
+	if (WARN_ON_ONCE(!e->grain))
+		e->grain = 1;
+
+	grain_bits = fls_long(e->grain - 1);
+
 	/* Generate the trace event */
-	grain_bits = fls_long(e->grain);
 	snprintf(pvt->detail_location, sizeof(pvt->detail_location),
 		 "APEI location: %s %s", e->location, e->other_detail);
 	trace_mc_event(type, e->msg, e->label, e->error_count,

From patchwork Sun Dec 29 17:20:03 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11312337
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20721139A
	for ; Sun, 29 Dec 2019 18:14:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id E9744207E0
	for ; Sun, 29 Dec 2019 18:14:14 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577643255;
	bh=a5E1LuBaxO6rkL8BYot1Thj811ghBqQuy8yf6944W+k=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=k+0yj7D8Z23J+kaOakFH7G3WG9RN0N9EEu4o/DT3hQzJ8qKvsCTbxijT6a+UClTHM
	 3ACjUvIck1fIMbFGLEbJ2qZArL7VLUGPT9DF+f1UygJ7l8kBE98b6PQhlV7JpWMvoN
	 Oh3bWFrOz70GD7cQUxz+fDsiqPX9RAq+9XS+S7uE=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728381AbfL2R1O (ORCPT );
	Sun, 29 Dec 2019 12:27:14 -0500
Received: from mail.kernel.org ([198.145.29.99]:49222 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1727692AbfL2R1O (ORCPT );
	Sun, 29 Dec 2019 12:27:14 -0500
Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id A9F8B207FF;
	Sun, 29 Dec 2019 17:27:12 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577640433;
	bh=a5E1LuBaxO6rkL8BYot1Thj811ghBqQuy8yf6944W+k=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=oqPZXh/khDZb6LyCXnwixQ8rIJtFBYGEJyRQPgo5cxI067IitOVXA95l92ic5xdXQ
	 HjrsTCCA2RolZjl23ENsvwPGVOhElXerquiuAZcCZxzEeo1YlUzGgTGg9TBc9LS/BE
	 HH4t+ZYociCYUh6HlZdcrSkZp0R9n2WTmGcYSD6M=
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman,
	stable@vger.kernel.org,
	Konstantin Khlebnikov,
	Borislav Petkov,
	Yazen Ghannam,
	"H. Peter Anvin",
	Ingo Molnar,
	linux-edac,
	Thomas Gleixner,
	Tony Luck,
	x86-ml
Subject: [PATCH 4.14 155/161] x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
Date: Sun, 29 Dec 2019 18:20:03 +0100
Message-Id: <20191229162448.980647435@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191229162355.500086350@linuxfoundation.org>
References: <20191229162355.500086350@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Konstantin Khlebnikov

commit 246ff09f89e54fdf740a8d496176c86743db3ec7 upstream.

... because interrupts are disabled that early and sending IPIs can
deadlock:

  BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
  no locks held by swapper/1/0.
  irq event stamp: 0
  hardirqs last enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [] copy_process+0x8b9/0x1ca0
  softirqs last enabled at (0): [] copy_process+0x8b9/0x1ca0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  Preemption disabled at:
  [] start_secondary+0x3b/0x190
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  Call Trace:
   dump_stack
   ___might_sleep.cold.92
   wait_for_completion
   ? generic_exec_single
   rdmsr_safe_on_cpu
   ? wrmsr_on_cpus
   mce_amd_feature_init
   mcheck_cpu_init
   identify_cpu
   identify_secondary_cpu
   smp_store_cpu_info
   start_secondary
   secondary_startup_64

The function smca_configure() is called only on the current CPU anyway,
therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid
the IPI.

 [ bp: Update commit message. ]

Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Borislav Petkov
Reviewed-by: Yazen Ghannam
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-edac
Cc:
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: x86-ml
Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -231,7 +231,7 @@ static void smca_configure(unsigned int
 	if (smca_banks[bank].hwid)
 		return;
 
-	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
+	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
 		pr_warn("Failed to read MCA_IPID for bank %d\n", bank);
 		return;
 	}

From patchwork Sun Dec 29 17:20:04 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11312269
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3DFCF109A
	for ; Sun, 29 Dec 2019 17:27:19 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 1C6BD222C4
	for ; Sun, 29 Dec 2019 17:27:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577640439;
	bh=BactduNqTs4sPIgTbO6UqdHepaukg3SW1+IJ6I1vWKM=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=hWYFXGsZeaGslPwebommBXmF3SffQISMD/BO6Kr4orwGYlQg3SPmXVeXiZgNnL9ez
	 lLQoofyVbBvcMm2d/or9PXdCnJfBqKN+NvcYAARXer22szo4DJBRfb2dciJQQqV8Oz
	 5oLlR/7fLT2Y/BUx4NOcy1y3NDinVHN+Hg2tQMwo=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728089AbfL2R1S (ORCPT );
	Sun, 29 Dec 2019 12:27:18 -0500
Received: from mail.kernel.org ([198.145.29.99]:49322 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1728057AbfL2R1Q (ORCPT );
	Sun, 29 Dec 2019 12:27:16 -0500
Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 4F8C4208E4;
	Sun, 29 Dec 2019 17:27:15 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1577640435;
	bh=BactduNqTs4sPIgTbO6UqdHepaukg3SW1+IJ6I1vWKM=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=PLhZrcffw0/ymqi+TO5d5qSd1mWvy4jYV2j4xnITX4BDg9fzMphc+1l10zR8Gy80Q
	 lQnXg9ovHpGmWHQLGYc0gvhF0zmEKdCIE2QVU346cjR5OFSk2OhZPJ2pEudxXuhq6h
	 ulXvspxzitJSBaYV4Am0UrVZfaqyL9lrunXt5GBM=
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman,
	stable@vger.kernel.org,
	Yazen Ghannam,
	Borislav Petkov,
	"H. Peter Anvin",
	Ingo Molnar,
	linux-edac,
	Thomas Gleixner,
	Tony Luck,
	x86-ml
Subject: [PATCH 4.14 156/161] x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
Date: Sun, 29 Dec 2019 18:20:04 +0100
Message-Id: <20191229162449.201703020@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191229162355.500086350@linuxfoundation.org>
References: <20191229162355.500086350@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Yazen Ghannam

commit 966af20929ac24360ba3fac5533eb2ab003747da upstream.

Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system.
These banks are not shared between CPUs. The bank types and ordering
will be the same across CPUs on currently available systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.

In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          UMC
	 2    |         CS          |          CS

System with hardware disabled:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          RAZ
	 2    |         CS          |          CS

For this reason, there is a single, global struct smca_banks[] that is
initialized at boot time. This array is initialized on each CPU as it
comes online. However, the array will not be updated if an entry already
exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         RAZ         |          UMC
	 2    |         CS          |          CS

Extend the smca_banks[] entry check to return if the entry is a
non-reserved type.
Otherwise, continue so that CPUs that encounter a known bank type can
update smca_banks[].

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-edac
Cc:
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -228,7 +228,7 @@ static void smca_configure(unsigned int
 	}
 
 	/* Return early if this bank was already initialized. */
-	if (smca_banks[bank].hwid)
+	if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0)
 		return;
 
 	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {