From patchwork Sun Dec 29 17:18:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 11312275 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D834A139A for ; Sun, 29 Dec 2019 17:31:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B7834207FF for ; Sun, 29 Dec 2019 17:31:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640664; bh=ABkLrZKoElGRtWFppwkvdptsazTThG4UmudwIkKFSsc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=ECpH2sxsigTnS9JhcjnODH3r+gb236pf8OfEUHozxfIJLuFhYjTmj9OEsosNlt9K4 IneBq5zH68oW/1bhIQ2QF5KkLcexuXx+I8vPhnf5W09K2IohBIrbmJQWbtOPP5f1dI TOnc6QDSEwVkrtRlb5DTxWS0AbgpZFGzezYUhwjc= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728809AbfL2RbE (ORCPT ); Sun, 29 Dec 2019 12:31:04 -0500 Received: from mail.kernel.org ([198.145.29.99]:57794 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728633AbfL2RbD (ORCPT ); Sun, 29 Dec 2019 12:31:03 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 3C9F0207FD; Sun, 29 Dec 2019 17:31:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640662; bh=ABkLrZKoElGRtWFppwkvdptsazTThG4UmudwIkKFSsc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=0Ulz6EKdIn59ABN6hSXkecVjY2glMeg3mMMvU8rHnvmZIKwv6LWyY5vC3XIgJvlCK 307qFc3a0K3TGtAMib68ILiLaFJ2y4buqCKLJ7hdaCMglPM/rvIG2Ey2Y4foyPJ9px RSX9twpThS9PpG7koZ0djY08cMAdKFbQKNuASe/M= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Benjamin Berg , Borislav Petkov , Hans de Goede , Christian Kellner , "H. Peter Anvin" , Ingo Molnar , linux-edac , Peter Zijlstra , Srinivas Pandruvada , Thomas Gleixner , Tony Luck , x86-ml , Sasha Levin Subject: [PATCH 4.19 085/219] x86/mce: Lower throttling MCE messages priority to warning Date: Sun, 29 Dec 2019 18:18:07 +0100 Message-Id: <20191229162518.333245399@linuxfoundation.org> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191229162508.458551679@linuxfoundation.org> References: <20191229162508.458551679@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Benjamin Berg [ Upstream commit 9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7 ] On modern CPUs it is quite normal that the temperature limits are reached and the CPU is throttled. In fact, often the thermal design is not sufficient to cool the CPU at full load and limits can quickly be reached when a burst in load happens. This will even happen with technologies like RAPL limitting the long term power consumption of the package. Also, these limits are "softer", as Srinivas explains: "CPU temperature doesn't have to hit max(TjMax) to get these warnings. OEMs ha[ve] an ability to program a threshold where a thermal interrupt can be generated. In some systems the offset is 20C+ (Read only value). In recent systems, there is another offset on top of it which can be programmed by OS, once some agent can adjust power limits dynamically. By default this is set to low by the firmware, which I guess the prime motivation of Benjamin to submit the patch." So these messages do not usually indicate a hardware issue (e.g. insufficient cooling). Log them as warnings to avoid confusion about their severity. [ bp: Massage commit mesage. ] Signed-off-by: Benjamin Berg Signed-off-by: Borislav Petkov Reviewed-by: Hans de Goede Tested-by: Christian Kellner Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-edac Cc: Peter Zijlstra Cc: Srinivas Pandruvada Cc: Thomas Gleixner Cc: Tony Luck Cc: x86-ml Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com Signed-off-by: Sasha Levin --- arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index ee229ceee745..ec6a07b04fdb 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -185,7 +185,7 @@ static void therm_throt_process(bool new_event, int event, int level) /* if we just entered the thermal event */ if (new_event) { if (event == THERMAL_THROTTLING_EVENT) - pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n", + pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n", this_cpu, level == CORE_LEVEL ? "Core" : "Package", state->count); From patchwork Sun Dec 29 17:19:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 11312277 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67C5E138C for ; Sun, 29 Dec 2019 17:33:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 45B7B207FF for ; Sun, 29 Dec 2019 17:33:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640815; bh=tuRipZte5EqskJXk8kD1/fPZoHKEC/8f/pLxDLZ6hIE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=O66zAy2ztSEspBHfmlOLvQdyemedbIfRkxwh8K70SMzjr2XM9cg4TiQUeZl4Rk4Zb cnk5441aFIocruNXeiXnLHOcYbaBZOSQ5l0ws4wjkCRXwXGnXCIVYRtHXJZn6nYtaO +F7mQM6tbLYOg8zPgIgMIckk9jf7moAQpmMHAIpE= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729439AbfL2Rde (ORCPT ); Sun, 29 Dec 2019 12:33:34 -0500 Received: from mail.kernel.org ([198.145.29.99]:35474 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729459AbfL2Rde (ORCPT ); Sun, 29 Dec 2019 12:33:34 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 9E555207FD; Sun, 29 Dec 2019 17:33:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640813; bh=tuRipZte5EqskJXk8kD1/fPZoHKEC/8f/pLxDLZ6hIE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bURsVCl9+Rui7rCzX+5Ewe6WTDVExQC9FWnVI0+Jzxc1vgTFzi1c/1IZW8Y3mkOM3 crfZe2Ba8xqnTKTikc7KN3BIa3fZnPPUDC60ElI77XKjAS6vOi+R8UWa+cKMaIhTzy 7J+imMxIMEc9mRc5MQ0a02KhHeEHlE3IIThwZlJg= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, James Morse , Robert Richter , Borislav Petkov , Mauro Carvalho Chehab , "linux-edac@vger.kernel.org" , Tony Luck , Sasha Levin Subject: [PATCH 4.19 149/219] EDAC/ghes: Fix grain calculation Date: Sun, 29 Dec 2019 18:19:11 +0100 Message-Id: <20191229162530.890383917@linuxfoundation.org> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191229162508.458551679@linuxfoundation.org> References: <20191229162508.458551679@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Robert Richter [ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ] The current code to convert a physical address mask to a grain (defined as granularity in bytes) is: e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); This is broken in several ways: 1) It calculates to wrong grain values. E.g., a physical address mask of ~0xfff should give a grain of 0x1000. Without considering PAGE_MASK, there is an off-by-one. Things are worse when also filtering it with ~PAGE_MASK. This will calculate to a grain with the upper bits set. In the example it even calculates to ~0. 2) The grain does not depend on and is unrelated to the kernel's page-size. The page-size only matters when unmapping memory in memory_failure(). Smaller grains are wrongly rounded up to the page-size, on architectures with a configurable page-size (e.g. arm64) this could round up to the even bigger page-size of the hypervisor. Fix this with: e->grain = ~mem_err->physical_addr_mask + 1; The grain_bits are defined as: grain = 1 << grain_bits; Change also the grain_bits calculation accordingly, it is the same formula as in edac_mc.c now and the code can be unified. The value in ->physical_addr_mask coming from firmware is assumed to be contiguous, but this is not sanity-checked. However, in case the mask is non-contiguous, a conversion to grain_bits effectively converts the grain bit mask to a power of 2 by rounding it up. Suggested-by: James Morse Signed-off-by: Robert Richter Signed-off-by: Borislav Petkov Reviewed-by: Mauro Carvalho Chehab Cc: "linux-edac@vger.kernel.org" Cc: Tony Luck Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com Signed-off-by: Sasha Levin --- drivers/edac/ghes_edac.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 574bce603337..78c339da19b5 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -210,6 +210,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) /* Cleans the error report buffer */ memset(e, 0, sizeof (*e)); e->error_count = 1; + e->grain = 1; strcpy(e->label, "unknown label"); e->msg = pvt->msg; e->other_detail = pvt->other_detail; @@ -305,7 +306,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) /* Error grain */ if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) - e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); + e->grain = ~mem_err->physical_addr_mask + 1; /* Memory error location, mapped on e->location */ p = e->location; @@ -412,8 +413,13 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) if (p > pvt->other_detail) *(p - 1) = '\0'; + /* Sanity-check driver-supplied grain value. */ + if (WARN_ON_ONCE(!e->grain)) + e->grain = 1; + + grain_bits = fls_long(e->grain - 1); + /* Generate the trace event */ - grain_bits = fls_long(e->grain); snprintf(pvt->detail_location, sizeof(pvt->detail_location), "APEI location: %s %s", e->location, e->other_detail); trace_mc_event(type, e->msg, e->label, e->error_count, From patchwork Sun Dec 29 17:20:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 11312281 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DC6F138C for ; Sun, 29 Dec 2019 17:36:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4CC312253D for ; Sun, 29 Dec 2019 17:36:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640963; bh=a5E1LuBaxO6rkL8BYot1Thj811ghBqQuy8yf6944W+k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=zt0JzV6OMC7oYZ0fVuKf9R+fFvDZfyH5tXDnuSW2KTvYiCByLEm1lZ664+TnaV3dR 2FLZRFiKkI+m7jc4WvyPydAX9e1IA+hkHhwq4U0sklYKrdri/BknkcDEoo3qnNAFBU PbXRMrDp5f5EThO8igCsJS5vh7d+h3Xj+0J4/N58= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729978AbfL2RgC (ORCPT ); Sun, 29 Dec 2019 12:36:02 -0500 Received: from mail.kernel.org ([198.145.29.99]:40886 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729971AbfL2RgB (ORCPT ); Sun, 29 Dec 2019 12:36:01 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 45759206CB; Sun, 29 Dec 2019 17:36:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640960; bh=a5E1LuBaxO6rkL8BYot1Thj811ghBqQuy8yf6944W+k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NDr4OUlr2wHVIS+/88apRv637817Uuo04/Byb0gymoDxloYWQdoQoQ4toGSs1F0AS /4KGgYeaa0mVfaZt+N26sUWDXsXKs85vJ7frDnbaMnhw6RPBWlVGmlHdpC5ysWMmOa oPEdNt4sdpV15fUvgsLTtN5opRnDvJK3U7z+hJn8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Konstantin Khlebnikov , Borislav Petkov , Yazen Ghannam , "H. Peter Anvin" , Ingo Molnar , linux-edac , Thomas Gleixner , Tony Luck , x86-ml Subject: [PATCH 4.19 209/219] x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure() Date: Sun, 29 Dec 2019 18:20:11 +0100 Message-Id: <20191229162541.742257114@linuxfoundation.org> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191229162508.458551679@linuxfoundation.org> References: <20191229162508.458551679@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Konstantin Khlebnikov commit 246ff09f89e54fdf740a8d496176c86743db3ec7 upstream. ... because interrupts are disabled that early and sending IPIs can deadlock: BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 no locks held by swapper/1/0. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [] copy_process+0x8b9/0x1ca0 softirqs last enabled at (0): [] copy_process+0x8b9/0x1ca0 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [] start_secondary+0x3b/0x190 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 Call Trace: dump_stack ___might_sleep.cold.92 wait_for_completion ? generic_exec_single rdmsr_safe_on_cpu ? wrmsr_on_cpus mce_amd_feature_init mcheck_cpu_init identify_cpu identify_secondary_cpu smp_store_cpu_info start_secondary secondary_startup_64 The function smca_configure() is called only on the current CPU anyway, therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid the IPI. [ bp: Update commit message. ] Signed-off-by: Konstantin Khlebnikov Signed-off-by: Borislav Petkov Reviewed-by: Yazen Ghannam Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-edac Cc: Cc: Thomas Gleixner Cc: Tony Luck Cc: x86-ml Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -231,7 +231,7 @@ static void smca_configure(unsigned int if (smca_banks[bank].hwid) return; - if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) { + if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) { pr_warn("Failed to read MCA_IPID for bank %d\n", bank); return; } From patchwork Sun Dec 29 17:20:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 11312285 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57B4D138C for ; Sun, 29 Dec 2019 17:37:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 36EBC21D7E for ; Sun, 29 Dec 2019 17:37:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577641030; bh=BactduNqTs4sPIgTbO6UqdHepaukg3SW1+IJ6I1vWKM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=DRmVRDQdl1dEN8jU4jjrlezpkVxonpIv0VLUi2bNL9rfDM3YeqjIrdt20Es644EKz cWS3v0qyPiFf2iEnIMMBM+jkK4YL/Lf/8Jj1xgq+0PQ7iF/tHFQrzlOKeaBxJg4OmW CYkNFePawAY1jS13MWJnE058MXZX1P1J9je8niGc= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729981AbfL2RgJ (ORCPT ); Sun, 29 Dec 2019 12:36:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:41080 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729971AbfL2RgG (ORCPT ); Sun, 29 Dec 2019 12:36:06 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 18F0921744; Sun, 29 Dec 2019 17:36:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1577640965; bh=BactduNqTs4sPIgTbO6UqdHepaukg3SW1+IJ6I1vWKM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Qj3pNeuMqsppso/9lKODlLfIHsqzT6PgP+F4Pknw1disK08QysVYxXbKbkZjt+izG PbiAtZmXg+athzn4oB/r1WHPUJhiQpOa3gdXFXcrap1mFXtey0v9ft0f1Iof/pcBUX rmQ05duZ+n0E8gNPG7Nu3/f1bzopRaqT5HV/4wMs= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Yazen Ghannam , Borislav Petkov , "H. Peter Anvin" , Ingo Molnar , linux-edac , Thomas Gleixner , Tony Luck , x86-ml Subject: [PATCH 4.19 210/219] x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[] Date: Sun, 29 Dec 2019 18:20:12 +0100 Message-Id: <20191229162541.818222862@linuxfoundation.org> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191229162508.458551679@linuxfoundation.org> References: <20191229162508.458551679@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Yazen Ghannam commit 966af20929ac24360ba3fac5533eb2ab003747da upstream. Each logical CPU in Scalable MCA systems controls a unique set of MCA banks in the system. These banks are not shared between CPUs. The bank types and ordering will be the same across CPUs on currently available systems. However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while other CPUs do not. In this case, the bank seen as Reserved on one CPU is assumed to be the same type as the bank seen as a known type on another CPU. In general, this occurs when the hardware represented by the MCA bank is disabled, e.g. disabled memory controllers on certain models, etc. The MCA bank is disabled in the hardware, so there is no possibility of getting an MCA/MCE from it even if it is assumed to have a known type. For example: Full system: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | UMC 2 | CS | CS System with hardware disabled: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | RAZ 2 | CS | CS For this reason, there is a single, global struct smca_banks[] that is initialized at boot time. This array is initialized on each CPU as it comes online. However, the array will not be updated if an entry already exists. This works as expected when the first CPU (usually CPU0) has all possible MCA banks enabled. But if the first CPU has a subset, then it will save a "Reserved" type in smca_banks[]. Successive CPUs will then not be able to update smca_banks[] even if they encounter a known bank type. This may result in unexpected behavior. Depending on the system configuration, a user may observe issues enumerating the MCA thresholding sysfs interface. The issues may be as trivial as sysfs entries not being available, or as severe as system hangs. For example: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | RAZ | UMC 2 | CS | CS Extend the smca_banks[] entry check to return if the entry is a non-reserved type. Otherwise, continue so that CPUs that encounter a known bank type can update smca_banks[]. Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type") Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-edac Cc: Cc: Thomas Gleixner Cc: Tony Luck Cc: x86-ml Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -228,7 +228,7 @@ static void smca_configure(unsigned int } /* Return early if this bank was already initialized. */ - if (smca_banks[bank].hwid) + if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0) return; if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {