From patchwork Thu Jan 2 22:06:09 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11316143
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Benjamin Berg,
 Borislav Petkov, Hans de Goede, Christian Kellner, "H. Peter Anvin",
 Ingo Molnar, linux-edac, Peter Zijlstra, Srinivas Pandruvada,
 Thomas Gleixner, Tony Luck, x86-ml, Sasha Levin
Subject: [PATCH 4.9 038/171] x86/mce: Lower throttling MCE messages priority to warning
Date: Thu, 2 Jan 2020 23:06:09 +0100
Message-Id: <20200102220552.307572770@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200102220546.960200039@linuxfoundation.org>
References: <20200102220546.960200039@linuxfoundation.org>
User-Agent: quilt/0.66
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Benjamin Berg

[ Upstream commit 9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7 ]

On modern CPUs it is quite normal that the temperature limits are
reached and the CPU is throttled. In fact, often the thermal design is
not sufficient to cool the CPU at full load and limits can quickly be
reached when a burst in load happens. This will even happen with
technologies like RAPL limiting the long term power consumption of
the package.

Also, these limits are "softer", as Srinivas explains:

"CPU temperature doesn't have to hit max(TjMax) to get these warnings.
OEMs ha[ve] an ability to program a threshold where a thermal interrupt
can be generated. In some systems the offset is 20C+ (Read only value).

In recent systems, there is another offset on top of it which can be
programmed by OS, once some agent can adjust power limits dynamically.
By default this is set to low by the firmware, which I guess the
prime motivation of Benjamin to submit the patch."

So these messages do not usually indicate a hardware issue (e.g.
insufficient cooling). Log them as warnings to avoid confusion about
their severity.

 [ bp: Massage commit message. ]

Signed-off-by: Benjamin Berg
Signed-off-by: Borislav Petkov
Reviewed-by: Hans de Goede
Tested-by: Christian Kellner
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-edac
Cc: Peter Zijlstra
Cc: Srinivas Pandruvada
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: x86-ml
Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
Signed-off-by: Sasha Levin
---
 arch/x86/kernel/cpu/mcheck/therm_throt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index c460c91d0c8f..be2439592b0e 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -190,7 +190,7 @@ static int therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);

From patchwork Thu Jan 2 22:06:36 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11316139
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, James Morse,
 Robert Richter, Borislav Petkov, Mauro Carvalho Chehab,
 "linux-edac@vger.kernel.org", Tony Luck, Sasha Levin
Subject: [PATCH 4.9 065/171] EDAC/ghes: Fix grain calculation
Date: Thu, 2 Jan 2020 23:06:36 +0100
Message-Id: <20200102220555.906743841@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200102220546.960200039@linuxfoundation.org>
References: <20200102220546.960200039@linuxfoundation.org>
User-Agent: quilt/0.66
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Robert Richter

[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

	e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

This is broken in several ways:

1) It calculates wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size; on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

	e->grain = ~mem_err->physical_addr_mask + 1;

The grain_bits are defined as:

	grain = 1 << grain_bits;

Change also the grain_bits calculation accordingly; it is the same
formula as in edac_mc.c now and the code can be unified.

The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.

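For illustration only, the arithmetic above can be reproduced in
userspace. The following is a minimal sketch, not kernel code: the
ex_* names are hypothetical, EX_PAGE_MASK assumes 4 KiB pages, and
ex_fls64() is a stand-in written here to mimic the kernel's
fls_long() on a 64-bit value.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for the kernel's PAGE_MASK. */
#define EX_PAGE_MASK (~(uint64_t)0xfff)

/*
 * Hypothetical userspace stand-in for fls_long(): returns the 1-based
 * position of the highest set bit, or 0 if no bit is set.
 */
unsigned int ex_fls64(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Old formula from the commit message: filters with ~PAGE_MASK, then inverts. */
uint64_t ex_grain_old(uint64_t mask)
{
	return ~(mask & ~EX_PAGE_MASK);
}

/* New formula from the commit message: two's complement of the mask. */
uint64_t ex_grain_new(uint64_t mask)
{
	return ~mask + 1;
}

/*
 * grain_bits such that (1 << grain_bits) >= grain. A zero grain is
 * treated as 1, mirroring the sanity check the patch adds.
 */
unsigned int ex_grain_bits(uint64_t grain)
{
	if (!grain)
		grain = 1;
	return ex_fls64(grain - 1);
}
```

With a mask of ~0xfff, ex_grain_new() gives 0x1000 and ex_grain_bits()
gives 12, while ex_grain_old() gives ~0. A non-contiguous mask such as
~0xbff gives a grain of 0xc00, which ex_grain_bits() rounds up to 12
bits, i.e. 0x1000. Applying ex_fls64() to 0x1000 directly gives 13,
which is why the patch subtracts 1 before calling fls_long().
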
Suggested-by: James Morse
Signed-off-by: Robert Richter
Signed-off-by: Borislav Petkov
Reviewed-by: Mauro Carvalho Chehab
Cc: "linux-edac@vger.kernel.org"
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin
---
 drivers/edac/ghes_edac.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index e3fa4390f846..4ddbf6604e2a 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -189,6 +189,7 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	/* Cleans the error report buffer */
 	memset(e, 0, sizeof (*e));
 	e->error_count = 1;
+	e->grain = 1;
 	strcpy(e->label, "unknown label");
 	e->msg = pvt->msg;
 	e->other_detail = pvt->other_detail;
@@ -284,7 +285,7 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 
 	/* Error grain */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
-		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
+		e->grain = ~mem_err->physical_addr_mask + 1;
 
 	/* Memory error location, mapped on e->location */
 	p = e->location;
@@ -391,8 +392,13 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
+	/* Sanity-check driver-supplied grain value. */
+	if (WARN_ON_ONCE(!e->grain))
+		e->grain = 1;
+
+	grain_bits = fls_long(e->grain - 1);
+
 	/* Generate the trace event */
-	grain_bits = fls_long(e->grain);
 	snprintf(pvt->detail_location, sizeof(pvt->detail_location),
 		 "APEI location: %s %s", e->location, e->other_detail);
 	trace_mc_event(type, e->msg, e->label, e->error_count,

From patchwork Thu Jan 2 22:08:03 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Greg KH
X-Patchwork-Id: 11316135
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, "Jan H. Schönherr",
 Borislav Petkov, Tony Luck, "H. Peter Anvin", Ingo Molnar, linux-edac,
 Thomas Gleixner, x86-ml, Yazen Ghannam, Sasha Levin
Subject: [PATCH 4.9 152/171] x86/mce: Fix possibly incorrect severity calculation on AMD
Date: Thu, 2 Jan 2020 23:08:03 +0100
Message-Id: <20200102220607.986637001@linuxfoundation.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200102220546.960200039@linuxfoundation.org>
References: <20200102220546.960200039@linuxfoundation.org>
User-Agent: quilt/0.66
Sender: linux-edac-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Jan H. Schönherr

[ Upstream commit a3a57ddad061acc90bef39635caf2b2330ce8f21 ]

The function mce_severity_amd_smca() requires m->bank to be initialized
for correct operation. Fix the one case, where mce_severity() is called
without doing so.

Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
Signed-off-by: Jan H. Schönherr
Signed-off-by: Borislav Petkov
Reviewed-by: Tony Luck
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: linux-edac
Cc:
Cc: Thomas Gleixner
Cc: x86-ml
Cc: Yazen Ghannam
Link: https://lkml.kernel.org/r/20191210000733.17979-4-jschoenh@amazon.de
Signed-off-by: Sasha Levin
---
 arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d3b2c5b25c9c..07188a012492 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -782,8 +782,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 		if (quirk_no_way_out)
 			quirk_no_way_out(i, m, regs);
 
+		m->bank = i;
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			m->bank = i;
 			mce_read_aux(m, i);
 			*msg = tmp;
 			return 1;