From patchwork Thu Dec 12 14:27:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Karolina Stolarek X-Patchwork-Id: 13905252 X-Patchwork-Delegate: bhelgaas@google.com Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E198D2147FE for ; Thu, 12 Dec 2024 14:27:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013665; cv=none; b=fqTuLvl1JJXLAq3l9yEjWNC2NNufUWhUM7iMv1iqXzNBP5Exz9H2QM/iNQ5FdEn8cP6e1V5R04CEcIUeCwcrJM4D8+mqpCoRtXOfqxxSFDgopn4YtCZdM8JZn/774yxWbR/CWSV2Ea7/G/cjvv5XMG7geWmFVsosgobEF6BfYMU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013665; c=relaxed/simple; bh=cE+CkEfdYfYmu0Pe/5GPYizeJq6asGOES1Wjx2/ScKI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hUQQFmcll05y6asSuEW6XTVOz4aernmIDtYVUouoHaaJLDsUJWKcM9KvryEXGvXK0w6pUBKDxhgvxKsMWjbjWZeIXW8TE6z1hF61WMCe3R3UOFj9wEnP2VXuSNrNtkNa9L+H/u/G1GYev04zNyfn58oJ/5pdWsAE1hyUxu4Xr9s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=aLtem5CJ; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="aLtem5CJ" Received: from pps.filterd (m0333520.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com 
(8.18.1.2/8.18.1.2) with ESMTP id 4BCCZwiJ026612; Thu, 12 Dec 2024 14:27:41 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=E1N2H JvPbEq4aJi2zzN6ZikHJ+Zv4mP4gsJPpOeMWK4=; b=aLtem5CJ2ownv70ad786U 5XePXjlU4+EFtOh/m+vRUtUKEbuOQe9D1a2g/WhwSJIwuib/YJxhTCEz3inCqR6k 9mHmWF3N9PonjW6P0tPnh1L3xFaGdIzodq6NZmdHaA1pwK5KnIVfOqjlLsj4+ACc CmY3FmVp8bWu/QrRyC3y2SUn/Y/CjvPEC2eibW+WzOTjP/1zT8OMYJRrjl8mqLgI /oHYhonwRZqD1FJUWDEoXh/Ho1hwiqr8gl1fTi2sZnuy5K5+uKLjUpe8EVDKeSh9 8mYvbnR9bOMfibxdQtPbcr49Tfrs+KNcjrizYURTNL+cXBILVVUNueZoRPKseUgV w== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 43cedcb8se-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 Dec 2024 14:27:41 +0000 (GMT) Received: from pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCDD4bb038147; Thu, 12 Dec 2024 14:27:41 GMT Received: from kstolare-e5-ol8.osdevelopmeniad.oraclevcn.com (kstolare-e5-ol8.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.254.20]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 43cctht4wt-2; Thu, 12 Dec 2024 14:27:40 +0000 From: Karolina Stolarek To: linux-pci@vger.kernel.org Cc: Bjorn Helgaas Subject: [RFC 1/4] PCI/AER: Use the same log level for all messages Date: Thu, 12 Dec 2024 14:27:29 +0000 Message-ID: <87d03ea88576594a4326f376018e91f12c57abea.1734005191.git.karolina.stolarek@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard 
engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2024-12-12_09,2024-12-12_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 adultscore=0 phishscore=0 bulkscore=0 malwarescore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2412120104 X-Proofpoint-ORIG-GUID: 231BamdU1yOsfpj2BszwubM22Gkfo15h X-Proofpoint-GUID: 231BamdU1yOsfpj2BszwubM22Gkfo15h When reporting an AER error, we check its type multiple times to determine the log level for each message. Do this check only in the top-level function and propagate the result down the call chain. Make the aer_print_port_info() output match the level of the reported error. Signed-off-by: Karolina Stolarek --- drivers/pci/pci.h | 2 +- drivers/pci/pcie/aer.c | 64 ++++++++++++++++++++++-------------------- drivers/pci/pcie/dpc.c | 2 +- 3 files changed, 36 insertions(+), 32 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 2e40fc63ba31..139ea4f01448 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -546,7 +546,7 @@ struct aer_err_info { }; int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info); -void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); +void aer_print_error(struct pci_dev *dev, struct aer_err_info *info, const char *level); #endif /* CONFIG_PCIEAER */ #ifdef CONFIG_PCIEPORTBUS diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 80c5ba8d8296..b13690fd172f 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -672,20 +672,18 @@ static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t) } static void __aer_print_error(struct pci_dev *dev, - struct aer_err_info *info) + struct aer_err_info *info, + const char *level) { const char **strings; unsigned long status = info->status & ~info->mask; - const char *level, *errmsg; + const char *errmsg; int i; - if 
(info->severity == AER_CORRECTABLE) { + if (info->severity == AER_CORRECTABLE) strings = aer_correctable_error_string; - level = KERN_WARNING; - } else { + else strings = aer_uncorrectable_error_string; - level = KERN_ERR; - } for_each_set_bit(i, &status, 32) { errmsg = strings[i]; @@ -698,11 +696,11 @@ static void __aer_print_error(struct pci_dev *dev, pci_dev_aer_stats_incr(dev, info); } -void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) +void aer_print_error(struct pci_dev *dev, struct aer_err_info *info, + const char *level) { int layer, agent; int id = pci_dev_id(dev); - const char *level; if (!info->status) { pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", @@ -713,8 +711,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) layer = AER_GET_LAYER_ERROR(info->severity, info->status); agent = AER_GET_AGENT(info->severity, info->status); - level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR; - pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", aer_error_severity_string[info->severity], aer_error_layer[layer], aer_agent_string[agent]); @@ -722,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", dev->vendor, dev->device, info->status, info->mask); - __aer_print_error(dev, info); + __aer_print_error(dev, info, level); if (info->tlp_header_valid) __print_tlp_header(dev, &info->tlp); @@ -735,16 +731,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) info->severity, info->tlp_header_valid, &info->tlp); } -static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) +static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info, + const char *level) { u8 bus = info->id >> 8; u8 devfn = info->id & 0xff; - pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", - 
info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), - PCI_FUNC(devfn)); + pci_printk(level, dev, "%s%s error message received from %04x:%02x:%02x.%d\n", + info->multi_error_valid ? "Multiple " : "", + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), + PCI_FUNC(devfn)); } #ifdef CONFIG_ACPI_APEI_PCIEAER @@ -767,15 +764,18 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, { int layer, agent, tlp_header_valid = 0; u32 status, mask; + const char *level; struct aer_err_info info; if (aer_severity == AER_CORRECTABLE) { status = aer->cor_status; mask = aer->cor_mask; + level = KERN_WARNING; } else { status = aer->uncor_status; mask = aer->uncor_mask; tlp_header_valid = status & AER_LOG_TLP_MASKS; + level = KERN_ERR; } layer = AER_GET_LAYER_ERROR(aer_severity, status); @@ -787,14 +787,14 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, info.mask = mask; info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); - pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); - __aer_print_error(dev, &info); - pci_err(dev, "aer_layer=%s, aer_agent=%s\n", - aer_error_layer[layer], aer_agent_string[agent]); + pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); + __aer_print_error(dev, &info, level); + pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n", + aer_error_layer[layer], aer_agent_string[agent]); if (aer_severity != AER_CORRECTABLE) - pci_err(dev, "aer_uncor_severity: 0x%08x\n", - aer->uncor_severity); + pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n", + aer->uncor_severity); if (tlp_header_valid) __print_tlp_header(dev, &aer->header_log); @@ -1255,14 +1255,15 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) return 1; } -static inline void aer_process_err_devices(struct aer_err_info *e_info) +static inline void aer_process_err_devices(struct 
aer_err_info *e_info, + const char *level) { int i; /* Report all before handle them, not to lost records by reset etc. */ for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { if (aer_get_device_error_info(e_info->dev[i], e_info)) - aer_print_error(e_info->dev[i], e_info); + aer_print_error(e_info->dev[i], e_info, level); } for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { if (aer_get_device_error_info(e_info->dev[i], e_info)) @@ -1280,6 +1281,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc, { struct pci_dev *pdev = rpc->rpd; struct aer_err_info e_info; + const char *level; pci_rootport_aer_stats_incr(pdev, e_src); @@ -1290,19 +1292,21 @@ static void aer_isr_one_error(struct aer_rpc *rpc, if (e_src->status & PCI_ERR_ROOT_COR_RCV) { e_info.id = ERR_COR_ID(e_src->id); e_info.severity = AER_CORRECTABLE; + level = KERN_WARNING; if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV) e_info.multi_error_valid = 1; else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); + aer_print_port_info(pdev, &e_info, level); if (find_source_device(pdev, &e_info)) - aer_process_err_devices(&e_info); + aer_process_err_devices(&e_info, level); } if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) { e_info.id = ERR_UNCOR_ID(e_src->id); + level = KERN_ERR; if (e_src->status & PCI_ERR_ROOT_FATAL_RCV) e_info.severity = AER_FATAL; @@ -1314,10 +1318,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc, else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); + aer_print_port_info(pdev, &e_info, level); if (find_source_device(pdev, &e_info)) - aer_process_err_devices(&e_info); + aer_process_err_devices(&e_info, level); } } diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 2b6ef7efa3c1..9e48d571d9e7 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -291,7 +291,7 @@ void dpc_process_error(struct pci_dev *pdev) else if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR && dpc_get_aer_uncorrect_severity(pdev, 
&info) && aer_get_device_error_info(pdev, &info)) { - aer_print_error(pdev, &info); + aer_print_error(pdev, &info, KERN_ERR); pci_aer_clear_nonfatal_status(pdev); pci_aer_clear_fatal_status(pdev); } From patchwork Thu Dec 12 14:27:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Karolina Stolarek X-Patchwork-Id: 13905253 X-Patchwork-Delegate: bhelgaas@google.com Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 67696214A86 for ; Thu, 12 Dec 2024 14:27:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013667; cv=none; b=dmPO+PeGbptZfgUpgF45o60LglwxDzA+0K4qk1Q4QGcQK9ug91U7YaIfHzdzKrMnOr0t+TygxBpVxNHx/KjfW3zlOvp6enWOYme9EUfFX2F+ZgzkWU7/JL1t8NssWPVZjpDi3X+jrgrZuNFrCPDdCDtTqjWpB8+YPRFthLps8f8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013667; c=relaxed/simple; bh=ElqTo+vtN/Kww+FtFQ4e47GSLOACoYdCGl7dofgTBlQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Q/w0pqgnr0R5UTcrRFHDnID86WQgzNJzIkzRtqveOgzLKpTDBvQ5QqoLhvsIpLvzTu3+GeMUh2e0UR3LKD4ZHTCZQCoFm7w6xZjSwOK3xH/tSFA7Yb/nAQBNZPwJezqZzsPxrrB/EPFRMTX39MJo6U2uuFvDSZm2kVOzwQ3kfe8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=mSNRTl2R; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="mSNRTl2R" Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCCFwcC012411; Thu, 12 Dec 2024 14:27:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=4ptT3 fwY413GOO95nFZErEdJU5uEZJN+0+MYPylFbp8=; b=mSNRTl2RgyzXEqvPHpQ94 pIMqgAhHRPL5xt7Wrc4wYICBpz+U+n5TO7vEuWJTfmiDD+ATwRcsHvixO275uJBt vLD3SPjZlPniD+TlYdtcAGY2uc9DzTTudJjJtr+Ik7nxJx/c9LyDRqBdDTqg21dx 5pb2KQCKmxwkiQccjsV/Eint7Qh5eFaV8wLcy/0Km4KMSZKzfuO4TDCu6/gmTmuh efq5ylZg8qWKU//wbG8E8Qr1KFbTg5Hg1aQSBR8Iopm7nfMK0cX7DarIEt+HjPfX WQozMncktRKchw5F00v61QvbjC7zIhINFD5qSBCleiFiK91wmBkvNfWRdl4g6o41 w== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 43ccy0bc5k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 Dec 2024 14:27:43 +0000 (GMT) Received: from pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCDD4bd038147; Thu, 12 Dec 2024 14:27:42 GMT Received: from kstolare-e5-ol8.osdevelopmeniad.oraclevcn.com (kstolare-e5-ol8.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.254.20]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 43cctht4wt-3; Thu, 12 Dec 2024 14:27:41 +0000 From: Karolina Stolarek To: linux-pci@vger.kernel.org Cc: Bjorn Helgaas Subject: [RFC 2/4] PCI/AER: Add Correctable Errors rate limiting Date: Thu, 12 Dec 2024 14:27:30 +0000 Message-ID: X-Mailer: git-send-email 2.43.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2024-12-12_09,2024-12-12_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 adultscore=0 phishscore=0 bulkscore=0 malwarescore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2412120104 X-Proofpoint-GUID: mv4bQIA2Z30hX8kpLB0ui-fRl0XN6_kg X-Proofpoint-ORIG-GUID: mv4bQIA2Z30hX8kpLB0ui-fRl0XN6_kg When Link integrity is compromised, we may see excessive logging of Correctable Errors. Errors of this kind are handled by the hardware, so the messages are purely informational. It should suffice to report the error once in a while and to say how many messages were suppressed in the meantime. Add a ratelimit_state to control the number of printed Correctable Errors per Root Port and check it each time a Correctable Error is to be reported. 
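For illustration only, the intended behaviour can be modelled in userspace. This is a minimal sketch under stated assumptions, not the kernel's ratelimit_state implementation: struct cor_limiter and cor_limiter_allow() are hypothetical names, time is passed in as a plain tick counter instead of jiffies, and the window boundary handling is simplified. With burst set to 1, as in the ratelimit_state_init() call in this patch, one message is let through per interval and the number of dropped messages is handed back when the next window opens.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a "burst per interval" limiter. */
struct cor_limiter {
	uint64_t interval;	/* window length, in caller-defined ticks */
	uint64_t begin;		/* tick at which the current window opened */
	unsigned int burst;	/* messages allowed per window */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages dropped in this window */
	bool started;		/* false until the first call */
};

/*
 * Return true if a message may be printed at tick 'now'.
 * When a new window opens, *suppressed receives the number of
 * messages dropped during the previous window; otherwise it is 0.
 */
static bool cor_limiter_allow(struct cor_limiter *rl, uint64_t now,
			      unsigned int *suppressed)
{
	*suppressed = 0;

	if (!rl->started || now - rl->begin >= rl->interval) {
		*suppressed = rl->missed;
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
		rl->started = true;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

In this model, a caller that sees errors at ticks 0, 500 and 1999 with an interval of 2000 prints only the first one; the call at tick 2000 prints again and reports two suppressed messages.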
Signed-off-by: Karolina Stolarek --- drivers/pci/pcie/aer.c | 44 ++++++++++++++++++++++++++++-------------- include/linux/pci.h | 1 + 2 files changed, 31 insertions(+), 14 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index b13690fd172f..5c34cc2b5bf3 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -40,6 +40,8 @@ #define AER_MAX_TYPEOF_COR_ERRS 16 /* as per PCI_ERR_COR_STATUS */ #define AER_MAX_TYPEOF_UNCOR_ERRS 27 /* as per PCI_ERR_UNCOR_STATUS*/ +#define AER_COR_ERR_INTERVAL (2 * HZ) + struct aer_err_source { u32 status; /* PCI_ERR_ROOT_STATUS */ u32 id; /* PCI_ERR_ROOT_ERR_SRC */ @@ -375,6 +377,9 @@ void pci_aer_init(struct pci_dev *dev) dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL); + /* Allow Root Port to report a Correctable Error message every 2 seconds */ + ratelimit_state_init(&dev->cor_rs, AER_COR_ERR_INTERVAL, 1); + /* * We save/restore PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER, * PCI_ERR_COR_MASK, and PCI_ERR_CAP. 
Root and Root Complex Event @@ -766,11 +771,13 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, u32 status, mask; const char *level; struct aer_err_info info; + bool no_ratelimit = true; if (aer_severity == AER_CORRECTABLE) { status = aer->cor_status; mask = aer->cor_mask; level = KERN_WARNING; + no_ratelimit = __ratelimit(&dev->cor_rs); } else { status = aer->uncor_status; mask = aer->uncor_mask; @@ -787,17 +794,20 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity, info.mask = mask; info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); - pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); - __aer_print_error(dev, &info, level); - pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n", - aer_error_layer[layer], aer_agent_string[agent]); + if (no_ratelimit) { + pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", + status, mask); + __aer_print_error(dev, &info, level); + pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n", + aer_error_layer[layer], aer_agent_string[agent]); - if (aer_severity != AER_CORRECTABLE) - pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n", - aer->uncor_severity); + if (aer_severity != AER_CORRECTABLE) + pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n", + aer->uncor_severity); - if (tlp_header_valid) - __print_tlp_header(dev, &aer->header_log); + if (tlp_header_valid) + __print_tlp_header(dev, &aer->header_log); + } trace_aer_event(dev_name(&dev->dev), (status & ~mask), aer_severity, tlp_header_valid, &aer->header_log); @@ -1256,13 +1266,14 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) } static inline void aer_process_err_devices(struct aer_err_info *e_info, - const char *level) + const char *level, + bool no_ratelimit) { int i; /* Report all before handle them, not to lost records by reset etc. 
*/ for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { - if (aer_get_device_error_info(e_info->dev[i], e_info)) + if (aer_get_device_error_info(e_info->dev[i], e_info) && no_ratelimit) aer_print_error(e_info->dev[i], e_info, level); } for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) { @@ -1282,6 +1293,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc, struct pci_dev *pdev = rpc->rpd; struct aer_err_info e_info; const char *level; + bool no_ratelimit = true; pci_rootport_aer_stats_incr(pdev, e_src); @@ -1298,10 +1310,14 @@ static void aer_isr_one_error(struct aer_rpc *rpc, e_info.multi_error_valid = 1; else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info, level); + + no_ratelimit = __ratelimit(&pdev->cor_rs); + + if (no_ratelimit) + aer_print_port_info(pdev, &e_info, level); if (find_source_device(pdev, &e_info)) - aer_process_err_devices(&e_info, level); + aer_process_err_devices(&e_info, level, no_ratelimit); } if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) { @@ -1321,7 +1337,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc, aer_print_port_info(pdev, &e_info, level); if (find_source_device(pdev, &e_info)) - aer_process_err_devices(&e_info, level); + aer_process_err_devices(&e_info, level, no_ratelimit); } } diff --git a/include/linux/pci.h b/include/linux/pci.h index db9b47ce3eef..3dfa2aac31b4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -347,6 +347,7 @@ struct pci_dev { #ifdef CONFIG_PCIEAER u16 aer_cap; /* AER capability offset */ struct aer_stats *aer_stats; /* AER stats for this device */ + struct ratelimit_state cor_rs; /* Correctable Errors Ratelimit */ #endif #ifdef CONFIG_PCIEPORTBUS struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */ From patchwork Thu Dec 12 14:27:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Karolina Stolarek X-Patchwork-Id: 13905254 X-Patchwork-Delegate: bhelgaas@google.com 
Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9BB32153C3 for ; Thu, 12 Dec 2024 14:27:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013667; cv=none; b=kEO5olgMCeH7KTf4N1qrGGVF9kh64LQ1n5L9dezANq+aYPwmChnHQTdrvPpnbimhEii1732wR6f94b9Bt7JMXGbbMDsajxU0GRILf469D0KvRPB5A9oz0utDN+k0NcpzJZ8PxFPwXdPGd+QbJpRZGBNqAlGMt0VCFHm4AcAl/8Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013667; c=relaxed/simple; bh=ymt+tVJmIsLHhymI8bHnpOC+4RHXX48X2vjqcKWdDR4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Fy18w/N+Xfxb1Lo86vvSEH28BuQTX1s/Zpj46N57ySqKnmNuyvREfBxYDKsI3XFqo3cvbsmVJgoX14+mF6/7CHnqX7H9NGMVEzv10VYBtJem94ix5RfP3wQ6HXJ8HAhW0RyhOtmpNZDcjefadDEU17oRyDXlMIHpx/8CksvM8Cw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=Bv4zNVdi; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Bv4zNVdi" Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCBBdnq025508; Thu, 12 Dec 2024 14:27:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id 
:mime-version:references:subject:to; s=corp-2023-11-20; bh=fhnAt vw2i/OBGU+6vL2zbrx871RCJ1tVK9V72P9UK7E=; b=Bv4zNVdirWclLU7V8c/DI XEqDbXTYLymQgT65J5bl8rhJaez+x4U0O0cCahY6/F6l+vGeTzOAiucCfT2ENCwJ pcmGjlCtNxPBMAuSgaBOjOY3d0HVoQh1MnmQmy5R/E6qoKlR0wQKQhgLQmU+Y1CP mOYQQSqX7umGhTCFdaqUuywPgyJHVmP6gLZ/LoS/5im9S23W7HL9ONGU7vHhct1W UyIEOBjWFa+v7Y+44kmJq8BTTJzgLu6ts/vwmqDqeV5jS3GUX7nc4sXG3k//8n+A Ry+CRv8KvqO0mpnwcMig5IU+a45x/R85MMYmYV3BpzO8UFytvN0KuewaMVpolQYA A== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 43dx5s8b7t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 Dec 2024 14:27:43 +0000 (GMT) Received: from pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCDD4be038147; Thu, 12 Dec 2024 14:27:43 GMT Received: from kstolare-e5-ol8.osdevelopmeniad.oraclevcn.com (kstolare-e5-ol8.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.254.20]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 43cctht4wt-4; Thu, 12 Dec 2024 14:27:43 +0000 From: Karolina Stolarek To: linux-pci@vger.kernel.org Cc: Bjorn Helgaas Subject: [RFC 3/4] PCI/AER: Increase the rate limit interval after threshold Date: Thu, 12 Dec 2024 14:27:31 +0000 Message-ID: <8e44f971e4e2abf89f610688a396485d8999c569.1734005191.git.karolina.stolarek@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2024-12-12_09,2024-12-12_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 adultscore=0 phishscore=0 
bulkscore=0 malwarescore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2412120104 X-Proofpoint-ORIG-GUID: DynWjBimJAIQ5BWTcq-0sQn_9Tufk5LM X-Proofpoint-GUID: DynWjBimJAIQ5BWTcq-0sQn_9Tufk5LM In extreme circumstances, the default rate limit might not be enough and a longer timeout is needed. To avoid spamming the logs, update the interval to 30 seconds for the specific Root Port after it observes over 1000 Correctable Errors. Signed-off-by: Karolina Stolarek --- drivers/pci/pcie/aer.c | 22 +++++++++++++++++++++- include/linux/pci.h | 1 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 5c34cc2b5bf3..98bf8bbadc07 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -40,7 +40,9 @@ #define AER_MAX_TYPEOF_COR_ERRS 16 /* as per PCI_ERR_COR_STATUS */ #define AER_MAX_TYPEOF_UNCOR_ERRS 27 /* as per PCI_ERR_UNCOR_STATUS*/ +#define AER_COR_ERR_THRESHOLD 1000 #define AER_COR_ERR_INTERVAL (2 * HZ) +#define AER_COR_ERR_LONG_INTERVAL (30 * HZ) struct aer_err_source { u32 status; /* PCI_ERR_ROOT_STATUS */ @@ -670,6 +672,24 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev, } } +static bool report_aer_cor_err(struct pci_dev *pdev) +{ + struct ratelimit_state *rs = &pdev->cor_rs; + struct aer_stats *aer_stats = pdev->aer_stats; + unsigned int total_cor_errs = aer_stats->rootport_total_cor_errs; + + /* A significant number of errors reported, increase the rate limit */ + if (total_cor_errs > AER_COR_ERR_THRESHOLD && !pdev->cor_err_throttled) { + pci_warn(pdev, + "Over %d Correctable Errors reported, increasing the rate limit", + AER_COR_ERR_THRESHOLD); + rs->interval = AER_COR_ERR_LONG_INTERVAL; + pdev->cor_err_throttled = 1; + } + + return __ratelimit(&pdev->cor_rs); +} + static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t) { pci_err(dev, " TLP Header: %08x %08x %08x %08x\n", @@ -1311,7 +1331,7 
@@ static void aer_isr_one_error(struct aer_rpc *rpc, else e_info.multi_error_valid = 0; - no_ratelimit = __ratelimit(&pdev->cor_rs); + no_ratelimit = report_aer_cor_err(pdev); if (no_ratelimit) aer_print_port_info(pdev, &e_info, level); diff --git a/include/linux/pci.h b/include/linux/pci.h index 3dfa2aac31b4..b01bfb339e4e 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -348,6 +348,7 @@ struct pci_dev { u16 aer_cap; /* AER capability offset */ struct aer_stats *aer_stats; /* AER stats for this device */ struct ratelimit_state cor_rs; /* Correctable Errors Ratelimit */ + unsigned int cor_err_throttled:1; #endif #ifdef CONFIG_PCIEPORTBUS struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */ From patchwork Thu Dec 12 14:27:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Karolina Stolarek X-Patchwork-Id: 13905255 X-Patchwork-Delegate: bhelgaas@google.com Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B1E82153D2 for ; Thu, 12 Dec 2024 14:27:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.165.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013669; cv=none; b=kkzD2G2W/KeR612kfpIa8/exvgXSLVIynRuaqV12bntudLu4fNu2qioicV61HY+tgLmoAKCNlb/dV+AcxZBtloDlvDwiXxF99EiVAkBRZZ+SwMAX17aobJAj8aRhsUAEC0rseynPXRDstF8tKacXWtQj1QWZL8H/6jiv2kDqnbQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734013669; c=relaxed/simple; bh=GRiIT5pvwD31mvZyR0X6blzW3J8WpkGphNv+YVNMVQw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=dM5VMibyxP2ou3uJpJHq9hFFp5jBsMzP4DaTZNR4r+zUQUAkT01x3U3XXaEcLj0doBf+1GAEhNby5CKBXf/yHQWQ/lZf/gaXqhcEw70P/2R42+wUogCyY28NenDXXIXv7h5gyDmpnpjhv6r9uUYjIsv9D5EbeXrtxdoZ0BqIE/E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=QxtHhaRf; arc=none smtp.client-ip=205.220.165.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="QxtHhaRf" Received: from pps.filterd (m0246629.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4BCCWuJK021240; Thu, 12 Dec 2024 14:27:45 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=HSGZe 4lY7VIrUUYET6MDemMEQV7I2+pvJCFHu2LOc9s=; b=QxtHhaRfwu9uxo4uOrR+N 9SA4W37Z8jcc6r8OTBUStituRvvOJiSaPfbwCuUYoSO+DdzGIgmXVTI7SjABlgSk FoLm3L8FVP3xyl3METLWw854FX9ItGv7h7dyQJ4olljpwMqHj/txLlS2mMY5XEZw i37bCUl6qdp3Gx+CKAsPUdhRDf61h5PK5rv+nZLTrQsuUT1Dv0l/pEKw/BSQDhCd rAl6fRjjKSfWTTzwT2CTtvcCQRZyjKq54QOAjaZaq4xkg24khvXMwXvSj8CShngK OAVN8ZOnOabFWoTe/OOJH9yGpHfA3rv/B+9URMJJeqIlvlrlPhc0CqGGsPtFNdD8 A== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 43ce89b1vt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 Dec 2024 14:27:45 +0000 (GMT) Received: from pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with 
ESMTP id 4BCDD4bf038147; Thu, 12 Dec 2024 14:27:44 GMT Received: from kstolare-e5-ol8.osdevelopmeniad.oraclevcn.com (kstolare-e5-ol8.allregionaliads.osdevelopmeniad.oraclevcn.com [100.100.254.20]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 43cctht4wt-5; Thu, 12 Dec 2024 14:27:43 +0000 From: Karolina Stolarek To: linux-pci@vger.kernel.org Cc: Bjorn Helgaas Subject: [RFC 4/4] PCI: Add 'cor_err_reporting_enable' attribute Date: Thu, 12 Dec 2024 14:27:32 +0000 Message-ID: <79d9894fd866714cbce7438390924f2622448d69.1734005191.git.karolina.stolarek@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2024-12-12_09,2024-12-12_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 adultscore=0 phishscore=0 bulkscore=0 malwarescore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2411120000 definitions=main-2412120104 X-Proofpoint-GUID: RuPbTHaLxXLxJRzD5p27MuF8s-opuDS3 X-Proofpoint-ORIG-GUID: RuPbTHaLxXLxJRzD5p27MuF8s-opuDS3 In some cases, the number of Correctable Error messages is overwhelming, and even with the rate limit imposed, they fill up the logs. The system cannot do much about such errors, so a user might wish to silence them completely. Add a sysfs attribute to control reporting of the Correctable Error Messages per device. 
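A userspace tool could drive the new attribute roughly as follows. This is only a sketch: cor_err_attr_path() and cor_err_write_flag() are hypothetical helper names, and the device address used in the usage note is made up. The path layout follows the ABI entry added by this patch.

```c
#include <stdio.h>

/*
 * Build the sysfs path of the attribute for a device address such as
 * "0000:00:1c.0". Returns 0 on success, -1 if the buffer is too small.
 */
static int cor_err_attr_path(char *out, size_t len, const char *bdf)
{
	int n = snprintf(out, len,
			 "/sys/bus/pci/devices/%s/cor_err_reporting_enable",
			 bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" or "0" to the attribute at 'path'.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int cor_err_write_flag(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, building the path for a device at the made-up address 0000:00:1c.0 and calling cor_err_write_flag(path, 0) would ask that device to stop sending Correctable Error Messages, provided the kernel carries this patch and the caller has write access to the file.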
Signed-off-by: Karolina Stolarek --- Documentation/ABI/testing/sysfs-bus-pci | 7 +++++ drivers/pci/pci-sysfs.c | 42 +++++++++++++++++++++++++ 2 files changed, 49 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 5da6a14dc326..dba72ee37ce4 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -479,6 +479,13 @@ Description: The file is writable if the PF is bound to a driver that implements ->sriov_set_msix_vec_count(). +What: /sys/bus/pci/devices/.../cor_err_reporting_enable +Date: December 2024 +Contact: Linux PCI developers +Description: + This file exposes a bit to control sending of Correctable Error + Messages. The value comes from the Device Control register. + What: /sys/bus/pci/devices/.../resourceN_resize Date: September 2022 Contact: Alex Williamson diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 6f1bb7514efb..f7f0d7971ad7 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -186,6 +186,47 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_RO(resource); +static ssize_t cor_err_reporting_enable_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + u16 reg; + int err; + + err = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &reg); + + if (err) + return pcibios_err_to_errno(err); + + return sysfs_emit(buf, "%u\n", reg & PCI_EXP_DEVCTL_CERE); +} + +static ssize_t cor_err_reporting_enable_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + u16 reg; + u8 val; + int err; + + if (kstrtou8(buf, 0, &val) < 0) + return -EINVAL; + + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &reg); + + reg &= ~PCI_EXP_DEVCTL_CERE; + reg |= val; + err = pcie_capability_write_word(pdev, PCI_EXP_DEVCTL, reg); + + if (err) + return 
pcibios_err_to_errno(err); + + return count; +} +static DEVICE_ATTR_RW(cor_err_reporting_enable); + static ssize_t max_link_speed_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -659,6 +700,7 @@ static struct attribute *pcie_dev_attrs[] = { &dev_attr_current_link_width.attr, &dev_attr_max_link_width.attr, &dev_attr_max_link_speed.attr, + &dev_attr_cor_err_reporting_enable.attr, NULL, };
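The read-modify-write done by the store handler can be tried out in userspace. The sketch below makes one assumption that the patch itself does not: the written value is first collapsed to a single bit, so that an input other than 0 or 1 cannot disturb neighbouring Device Control bits. DEVCTL_CERE and devctl_set_cere() are hypothetical names; 0x0001 mirrors PCI_EXP_DEVCTL_CERE from the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Mirrors PCI_EXP_DEVCTL_CERE (Correctable Error Reporting Enable). */
#define DEVCTL_CERE 0x0001u

/*
 * Return 'reg' with only the CERE bit changed: set if 'val' is
 * non-zero, cleared otherwise. All other bits are preserved.
 */
static uint16_t devctl_set_cere(uint16_t reg, unsigned int val)
{
	reg &= (uint16_t)~DEVCTL_CERE;
	if (val)
		reg |= DEVCTL_CERE;

	return reg;
}
```

With a Device Control value of 0x2810, enabling yields 0x2811 and disabling 0x2811 yields 0x2810 again; passing 2 also yields 0x2811 rather than touching bit 1.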