From patchwork Thu Feb 3 17:49:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734488
From: Naveen Krishna Chatradhi
Subject: [PATCH v7 01/12] EDAC/amd64: Document heterogeneous enumeration
Date: Thu, 3 Feb 2022 11:49:31 -0600
Message-ID: <20220203174942.31630-2-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

Document how the physical topology of an AMD heterogeneous system is
enumerated in software and how the EDAC module populates the sysfs ABIs.
The notes are added to amd64_edac.h and refer to the driver-api
documentation where needed.

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Reviewed-by: Yazen Ghannam
---
v6->v7:
* New in v7

 Documentation/driver-api/edac.rst |   9 +++
 drivers/edac/amd64_edac.h         | 101 ++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/Documentation/driver-api/edac.rst b/Documentation/driver-api/edac.rst
index b8c742aa0a71..0dd07d0d0e47 100644
--- a/Documentation/driver-api/edac.rst
+++ b/Documentation/driver-api/edac.rst
@@ -106,6 +106,15 @@ will occupy those chip-select rows.
 This term is avoided because it is unclear when needing to distinguish
 between chip-select rows and socket sets.

+* High Bandwidth Memory (HBM)
+
+HBM is a new type of memory chip with low power consumption and ultra-wide
+communication lanes. It uses vertically stacked memory chips (DRAM dies)
+interconnected by microscopic wires called "through-silicon vias," or TSVs.
+
+Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
+interconnect called the "interposer", so that HBM's characteristics are
+nearly indistinguishable from on-chip integrated RAM.

 Memory Controllers
 ------------------
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 6f8147abfa71..6a112270a84b 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -559,3 +559,104 @@ static inline u32 dct_sel_baseaddr(struct amd64_pvt *pvt)
 	}
 	return (pvt)->dct_sel_lo & 0xFFFFF800;
 }
+
+/*
+ * AMD Heterogeneous system support on EDAC subsystem
+ * --------------------------------------------------
+ *
+ * An AMD heterogeneous system is built by connecting the data fabrics of both
+ * CPUs and GPUs via custom xGMI links, so the Data Fabric on the GPU nodes can
+ * be accessed the same way as the Data Fabric on CPU nodes.
+ *
+ * An Aldebaran GPU has 2 Data Fabrics; each GPU DF contains four Unified
+ * Memory Controllers (UMC). Each UMC contains eight channels. Each UMC channel
+ * controls one 128-bit HBM2e (2GB) channel (equivalent to 8 X 2GB ranks),
+ * which creates a total of 4096 bits of DRAM data bus.
+ *
+ * While a UMC interfaces with a 16GB (8H X 2GB DRAM) HBM stack, each UMC
+ * channel interfaces with 2GB of DRAM (represented as a rank).
+ *
+ * Memory controllers on AMD GPU nodes can be represented in EDAC as below:
+ *       GPU DF / GPU Node -> EDAC MC
+ *       GPU UMC           -> EDAC CSROW
+ *       GPU UMC channel   -> EDAC CHANNEL
+ *
+ * E.g., a heterogeneous system with 1 AMD CPU connected to 4 Aldebaran GPUs using xGMI:
+ *
+ * AMD GPU nodes are enumerated in sequential order based on the PCI hierarchy, and the
+ * first GPU node is assumed to have a "Node ID" value after CPU nodes are fully
+ * populated.
+ *
+ * $ ls /sys/devices/system/edac/mc/
+ *     mc0   - CPU MC node 0
+ *     mc1  |
+ *     mc2  |- GPU card[0] => node 0(mc1), node 1(mc2)
+ *     mc3  |
+ *     mc4  |- GPU card[1] => node 0(mc3), node 1(mc4)
+ *     mc5  |
+ *     mc6  |- GPU card[2] => node 0(mc5), node 1(mc6)
+ *     mc7  |
+ *     mc8  |- GPU card[3] => node 0(mc7), node 1(mc8)
+ *
+ * sysfs entries will be populated as below:
+ *
+ *     CPU                     # CPU node
+ *     ├── mc 0
+ *
+ *     GPU Nodes are enumerated sequentially after CPU nodes are populated
+ *     GPU card 1              # Each Aldebaran GPU has 2 nodes/mcs
+ *     ├── mc 1                # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
+ *     │   ├── csrow 0         # UMC 0
+ *     │   │   ├── channel 0   # Each UMC has 8 channels
+ *     │   │   ├── channel 1   # size of each channel is 2 GB, so each UMC has 16 GB
+ *     │   │   ├── channel 2
+ *     │   │   ├── channel 3
+ *     │   │   ├── channel 4
+ *     │   │   ├── channel 5
+ *     │   │   ├── channel 6
+ *     │   │   ├── channel 7
+ *     │   ├── csrow 1         # UMC 1
+ *     │   │   ├── channel 0
+ *     │   │   ├── ..
+ *     │   │   ├── channel 7
+ *     │   ├── ..              ..
+ *     │   ├── csrow 3         # UMC 3
+ *     │   │   ├── channel 0
+ *     │   │   ├── ..
+ *     │   │   ├── channel 7
+ *     │   ├── rank 0
+ *     │   ├── ..              ..
+ *     │   ├── rank 31         # total 32 ranks/dimms from 4 UMCs
+ *     ├
+ *     ├── mc 2                # GPU node 1 == mc2
+ *     │   ├── ..              # each GPU has total 64 GB
+ *
+ *     GPU card 2
+ *     ├── mc 3
+ *     │   ├── ..
+ *     ├── mc 4
+ *     │   ├── ..
+ *
+ *     GPU card 3
+ *     ├── mc 5
+ *     │   ├── ..
+ *     ├── mc 6
+ *     │   ├── ..
+ *
+ *     GPU card 4
+ *     ├── mc 7
+ *     │   ├── ..
+ *     ├── mc 8
+ *     │   ├── ..
+ *
+ *
+ * Heterogeneous hardware details for the above context are as below:
+ * - The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
+ *   They have chip selects (csrows) and channels. However, the layouts are different
+ *   for performance, physical layout, or other reasons.
+ * - CPU UMCs use 1 channel. So we say UMC = EDAC Channel. This follows the
+ *   marketing speak, e.g. "CPU has X memory channels".
+ * - CPU UMCs use up to 4 chip selects. So we say UMC chip select = EDAC CSROW.
+ * - GPU UMCs use 1 chip select. So we say UMC = EDAC CSROW.
+ * - GPU UMCs use 8 channels. So we say UMC Channel = EDAC Channel.
+ */

From patchwork Thu Feb 3 17:49:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734496
From: Naveen Krishna Chatradhi
Subject: [PATCH v7 02/12] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Thu, 3 Feb 2022 11:49:32 -0600
Message-ID: <20220203174942.31630-3-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

GPU nodes are enumerated in sequential order based on the PCI hierarchy,
and the first GPU node is assumed to have an "AMD Node ID" value after
CPU Nodes are fully populated.

Aldebaran is an AMD GPU; its GPU drivers are part of the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

Each Aldebaran GPU has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Handle the new device support and enumeration by calling a separate
function in init_amd_nbs() with completely separate data structures.

Signed-off-by: Muralidhara M K
Co-developed-by: Naveen Krishna Chatradhi
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20211028130106.15701-2-nchatrad@amd.com
v6->v7:
* API amd_gpu_node_start_id() introduced to reuse in future patches.
v5->v6:
* Defined separate structure for GPU NBs and handled GPU enumeration
  separately by defining new function amd_cache_gpu_devices
* Defined amd_get_gpu_node_id which will be used by mce module
v4->v5:
* Modified amd_get_node_map() and checking return value
v3->v4:
* renamed struct name from nmap to nodemap
* split amd_get_node_map and addressed minor comments
v2->v3:
* Use word "gpu" instead of "noncpu" in the patch
* Do not create pci_dev_ids arrays for gpu nodes
* Identify the gpu node start index from DF18F1 registers on the GPU nodes.
  Export cpu node count and gpu start node id
v1->v2:
* Added Reviewed-by Yazen Ghannam
v0->v1
* Modified the commit message and comments in the code
* Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

 arch/x86/include/asm/amd_nb.h |   9 ++
 arch/x86/kernel/amd_nb.c      | 149 +++++++++++++++++++++++++++++++++-
 include/linux/pci_ids.h       |   1 +
 3 files changed, 155 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 00d1a400b7a1..cb8bc59d17a0 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -73,6 +73,12 @@ struct amd_northbridge_info {
 	struct amd_northbridge *nb;
 };

+struct amd_gpu_nb_info {
+	u16 gpu_num;
+	struct amd_northbridge *gpu_nb;
+	u16 gpu_node_start_id;
+};
+
 #define AMD_NB_GART			BIT(0)
 #define AMD_NB_L3_INDEX_DISABLE		BIT(1)
 #define AMD_NB_L3_PARTITIONING		BIT(2)
@@ -80,8 +86,11 @@ struct amd_northbridge_info {
 #ifdef CONFIG_AMD_NB

 u16 amd_nb_num(void);
+u16 amd_gpu_nb_num(void);
+u16 amd_gpu_node_start_id(void);
 bool amd_nb_has_feature(unsigned int feature);
 struct amd_northbridge *node_to_amd_nb(int node);
+int amd_get_gpu_node_system_id(u64 ipid);

 static inline u16 amd_pci_dev_to_node_id(struct pci_dev *pdev)
 {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 020c906f7934..dfa7c7516321 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -20,6 +20,7 @@
 #define PCI_DEVICE_ID_AMD_17H_M30H_ROOT	0x1480
 #define PCI_DEVICE_ID_AMD_17H_M60H_ROOT	0x1630
 #define PCI_DEVICE_ID_AMD_19H_M10H_ROOT	0x14a4
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
 #define PCI_DEVICE_ID_AMD_17H_DF_F4	0x1464
 #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
@@ -30,6 +31,14 @@
 #define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4	0x14d4
+
+/* GPU Data Fabric ID Device 24 Function 1 */
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1	0x14d1
+
+/* DF18xF1 register on Aldebaran GPU */
+#define REG_LOCAL_NODE_TYPE_MAP		0x144
+

 /* Protect the PCI config register pairs used for SMN. */
 static DEFINE_MUTEX(smn_mutex);
@@ -104,6 +113,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
 	{}
 };

+static const struct pci_device_id amd_gpu_root_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+	{}
+};
+
+static const struct pci_device_id amd_gpu_misc_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+	{}
+};
+
+static const struct pci_device_id amd_gpu_link_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+	{}
+};
+
 const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
 	{ 0x00, 0x18, 0x20 },
 	{ 0xff, 0x00, 0x20 },
@@ -112,6 +136,8 @@ const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
 };

 static struct amd_northbridge_info amd_northbridges;
+/* GPU specific structure declaration */
+static struct amd_gpu_nb_info amd_gpu_nbs;

 u16 amd_nb_num(void)
 {
@@ -119,6 +145,20 @@ u16 amd_nb_num(void)
 }
 EXPORT_SYMBOL_GPL(amd_nb_num);

+/* Total number of GPU nbs in a system */
+u16 amd_gpu_nb_num(void)
+{
+	return amd_gpu_nbs.gpu_num;
+}
+EXPORT_SYMBOL_GPL(amd_gpu_nb_num);
+
+/* Start index of hardware provided GPU node ID */
+u16 amd_gpu_node_start_id(void)
+{
+	return amd_gpu_nbs.gpu_node_start_id;
+}
+EXPORT_SYMBOL_GPL(amd_gpu_node_start_id);
+
 bool amd_nb_has_feature(unsigned int feature)
 {
 	return ((amd_northbridges.flags & feature) == feature);
@@ -127,10 +167,60 @@ EXPORT_SYMBOL_GPL(amd_nb_has_feature);

 struct amd_northbridge *node_to_amd_nb(int node)
 {
-	return (node < amd_northbridges.num) ? &amd_northbridges.nb[node] : NULL;
+	if (node < amd_northbridges.num)
+		return &amd_northbridges.nb[node];
+	else if (node >= amd_northbridges.num &&
+		 node < amd_gpu_nbs.gpu_num + amd_northbridges.num)
+		return &amd_gpu_nbs.gpu_nb[node - amd_northbridges.num];
+	else
+		return NULL;
 }
 EXPORT_SYMBOL_GPL(node_to_amd_nb);

+/*
+ * On SMCA banks of the GPU nodes, the hardware node ID information is
+ * available in register MCA_IPID[47:44] (InstanceIdHi).
+ *
+ * Convert the hardware node ID to a value used by Linux, where
+ * GPU nodes are numbered sequentially after the CPU nodes.
+ */
+int amd_get_gpu_node_system_id(u64 ipid)
+{
+	return (((ipid >> 44) & 0xF) - amd_gpu_node_start_id()
+				 + amd_northbridges.num);
+}
+EXPORT_SYMBOL_GPL(amd_get_gpu_node_system_id);
+
+/*
+ * AMD CPUs and GPUs whose data fabrics can be connected via custom xGMI
+ * links come with registers to gather local and remote node type map info.
+ *
+ * This function reads the "Local Node Type" register in DF function 1,
+ * which refers to the GPU node.
+ */ +static int amd_gpu_df_nodemap(void) +{ + struct pci_dev *pdev; + u32 tmp; + + pdev = pci_get_device(PCI_VENDOR_ID_AMD, + PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, NULL); + if (!pdev) { + pr_debug("DF Func1 PCI device not found on this node.\n"); + return -ENODEV; + } + + if (pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp)) + goto out; + amd_gpu_nbs.gpu_node_start_id = tmp & 0xFFF; + + return 0; +out: + pr_warn("PCI config access failed\n"); + pci_dev_put(pdev); + return -ENODEV; +} + static struct pci_dev *next_northbridge(struct pci_dev *dev, const struct pci_device_id *ids) { @@ -147,7 +237,7 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write) struct pci_dev *root; int err = -ENODEV; - if (node >= amd_northbridges.num) + if (node >= amd_northbridges.num + amd_gpu_nbs.gpu_num) goto out; root = node_to_amd_nb(node)->root; @@ -210,14 +300,14 @@ int amd_cache_northbridges(void) } misc = NULL; - while ((misc = next_northbridge(misc, misc_ids)) != NULL) + while ((misc = next_northbridge(misc, misc_ids))) misc_count++; if (!misc_count) return -ENODEV; root = NULL; - while ((root = next_northbridge(root, root_ids)) != NULL) + while ((root = next_northbridge(root, root_ids))) root_count++; if (root_count) { @@ -292,6 +382,56 @@ int amd_cache_northbridges(void) } EXPORT_SYMBOL_GPL(amd_cache_northbridges); +int amd_cache_gpu_devices(void) +{ + const struct pci_device_id *misc_ids = amd_gpu_misc_ids; + const struct pci_device_id *link_ids = amd_gpu_link_ids; + const struct pci_device_id *root_ids = amd_gpu_root_ids; + struct pci_dev *root, *misc, *link; + struct amd_northbridge *gpu_nb; + u16 misc_count = 0; + u16 root_count = 0; + int ret; + u16 i; + + if (amd_gpu_nbs.gpu_num) + return 0; + + ret = amd_gpu_df_nodemap(); + if (ret) + return ret; + + misc = NULL; + while ((misc = next_northbridge(misc, misc_ids))) + misc_count++; + + if (!misc_count) + return -ENODEV; + + root = NULL; + while ((root = next_northbridge(root, root_ids))) + 
root_count++; + + gpu_nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); + if (!gpu_nb) + return -ENOMEM; + + amd_gpu_nbs.gpu_nb = gpu_nb; + amd_gpu_nbs.gpu_num = misc_count; + + link = NULL, misc = NULL, root = NULL; + for (i = amd_northbridges.num; i < (amd_northbridges.num + amd_gpu_nbs.gpu_num); i++) { + node_to_amd_nb(i)->root = root = + next_northbridge(root, root_ids); + node_to_amd_nb(i)->misc = misc = + next_northbridge(misc, misc_ids); + node_to_amd_nb(i)->link = link = + next_northbridge(link, link_ids); + } + return 0; +} +EXPORT_SYMBOL_GPL(amd_cache_gpu_devices); + /* * Ignores subdevice/subvendor but as far as I can figure out * they're useless anyways @@ -497,6 +637,7 @@ static __init int init_amd_nbs(void) { amd_cache_northbridges(); amd_cache_gart(); + amd_cache_gpu_devices(); fix_erratum_688(); diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index aad54c666407..27fad5e1bf80 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -558,6 +558,7 @@ #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F3 0x14b0 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3 #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703 #define PCI_DEVICE_ID_AMD_LANCE 0x2000 #define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001 From patchwork Thu Feb 3 17:49:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12734494 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05928C43219 for ; Thu, 3 Feb 2022 17:51:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353164AbiBCRuj (ORCPT ); Thu, 3 Feb 2022 12:50:39 -0500 Received: from 
mail-mw2nam08on2089.outbound.protection.outlook.com ([40.107.101.89]:4577 "EHLO NAM04-MW2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1353100AbiBCRua (ORCPT ); Thu, 3 Feb 2022 12:50:30 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ObHU79N4dAzeTOZ5kwDJM6vYGMBZJbMWKKz6O4CV4lVCXnY74Wxd2cEd3YcDaBID5v0dxKOyAcRNg72qaLljKvoRh4rIH4vsdgsalBjU0BlsqaeMhZv9OLvSoJgtiLNicZuyzGqF0pC4t6OGvcOfCJb5t48W0HSG7Ie9pYA4ppwGg/zqxWhGFvp8M8e06DCqCm5qnUvXti3HZJi4+LHFQ6EC6PFNPieF/2cfeTOS08Ee66Jo8G+5Ah+RmbQ3n8sU76qVVNng6bgeU/XXPvNTyUEIvq5Qmng4gwDTlz3dsO+lVYYou+hGpaz++cKfQxuZNBjOeXzW65vTNPQ+no94NA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=dacqSjElAAlHBGIucZMEYpHNvlXdyVmNL46AKxuFSn0=; b=iRfqdJl50fMK3HGoLHuvGxHEB4cKCVBqsjxZaeQRlVa6MVqPAWlEuLV/4cmZdP6cAR78ZCxiqHaSEqhYFkqyMK5X6798XQHBBW/TVNJzESVHGPr83/flp3rh6IaM2du9RkFwNu3th3eVEXlTet60FFeoK9YiATkI15kfvM7aUvQLbEWITx1DN1eqwQNbIOl8pqK+AhbNlv9Sqmh41VXJhMVymEORSulbeECkrb7wiO2mzAcF88PrbzLg8k7UPwAlllM06JNBjCSevG8ffHlPcg0ug4Sjgv3PjoZOu0eD64LXleWt4BRgj2NWsPAlbwUi6DvHpSxY14Fggc7jm+iKbA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=dacqSjElAAlHBGIucZMEYpHNvlXdyVmNL46AKxuFSn0=; b=IbySyt4a1LakUry54vof43pmNGt3MLhxvoJ8qZcFZDxTl9ng1RukXMFrCB2jwjmST5dau0PAFjHSjXirk3UdS5tWj84H7jifPFfUDWE2a3+VkDenWe/mveKQ5P53eV75itNneLQGyBrob+v7odfm6m31qpQKi4ApFbQkjtEzBCU= 
From: Naveen Krishna Chatradhi
CC: Naveen Krishna Chatradhi, Muralidhara M K
Subject: [PATCH v7 03/12] EDAC/mce_amd: Extract node id from MCA_IPID
Date: Thu, 3 Feb 2022 11:49:33 -0600
Message-ID: <20220203174942.31630-4-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

On SMCA banks of the GPU nodes, the node ID is available in register
MCA_IPID[47:44] (InstanceIdHi).

Convert the hardware node ID to the value used by Linux, where GPU nodes
are numbered sequentially after the CPU nodes.

Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20211028130106.15701-3-nchatrad@amd.com

v6->v7:
* None

v5->v6:
* Called amd_get_gpu_node_id function to get node_id

v4->v5:
* None

v3->v4:
* Add reviewed by Yazen

v2->v3:
* Use APIs from amd_nb to identify the gpu_node_start_id and
  cpu_node_count, which are required to map the hardware node ID to the
  node ID enumerated by Linux.
v1->v2:
* Modified subject and commit message
* Added Reviewed by Yazen Ghannam

v0->v1:
* Modified the commit message
* Rearranged the conditions before calling decode_dram_ecc()

 drivers/edac/mce_amd.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index cc5c63feb26a..865a925ccef0 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
 #include
 #include
+#include
 #include
 #include "mce_amd.h"
@@ -1186,8 +1187,26 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+			 * The InstanceIdHi field represents the instance ID of the GPU.
+			 * Which needs to be mapped to a value used by Linux,
+			 * where GPU nodes are simply numerically after the CPU nodes.
+			 */
+			node_id = amd_get_gpu_node_system_id(m->ipid);
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }
 
 static inline void amd_decode_err_code(u16 ec)

From patchwork Thu Feb 3 17:49:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734491
From: Naveen Krishna Chatradhi
CC: Muralidhara M K
Subject: [PATCH v7 04/12] EDAC/amd64: Move struct fam_type variables into amd64_pvt structure
Date: Thu, 3 Feb 2022 11:49:34 -0600
Message-ID: <20220203174942.31630-5-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

On heterogeneous systems, the GPU nodes are probed after the CPU nodes
and will overwrite the family type set by the CPU nodes.

Remove struct fam_type, make all family-specific assignments dynamic, and
get rid of the static definitions of family_types and ops. This should
simplify adding support for future platforms.

Signed-off-by: Muralidhara M K
---
Link: https://lkml.kernel.org/r/20211028130106.15701-5-nchatrad@amd.com

v6->v7:
* Rebased on top of
  https://lore.kernel.org/all/20220202144307.2678405-1-yazen.ghannam@amd.com/

v4->v5:
* Added reviewed by Yazen

v1->v4:
* New change in v4

 drivers/edac/amd64_edac.c | 384 +++++++++++++-------------------------
 drivers/edac/amd64_edac.h |  64 +++----
 2 files changed, 153 insertions(+), 295 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 782e286d5390..4cac43840ccc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,11 +13,9 @@ module_param(ecc_enable_override, int, 0644);
 
 static struct msr __percpu *msrs;
 
-static struct amd64_family_type *fam_type;
-
-static inline u32 get_umc_reg(u32 reg)
+static inline u32 get_umc_reg(u32 reg, struct amd64_pvt *pvt)
 {
-	if (!fam_type->flags.zn_regs_v2)
+	if (!pvt->flags.zn_regs_v2)
 		return reg;
 
 	switch (reg) {
@@ -463,7 +461,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
 	for (i = 0; i < pvt->csels[dct].m_cnt; i++)
 
 #define for_each_umc(i) \
-	for (i = 0; i < fam_type->max_mcs; i++)
+	for (i = 0; i < pvt->max_mcs; i++)
 
 /*
  * @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -1859,7 +1857,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 
 		if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) {
 			amd_smn_read(pvt->mc_node_id,
-				     umc_base + get_umc_reg(UMCCH_ADDR_CFG),
+				     umc_base + get_umc_reg(UMCCH_ADDR_CFG, pvt),
 				     &tmp);
 			edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
 					i, 1 << ((tmp >> 4) & 0x3));
@@ -1935,7 +1933,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 		for_each_umc(umc) {
 			pvt->csels[umc].b_cnt = 4;
-			pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2;
+			pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2;
 		}
 
 	} else {
@@ -1975,7 +1973,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
 		}
 
 		umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
-		umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(UMCCH_ADDR_MASK_SEC);
+		umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(UMCCH_ADDR_MASK_SEC, pvt);
 
 		for_each_chip_select_mask(cs, umc, pvt) {
 			mask = &pvt->csels[umc].csmasks[cs];
@@ -2046,7 +2044,7 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
 	}
 }
 
-static void _determine_memory_type_df(struct amd64_umc *umc)
+static void _determine_memory_type_df(struct amd64_pvt *pvt, struct amd64_umc *umc)
 {
 	if (!(umc->sdp_ctrl & UMC_SDP_INIT)) {
 		umc->dram_type = MEM_EMPTY;
@@ -2057,7 +2055,7 @@ static void _determine_memory_type_df(struct amd64_umc *umc)
 	 * Check if the system supports the "DDR Type" field in UMC Config
 	 * and has DDR5 DIMMs in use.
 	 */
-	if (fam_type->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
+	if (pvt->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
 		if (umc->dimm_cfg & BIT(5))
 			umc->dram_type = MEM_LRDDR5;
 		else if (umc->dimm_cfg & BIT(4))
@@ -2082,7 +2080,7 @@ static void determine_memory_type_df(struct amd64_pvt *pvt)
 	for_each_umc(i) {
 		umc = &pvt->umc[i];
 
-		_determine_memory_type_df(umc);
+		_determine_memory_type_df(pvt, umc);
 		edac_dbg(1, " UMC%d DIMM type: %s\n", i, edac_mem_types[umc->dram_type]);
 	}
 }
@@ -2648,7 +2646,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	 */
 	dimm = csrow_nr >> 1;
 
-	if (!fam_type->flags.zn_regs_v2)
+	if (!pvt->flags.zn_regs_v2)
 		cs_mask_nr >>= 1;
 
 	/* Asymmetric dual-rank DIMM support. */
@@ -3268,167 +3266,6 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
 	}
 }
 
-static struct amd64_family_type family_types[] = {
-	[K8_CPUS] = {
-		.ctl_name = "K8",
-		.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
-		.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = k8_early_channel_count,
-			.map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
-			.dbam_to_cs = k8_dbam_to_chip_select,
-		}
-	},
-	[F10_CPUS] = {
-		.ctl_name = "F10h",
-		.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
-		.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f10_dbam_to_chip_select,
-		}
-	},
-	[F15_CPUS] = {
-		.ctl_name = "F15h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f15_dbam_to_chip_select,
-		}
-	},
-	[F15_M30H_CPUS] = {
-		.ctl_name = "F15h_M30h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f16_dbam_to_chip_select,
-		}
-	},
-	[F15_M60H_CPUS] = {
-		.ctl_name = "F15h_M60h",
-		.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f15_m60h_dbam_to_chip_select,
-		}
-	},
-	[F16_CPUS] = {
-		.ctl_name = "F16h",
-		.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f16_dbam_to_chip_select,
-		}
-	},
-	[F16_M30H_CPUS] = {
-		.ctl_name = "F16h_M30h",
-		.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
-		.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f1x_early_channel_count,
-			.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
-			.dbam_to_cs = f16_dbam_to_chip_select,
-		}
-	},
-	[F17_CPUS] = {
-		.ctl_name = "F17h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M10H_CPUS] = {
-		.ctl_name = "F17h_M10h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M30H_CPUS] = {
-		.ctl_name = "F17h_M30h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
-		.max_mcs = 8,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M60H_CPUS] = {
-		.ctl_name = "F17h_M60h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F17_M70H_CPUS] = {
-		.ctl_name = "F17h_M70h",
-		.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_CPUS] = {
-		.ctl_name = "F19h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
-		.max_mcs = 8,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_M10H_CPUS] = {
-		.ctl_name = "F19h_M10h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6,
-		.max_mcs = 12,
-		.flags.zn_regs_v2 = 1,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-	[F19_M50H_CPUS] = {
-		.ctl_name = "F19h_M50h",
-		.f0_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F0,
-		.f6_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F6,
-		.max_mcs = 2,
-		.ops = {
-			.early_channel_count = f17_early_channel_count,
-			.dbam_to_cs = f17_addr_mask_to_cs_size,
-		}
-	},
-};
-
 /*
  * These are tables of eigenvectors (one per line) which can be used for the
  * construction of the syndrome tables. The modified syndrome search algorithm
@@ -3850,7 +3687,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
 		umc_base = get_umc_base(i);
 		umc = &pvt->umc[i];
 
-		amd_smn_read(nid, umc_base + get_umc_reg(UMCCH_DIMM_CFG), &umc->dimm_cfg);
+		amd_smn_read(nid, umc_base + get_umc_reg(UMCCH_DIMM_CFG, pvt), &umc->dimm_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
 		amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
@@ -4380,7 +4217,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 
 	mci->edac_cap = determine_edac_cap(pvt);
 	mci->mod_name = EDAC_MOD_STR;
-	mci->ctl_name = fam_type->ctl_name;
+	mci->ctl_name = pvt->ctl_name;
 	mci->dev_name = pci_name(pvt->F3);
 	mci->ctl_page_to_phys = NULL;
 
@@ -4392,7 +4229,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 /*
  * returns a pointer to the family descriptor on success, NULL otherwise.
  */
-static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
+static void per_family_init(struct amd64_pvt *pvt)
 {
 	pvt->ext_model = boot_cpu_data.x86_model >> 4;
 	pvt->stepping = boot_cpu_data.x86_stepping;
@@ -4401,109 +4238,150 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 
 	switch (pvt->fam) {
 	case 0xf:
-		fam_type = &family_types[K8_CPUS];
-		pvt->ops = &family_types[K8_CPUS].ops;
+		pvt->ctl_name = "K8";
+		pvt->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
+		pvt->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
+		pvt->max_mcs = 2;
+		pvt->ops->early_channel_count = k8_early_channel_count;
+		pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow;
+		pvt->ops->dbam_to_cs = k8_dbam_to_chip_select;
 		break;
 
 	case 0x10:
-		fam_type = &family_types[F10_CPUS];
-		pvt->ops = &family_types[F10_CPUS].ops;
+		pvt->ctl_name = "F10h";
+		pvt->f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP;
+		pvt->f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM;
+		pvt->max_mcs = 2;
+		pvt->ops->early_channel_count = f1x_early_channel_count;
+		pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow;
+		pvt->ops->dbam_to_cs = f10_dbam_to_chip_select;
 		break;
 
 	case 0x15:
 		if (pvt->model == 0x30) {
-			fam_type = &family_types[F15_M30H_CPUS];
-			pvt->ops = &family_types[F15_M30H_CPUS].ops;
-			break;
+			pvt->ctl_name = "F15h_M30h";
+			pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1;
+			pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2;
+			pvt->ops->dbam_to_cs = f16_dbam_to_chip_select;
 		} else if (pvt->model == 0x60) {
-			fam_type = &family_types[F15_M60H_CPUS];
-			pvt->ops = &family_types[F15_M60H_CPUS].ops;
-			break;
-		/* Richland is only client */
+			pvt->ctl_name = "F15h_M60h";
+			pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1;
+			pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2;
+			pvt->ops->dbam_to_cs = f15_m60h_dbam_to_chip_select;
 		} else if (pvt->model == 0x13) {
-			return NULL;
+			/* Richland is only client */
+			return;
 		} else {
-			fam_type = &family_types[F15_CPUS];
-			pvt->ops = &family_types[F15_CPUS].ops;
+			pvt->ctl_name = "F15h";
+			pvt->f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1;
+			pvt->f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2;
+			pvt->ops->dbam_to_cs = f15_dbam_to_chip_select;
 		}
+		pvt->max_mcs = 2;
+		pvt->ops->early_channel_count = f1x_early_channel_count;
+		pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow;
 		break;
 
 	case 0x16:
 		if (pvt->model == 0x30) {
-			fam_type = &family_types[F16_M30H_CPUS];
-			pvt->ops = &family_types[F16_M30H_CPUS].ops;
-			break;
+			pvt->ctl_name = "F16h_M30h";
+			pvt->f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1;
+			pvt->f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2;
+		} else {
+			pvt->ctl_name = "F16h";
+			pvt->f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1;
+			pvt->f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2;
 		}
-		fam_type = &family_types[F16_CPUS];
-		pvt->ops = &family_types[F16_CPUS].ops;
+		pvt->max_mcs = 2;
+		pvt->ops->early_channel_count = f1x_early_channel_count;
+		pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow;
+		pvt->ops->dbam_to_cs = f16_dbam_to_chip_select;
 		break;
 
 	case 0x17:
 		if (pvt->model >= 0x10 && pvt->model <= 0x2f) {
-			fam_type = &family_types[F17_M10H_CPUS];
-			pvt->ops = &family_types[F17_M10H_CPUS].ops;
-			df_ops = &df2_ops;
-			break;
+			pvt->ctl_name = "F17h_M10h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df2_ops;
 		} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
-			fam_type = &family_types[F17_M30H_CPUS];
-			pvt->ops = &family_types[F17_M30H_CPUS].ops;
-			df_ops = &df3_ops;
-			break;
+			pvt->ctl_name = "F17h_M30h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6;
+			pvt->max_mcs = 8;
+			df_ops = &df3_ops;
 		} else if (pvt->model >= 0x60 && pvt->model <= 0x6f) {
-			fam_type = &family_types[F17_M60H_CPUS];
-			pvt->ops = &family_types[F17_M60H_CPUS].ops;
-			df_ops = &df3_ops;
-			break;
+			pvt->ctl_name = "F17h_M60h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df3_ops;
 		} else if (pvt->model >= 0x70 && pvt->model <= 0x7f) {
-			fam_type = &family_types[F17_M70H_CPUS];
-			pvt->ops = &family_types[F17_M70H_CPUS].ops;
-			df_ops = &df3_ops;
-			break;
+			pvt->ctl_name = "F17h_M70h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df3_ops;
+		} else {
+			pvt->ctl_name = "F17h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df2_ops;
 		}
 		fallthrough;
 	case 0x18:
-		fam_type = &family_types[F17_CPUS];
-		pvt->ops = &family_types[F17_CPUS].ops;
-		df_ops = &df2_ops;
-
-		if (pvt->fam == 0x18)
-			family_types[F17_CPUS].ctl_name = "F18h";
+		pvt->ops->early_channel_count = f17_early_channel_count;
+		pvt->ops->dbam_to_cs = f17_addr_mask_to_cs_size;
+
+		if (pvt->fam == 0x18) {
+			pvt->ctl_name = "F18h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df2_ops;
+		}
 		break;
 
 	case 0x19:
 		if (pvt->model >= 0x10 && pvt->model <= 0x1f) {
-			fam_type = &family_types[F19_M10H_CPUS];
-			pvt->ops = &family_types[F19_M10H_CPUS].ops;
-			break;
+			pvt->ctl_name = "F19h_M10h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6;
+			pvt->max_mcs = 12;
+			pvt->flags.zn_regs_v2 = 1;
 		} else if (pvt->model >= 0x20 && pvt->model <= 0x2f) {
-			fam_type = &family_types[F17_M70H_CPUS];
-			pvt->ops = &family_types[F17_M70H_CPUS].ops;
-			fam_type->ctl_name = "F19h_M20h";
-			df_ops = &df3_ops;
-			break;
+			pvt->ctl_name = "F19h_M20h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6;
+			pvt->max_mcs = 2;
+			df_ops = &df3_ops;
 		} else if (pvt->model >= 0x50 && pvt->model <= 0x5f) {
-			fam_type = &family_types[F19_M50H_CPUS];
-			pvt->ops = &family_types[F19_M50H_CPUS].ops;
-			fam_type->ctl_name = "F19h_M50h";
-			break;
+			pvt->ctl_name = "F19h_M50h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F6;
+			pvt->max_mcs = 2;
 		} else if (pvt->model >= 0xa0 && pvt->model <= 0xaf) {
-			fam_type = &family_types[F19_M10H_CPUS];
-			pvt->ops = &family_types[F19_M10H_CPUS].ops;
-			fam_type->ctl_name = "F19h_MA0h";
-			break;
+			pvt->ctl_name = "F19h_M10h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6;
+			pvt->max_mcs = 2;
+		} else {
+			pvt->ctl_name = "F19h";
+			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0;
+			pvt->f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6;
+			pvt->max_mcs = 8;
+			df_ops = &df3_ops;
 		}
-		fam_type = &family_types[F19_CPUS];
-		pvt->ops = &family_types[F19_CPUS].ops;
-		family_types[F19_CPUS].ctl_name = "F19h";
-		df_ops = &df3_ops;
+		pvt->ops->early_channel_count = f17_early_channel_count;
+		pvt->ops->dbam_to_cs = f17_addr_mask_to_cs_size;
 		break;
 
 	default:
 		amd64_err("Unsupported family!\n");
-		return NULL;
+		return;
 	}
-
-	return fam_type;
 }
 
 static const struct attribute_group *amd64_edac_attr_groups[] = {
@@ -4520,15 +4398,15 @@ static int hw_info_get(struct amd64_pvt *pvt)
 	int ret;
 
 	if (pvt->fam >= 0x17) {
-		pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
+		pvt->umc = kcalloc(pvt->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
 		if (!pvt->umc)
 			return -ENOMEM;
 
-		pci_id1 = fam_type->f0_id;
-		pci_id2 = fam_type->f6_id;
+		pci_id1 = pvt->f0_id;
+		pci_id2 = pvt->f6_id;
 	} else {
-		pci_id1 = fam_type->f1_id;
-		pci_id2 = fam_type->f2_id;
+		pci_id1 = pvt->f1_id;
+		pci_id2 = pvt->f2_id;
 	}
 
 	ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
@@ -4574,7 +4452,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = fam_type->max_mcs;
+	layers[1].size = pvt->max_mcs;
 	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4604,7 +4482,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt)
 	bool cs_enabled = false;
 	int cs = 0, dct = 0;
 
-	for (dct = 0; dct < fam_type->max_mcs; dct++) {
+	for (dct = 0; dct < pvt->max_mcs; dct++) {
 		for_each_chip_select(cs, dct, pvt)
 			cs_enabled |= csrow_enabled(cs, dct, pvt);
 	}
@@ -4633,10 +4511,12 @@ static int probe_one_instance(unsigned int nid)
 	pvt->mc_node_id = nid;
 	pvt->F3 = F3;
 
+	pvt->ops = kzalloc(sizeof(*pvt->ops), GFP_KERNEL);
+	if (!pvt->ops)
+		goto err_out;
+
 	ret = -ENODEV;
-	fam_type = per_family_init(pvt);
-	if (!fam_type)
-		goto err_enable;
+	per_family_init(pvt);
 
 	ret = hw_info_get(pvt);
 	if (ret < 0)
@@ -4674,8 +4554,8 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
-	amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
-		     (pvt->fam == 0xf ?
+	amd64_info("%s %sdetected (node %d).\n", pvt->ctl_name,
+		   (pvt->fam == 0xf ?
 				(pvt->ext_model >= K8_REV_F ? "revF or later "
 							     : "revE or earlier ")
 				 : ""), pvt->mc_node_id);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 6a112270a84b..4e3f9755bc73 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -291,25 +291,6 @@
 
 #define UMC_SDP_INIT BIT(31)
 
-enum amd_families {
-	K8_CPUS = 0,
-	F10_CPUS,
-	F15_CPUS,
-	F15_M30H_CPUS,
-	F15_M60H_CPUS,
-	F16_CPUS,
-	F16_M30H_CPUS,
-	F17_CPUS,
-	F17_M10H_CPUS,
-	F17_M30H_CPUS,
-	F17_M60H_CPUS,
-	F17_M70H_CPUS,
-	F19_CPUS,
-	F19_M10H_CPUS,
-	F19_M50H_CPUS,
-	NUM_FAMILIES,
-};
-
 /* Error injection control structure */
 struct error_injection {
 	u32 section;
@@ -352,6 +333,16 @@ struct amd64_umc {
 	enum mem_type dram_type;
 };
 
+struct amd64_family_flags {
+	/*
+	 * Indicates that the system supports the new register offsets, etc.
+	 * first introduced with Family 19h Model 10h.
+	 */
+	__u64 zn_regs_v2 : 1,
+
+	      __reserved : 63;
+};
+
 struct amd64_pvt {
 	struct low_ops *ops;
 
@@ -394,6 +385,12 @@ struct amd64_pvt {
 	/* x4, x8, or x16 syndromes in use */
 	u8 ecc_sym_sz;
 
+	const char *ctl_name;
+	u16 f0_id, f1_id, f2_id, f6_id;
+	/* Maximum number of memory controllers per die/node. */
+	u8 max_mcs;
+
+	struct amd64_family_flags flags;
 	/* place to store error injection parameters prior to issue */
 	struct error_injection injection;
 
@@ -479,30 +476,11 @@ struct ecc_settings {
  * functions and per device encoding/decoding logic.
  */
 struct low_ops {
-	int (*early_channel_count) (struct amd64_pvt *pvt);
-	void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
-				      struct err_info *);
-	int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
-			   unsigned cs_mode, int cs_mask_nr);
-};
-
-struct amd64_family_flags {
-	/*
-	 * Indicates that the system supports the new register offsets, etc.
-	 * first introduced with Family 19h Model 10h.
-	 */
-	__u64 zn_regs_v2 : 1,
-
-	      __reserved : 63;
-};
-
-struct amd64_family_type {
-	const char *ctl_name;
-	u16 f0_id, f1_id, f2_id, f6_id;
-	/* Maximum number of memory controllers per die/node. */
-	u8 max_mcs;
-	struct amd64_family_flags flags;
-	struct low_ops ops;
+	int (*early_channel_count)(struct amd64_pvt *pvt);
+	void (*map_sysaddr_to_csrow)(struct mem_ctl_info *mci, u64 sys_addr,
+				     struct err_info *err);
+	int (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct,
+			  unsigned int cs_mode, int cs_mask_nr);
 };
 
 int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,

From patchwork Thu Feb 3 17:49:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734490
From: Naveen Krishna Chatradhi
CC: Muralidhara M K, Naveen Krishna Chatradhi
Subject: [PATCH v7 05/12] EDAC/amd64: Define dynamic family ops routines
Date: Thu, 3 Feb 2022 11:49:35 -0600
Message-ID: <20220203174942.31630-6-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

Assign the family-specific routines dynamically through per-family ops.
This simplifies adding support for future platforms.

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20211028130106.15701-4-nchatrad@amd.com

v5->v7:
* None

v4->v5:
* Split read_mc_regs for per-family ops
* Adjusted dump_misc_regs and called it through family ops

v3->v4:
* Modified k8_prep_chip_selects for ext_model checks
* Added read_dct_base_mask to ops
* Renamed find_umc_channel and addressed minor comments

v2->v3:
* Defined new family operation routines

v1->v2:
* New change

 drivers/edac/amd64_edac.c | 442 ++++++++++++++++++++++++--------------
 drivers/edac/amd64_edac.h |  13 ++
 2 files changed, 297 insertions(+), 158 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 4cac43840ccc..babd25f29845 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1695,36 +1695,40 @@ static int get_channel_from_ecc_syndrome(struct mem_ctl_info *, u16); * Determine if the DIMMs have ECC enabled. ECC is enabled ONLY if all the DIMMs * are ECC capable. */ -static unsigned long determine_edac_cap(struct amd64_pvt *pvt) +static unsigned long f1x_determine_edac_cap(struct amd64_pvt *pvt) { unsigned long edac_cap = EDAC_FLAG_NONE; u8 bit; - if (pvt->umc) { - u8 i, umc_en_mask = 0, dimm_ecc_en_mask = 0; + bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F) + ?
19 + : 17; - for_each_umc(i) { - if (!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT)) - continue; + if (pvt->dclr0 & BIT(bit)) + edac_cap = EDAC_FLAG_SECDED; - umc_en_mask |= BIT(i); + return edac_cap; +} - /* UMC Configuration bit 12 (DimmEccEn) */ - if (pvt->umc[i].umc_cfg & BIT(12)) - dimm_ecc_en_mask |= BIT(i); - } +static unsigned long f17_determine_edac_cap(struct amd64_pvt *pvt) +{ + u8 i, umc_en_mask = 0, dimm_ecc_en_mask = 0; + unsigned long edac_cap = EDAC_FLAG_NONE; - if (umc_en_mask == dimm_ecc_en_mask) - edac_cap = EDAC_FLAG_SECDED; - } else { - bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F) - ? 19 - : 17; + for_each_umc(i) { + if (!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT)) + continue; - if (pvt->dclr0 & BIT(bit)) - edac_cap = EDAC_FLAG_SECDED; + umc_en_mask |= BIT(i); + + /* UMC Configuration bit 12 (DimmEccEn) */ + if (pvt->umc[i].umc_cfg & BIT(12)) + dimm_ecc_en_mask |= BIT(i); } + if (umc_en_mask == dimm_ecc_en_mask) + edac_cap = EDAC_FLAG_SECDED; + return edac_cap; } @@ -1771,6 +1775,13 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan) #define CS_EVEN (CS_EVEN_PRIMARY | CS_EVEN_SECONDARY) #define CS_ODD (CS_ODD_PRIMARY | CS_ODD_SECONDARY) +static int f1x_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) +{ + u32 dbam = ctrl ? pvt->dbam1 : pvt->dbam0; + + return DBAM_DIMM(dimm, dbam); +} + static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) { u8 base, count = 0; @@ -1907,10 +1918,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt) /* Display and decode various NB registers for debug purposes. */ static void dump_misc_regs(struct amd64_pvt *pvt) { - if (pvt->umc) - __dump_misc_regs_df(pvt); - else - __dump_misc_regs(pvt); + pvt->ops->dump_misc_regs(pvt); edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? 
"yes" : "no"); @@ -1920,28 +1928,51 @@ static void dump_misc_regs(struct amd64_pvt *pvt) /* * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60] */ -static void prep_chip_selects(struct amd64_pvt *pvt) +static void k8_prep_chip_selects(struct amd64_pvt *pvt) { - if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8; - } else if (pvt->fam == 0x15 && pvt->model == 0x30) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; - } else if (pvt->fam >= 0x17) { - int umc; - - for_each_umc(umc) { - pvt->csels[umc].b_cnt = 4; - pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2; - } + if (pvt->ext_model < K8_REV_F) { + pvt->csels[0].b_cnt = 8; + pvt->csels[1].b_cnt = 8; - } else { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; + pvt->csels[0].m_cnt = 8; + pvt->csels[1].m_cnt = 8; + } else if (pvt->ext_model >= K8_REV_F) { + pvt->csels[0].b_cnt = 8; + pvt->csels[1].b_cnt = 8; + + pvt->csels[0].m_cnt = 4; + pvt->csels[1].m_cnt = 4; } } +static void f15_m30h_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = 4; + pvt->csels[1].b_cnt = 4; + + pvt->csels[0].m_cnt = 2; + pvt->csels[1].m_cnt = 2; +} + +static void f17_prep_chip_selects(struct amd64_pvt *pvt) +{ + int umc; + + for_each_umc(umc) { + pvt->csels[umc].b_cnt = 4; + pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 
4 : 2; + } +} + +static void default_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = 8; + pvt->csels[1].b_cnt = 8; + + pvt->csels[0].m_cnt = 4; + pvt->csels[1].m_cnt = 4; +} + static void read_umc_base_mask(struct amd64_pvt *pvt) { u32 umc_base_reg, umc_base_reg_sec; @@ -2000,11 +2031,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt) { int cs; - prep_chip_selects(pvt); - - if (pvt->umc) - return read_umc_base_mask(pvt); - for_each_chip_select(cs, 0, pvt) { int reg0 = DCSB0 + (cs * 4); int reg1 = DCSB1 + (cs * 4); @@ -2089,9 +2115,6 @@ static void determine_memory_type(struct amd64_pvt *pvt) { u32 dram_ctrl, dcsm; - if (pvt->umc) - return determine_memory_type_df(pvt); - switch (pvt->fam) { case 0xf: if (pvt->ext_model >= K8_REV_F) @@ -2141,6 +2164,8 @@ static void determine_memory_type(struct amd64_pvt *pvt) WARN(1, KERN_ERR "%s: Family??? 0x%x\n", __func__, pvt->fam); pvt->dram_type = MEM_EMPTY; } + + edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); return; ddr3: @@ -3513,10 +3538,13 @@ static inline void decode_bus_error(int node_id, struct mce *m) * Currently, we can derive the channel number by looking at the 6th nibble in * the instance_id. For example, instance_id=0xYXXXXX where Y is the channel * number. + * + * csrow can be derived from the lower 3 bits of MCA_SYND value. 
*/ -static int find_umc_channel(struct mce *m) +static void update_umc_err_info(struct mce *m, struct err_info *err) { - return (m->ipid & GENMASK(31, 0)) >> 20; + err->channel = (m->ipid & GENMASK(31, 0)) >> 20; + err->csrow = m->synd & 0x7; } static void decode_umc_error(int node_id, struct mce *m) @@ -3538,8 +3566,6 @@ static void decode_umc_error(int node_id, struct mce *m) if (m->status & MCI_STATUS_DEFERRED) ecc_type = 3; - err.channel = find_umc_channel(m); - if (!(m->status & MCI_STATUS_SYNDV)) { err.err_code = ERR_SYND; goto log_error; @@ -3554,7 +3580,7 @@ static void decode_umc_error(int node_id, struct mce *m) err.err_code = ERR_CHANNEL; } - err.csrow = m->synd & 0x7; + pvt->ops->get_umc_err_info(m, &err); if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) { err.err_code = ERR_NORM_ADDR; @@ -3639,26 +3665,11 @@ static void free_mc_sibling_devs(struct amd64_pvt *pvt) } } -static void determine_ecc_sym_sz(struct amd64_pvt *pvt) +static void f1x_determine_ecc_sym_sz(struct amd64_pvt *pvt) { pvt->ecc_sym_sz = 4; - if (pvt->umc) { - u8 i; - - for_each_umc(i) { - /* Check enabled channels only: */ - if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) { - if (pvt->umc[i].ecc_ctrl & BIT(9)) { - pvt->ecc_sym_sz = 16; - return; - } else if (pvt->umc[i].ecc_ctrl & BIT(7)) { - pvt->ecc_sym_sz = 8; - return; - } - } - } - } else if (pvt->fam >= 0x10) { + if (pvt->fam >= 0x10) { u32 tmp; amd64_read_pci_cfg(pvt->F3, EXT_NB_MCA_CFG, &tmp); @@ -3672,6 +3683,47 @@ static void determine_ecc_sym_sz(struct amd64_pvt *pvt) } } +static void f17_determine_ecc_sym_sz(struct amd64_pvt *pvt) +{ + u8 i; + + pvt->ecc_sym_sz = 4; + + for_each_umc(i) { + /* Check enabled channels only: */ + if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) { + if (pvt->umc[i].ecc_ctrl & BIT(9)) { + pvt->ecc_sym_sz = 16; + return; + } else if (pvt->umc[i].ecc_ctrl & BIT(7)) { + pvt->ecc_sym_sz = 8; + return; + } + } + } +} + +static void read_top_mem_registers(struct amd64_pvt *pvt) +{ + u64 
msr_val; + + /* + * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since + * those are Read-As-Zero. + */ + rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem); + edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem); + + /* Check first whether TOP_MEM2 is enabled: */ + rdmsrl(MSR_AMD64_SYSCFG, msr_val); + if (msr_val & BIT(21)) { + rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2); + edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2); + } else { + edac_dbg(0, " TOP_MEM2 disabled\n"); + } +} + /* * Retrieve the hardware registers of the memory controller. */ @@ -3693,6 +3745,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl); amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi); } + + amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); } /* @@ -3702,30 +3756,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) static void read_mc_regs(struct amd64_pvt *pvt) { unsigned int range; - u64 msr_val; - /* - * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since - * those are Read-As-Zero. 
- */ - rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem); - edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem); - - /* Check first whether TOP_MEM2 is enabled: */ - rdmsrl(MSR_AMD64_SYSCFG, msr_val); - if (msr_val & BIT(21)) { - rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2); - edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2); - } else { - edac_dbg(0, " TOP_MEM2 disabled\n"); - } - - if (pvt->umc) { - __read_mc_regs_df(pvt); - amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); - - goto skip; - } + read_top_mem_registers(pvt); amd64_read_pci_cfg(pvt->F3, NBCAP, &pvt->nbcap); @@ -3766,16 +3798,6 @@ static void read_mc_regs(struct amd64_pvt *pvt) amd64_read_dct_pci_cfg(pvt, 1, DCLR0, &pvt->dclr1); amd64_read_dct_pci_cfg(pvt, 1, DCHR0, &pvt->dchr1); } - -skip: - read_dct_base_mask(pvt); - - determine_memory_type(pvt); - - if (!pvt->umc) - edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); - - determine_ecc_sym_sz(pvt); } /* @@ -3814,17 +3836,10 @@ static void read_mc_regs(struct amd64_pvt *pvt) */ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig) { - u32 dbam = dct ? 
pvt->dbam1 : pvt->dbam0; int csrow_nr = csrow_nr_orig; u32 cs_mode, nr_pages; - if (!pvt->umc) { - csrow_nr >>= 1; - cs_mode = DBAM_DIMM(csrow_nr, dbam); - } else { - cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt); - } - + cs_mode = pvt->ops->get_cs_mode(csrow_nr >> 1, dct, pvt); nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr); nr_pages <<= 20 - PAGE_SHIFT; @@ -3893,9 +3908,6 @@ static int init_csrows(struct mem_ctl_info *mci) int nr_pages = 0; u32 val; - if (pvt->umc) - return init_csrows_df(mci); - amd64_read_pci_cfg(pvt->F3, NBCFG, &val); pvt->nbcfg = val; @@ -4116,49 +4128,60 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid, amd64_warn("Error restoring NB MCGCTL settings!\n"); } -static bool ecc_enabled(struct amd64_pvt *pvt) +static bool f1x_check_ecc_enabled(struct amd64_pvt *pvt) { u16 nid = pvt->mc_node_id; bool nb_mce_en = false; - u8 ecc_en = 0, i; + u8 ecc_en = 0; u32 value; - if (boot_cpu_data.x86 >= 0x17) { - u8 umc_en_mask = 0, ecc_en_mask = 0; - struct amd64_umc *umc; + amd64_read_pci_cfg(pvt->F3, NBCFG, &value); - for_each_umc(i) { - umc = &pvt->umc[i]; + ecc_en = !!(value & NBCFG_ECC_ENABLE); - /* Only check enabled UMCs. */ - if (!(umc->sdp_ctrl & UMC_SDP_INIT)) - continue; + nb_mce_en = nb_mce_bank_enabled_on_node(nid); + if (!nb_mce_en) + edac_dbg(0, "NB MCE bank disabled, set MSR 0x%08x[4] on node %d to enable.\n", + MSR_IA32_MCG_CTL, nid); - umc_en_mask |= BIT(i); + edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? 
"enabled" : "disabled")); - if (umc->umc_cap_hi & UMC_ECC_ENABLED) - ecc_en_mask |= BIT(i); - } + if (!ecc_en || !nb_mce_en) + return false; + else + return true; +} - /* Check whether at least one UMC is enabled: */ - if (umc_en_mask) - ecc_en = umc_en_mask == ecc_en_mask; - else - edac_dbg(0, "Node %d: No enabled UMCs.\n", nid); +static bool f17_check_ecc_enabled(struct amd64_pvt *pvt) +{ + u16 nid = pvt->mc_node_id; + struct amd64_umc *umc; + u8 umc_en_mask = 0, ecc_en_mask = 0; + bool nb_mce_en = false; + u8 ecc_en = 0, i; - /* Assume UMC MCA banks are enabled. */ - nb_mce_en = true; - } else { - amd64_read_pci_cfg(pvt->F3, NBCFG, &value); + for_each_umc(i) { + umc = &pvt->umc[i]; + + /* Only check enabled UMCs. */ + if (!(umc->sdp_ctrl & UMC_SDP_INIT)) + continue; - ecc_en = !!(value & NBCFG_ECC_ENABLE); + umc_en_mask |= BIT(i); - nb_mce_en = nb_mce_bank_enabled_on_node(nid); - if (!nb_mce_en) - edac_dbg(0, "NB MCE bank disabled, set MSR 0x%08x[4] on node %d to enable.\n", - MSR_IA32_MCG_CTL, nid); + if (umc->umc_cap_hi & UMC_ECC_ENABLED) + ecc_en_mask |= BIT(i); } + /* Check whether at least one UMC is enabled: */ + if (umc_en_mask) + ecc_en = umc_en_mask == ecc_en_mask; + else + edac_dbg(0, "Node %d: No enabled UMCs.\n", nid); + + /* Assume UMC MCA banks are enabled. */ + nb_mce_en = true; + edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? 
"enabled" : "disabled")); if (!ecc_en || !nb_mce_en) @@ -4168,7 +4191,17 @@ static bool ecc_enabled(struct amd64_pvt *pvt) } static inline void -f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) +f1x_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) +{ + if (pvt->nbcap & NBCAP_SECDED) + mci->edac_ctl_cap |= EDAC_FLAG_SECDED; + + if (pvt->nbcap & NBCAP_CHIPKILL) + mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED; +} + +static inline void +f17_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) { u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1; @@ -4198,24 +4231,16 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) } } -static void setup_mci_misc_attrs(struct mem_ctl_info *mci) +static void f1x_setup_mci_misc_attrs(struct mem_ctl_info *mci) { struct amd64_pvt *pvt = mci->pvt_info; mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2; mci->edac_ctl_cap = EDAC_FLAG_NONE; - if (pvt->umc) { - f17h_determine_edac_ctl_cap(mci, pvt); - } else { - if (pvt->nbcap & NBCAP_SECDED) - mci->edac_ctl_cap |= EDAC_FLAG_SECDED; + pvt->ops->determine_edac_ctl_cap(mci, pvt); - if (pvt->nbcap & NBCAP_CHIPKILL) - mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED; - } - - mci->edac_cap = determine_edac_cap(pvt); + mci->edac_cap = pvt->ops->determine_edac_cap(pvt); mci->mod_name = EDAC_MOD_STR; mci->ctl_name = pvt->ctl_name; mci->dev_name = pci_name(pvt->F3); @@ -4245,6 +4270,18 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->ops->early_channel_count = k8_early_channel_count; pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow; pvt->ops->dbam_to_cs = k8_dbam_to_chip_select; + pvt->ops->prep_chip_selects = k8_prep_chip_selects; + pvt->ops->determine_memory_type = determine_memory_type; + pvt->ops->determine_ecc_sym_sz = f1x_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f1x_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f1x_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = 
f1x_determine_edac_cap; + pvt->ops->setup_mci_misc_attrs = f1x_setup_mci_misc_attrs; + pvt->ops->get_cs_mode = f1x_get_cs_mode; + pvt->ops->get_base_mask = read_dct_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs; + pvt->ops->get_mc_regs = read_mc_regs; + pvt->ops->populate_csrows = init_csrows; break; case 0x10: @@ -4255,6 +4292,18 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->ops->early_channel_count = f1x_early_channel_count; pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow; pvt->ops->dbam_to_cs = f10_dbam_to_chip_select; + pvt->ops->prep_chip_selects = default_prep_chip_selects; + pvt->ops->determine_memory_type = determine_memory_type; + pvt->ops->determine_ecc_sym_sz = f1x_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f1x_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f1x_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = f1x_determine_edac_cap; + pvt->ops->setup_mci_misc_attrs = f1x_setup_mci_misc_attrs; + pvt->ops->get_cs_mode = f1x_get_cs_mode; + pvt->ops->get_base_mask = read_dct_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs; + pvt->ops->get_mc_regs = read_mc_regs; + pvt->ops->populate_csrows = init_csrows; break; case 0x15: @@ -4263,11 +4312,13 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1; pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2; pvt->ops->dbam_to_cs = f16_dbam_to_chip_select; + pvt->ops->prep_chip_selects = f15_m30h_prep_chip_selects; } else if (pvt->model == 0x60) { pvt->ctl_name = "F15h_M60h"; pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1; pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2; pvt->ops->dbam_to_cs = f15_m60h_dbam_to_chip_select; + pvt->ops->prep_chip_selects = default_prep_chip_selects; } else if (pvt->model == 0x13) { /* Richland is only client */ return; @@ -4276,10 +4327,22 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1; pvt->f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2; 
pvt->ops->dbam_to_cs = f15_dbam_to_chip_select; + pvt->ops->prep_chip_selects = default_prep_chip_selects; } pvt->max_mcs = 2; pvt->ops->early_channel_count = f1x_early_channel_count; pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow; + pvt->ops->determine_memory_type = determine_memory_type; + pvt->ops->determine_ecc_sym_sz = f1x_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f1x_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f1x_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = f1x_determine_edac_cap; + pvt->ops->setup_mci_misc_attrs = f1x_setup_mci_misc_attrs; + pvt->ops->get_cs_mode = f1x_get_cs_mode; + pvt->ops->get_base_mask = read_dct_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs; + pvt->ops->get_mc_regs = read_mc_regs; + pvt->ops->populate_csrows = init_csrows; break; case 0x16: @@ -4296,6 +4359,17 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->ops->early_channel_count = f1x_early_channel_count; pvt->ops->map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow; pvt->ops->dbam_to_cs = f16_dbam_to_chip_select; + pvt->ops->prep_chip_selects = default_prep_chip_selects; + pvt->ops->determine_memory_type = determine_memory_type; + pvt->ops->determine_ecc_sym_sz = f1x_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f1x_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f1x_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = f1x_determine_edac_cap; + pvt->ops->get_cs_mode = f1x_get_cs_mode; + pvt->ops->get_base_mask = read_dct_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs; + pvt->ops->get_mc_regs = read_mc_regs; + pvt->ops->populate_csrows = init_csrows; break; case 0x17: @@ -4334,6 +4408,19 @@ static void per_family_init(struct amd64_pvt *pvt) case 0x18: pvt->ops->early_channel_count = f17_early_channel_count; pvt->ops->dbam_to_cs = f17_addr_mask_to_cs_size; + pvt->ops->prep_chip_selects = f17_prep_chip_selects; + pvt->ops->determine_memory_type = determine_memory_type_df; + 
pvt->ops->determine_ecc_sym_sz = f17_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f17_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f17_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = f17_determine_edac_cap; + pvt->ops->setup_mci_misc_attrs = f1x_setup_mci_misc_attrs; + pvt->ops->get_cs_mode = f17_get_cs_mode; + pvt->ops->get_base_mask = read_umc_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs_df; + pvt->ops->get_mc_regs = __read_mc_regs_df; + pvt->ops->populate_csrows = init_csrows_df; + pvt->ops->get_umc_err_info = update_umc_err_info; if (pvt->fam == 0x18) { pvt->ctl_name = "F18h"; @@ -4376,12 +4463,43 @@ static void per_family_init(struct amd64_pvt *pvt) } pvt->ops->early_channel_count = f17_early_channel_count; pvt->ops->dbam_to_cs = f17_addr_mask_to_cs_size; + pvt->ops->prep_chip_selects = f17_prep_chip_selects; + pvt->ops->determine_memory_type = determine_memory_type_df; + pvt->ops->determine_ecc_sym_sz = f17_determine_ecc_sym_sz; + pvt->ops->ecc_enabled = f17_check_ecc_enabled; + pvt->ops->determine_edac_ctl_cap = f17_determine_edac_ctl_cap; + pvt->ops->determine_edac_cap = f17_determine_edac_cap; + pvt->ops->setup_mci_misc_attrs = f1x_setup_mci_misc_attrs; + pvt->ops->get_cs_mode = f17_get_cs_mode; + pvt->ops->get_base_mask = read_umc_base_mask; + pvt->ops->dump_misc_regs = __dump_misc_regs_df; + pvt->ops->get_mc_regs = __read_mc_regs_df; + pvt->ops->populate_csrows = init_csrows_df; + pvt->ops->get_umc_err_info = update_umc_err_info; break; default: amd64_err("Unsupported family!\n"); return; } + + /* ops required for all the families */ + if (!pvt->ops->early_channel_count || !pvt->ops->dbam_to_cs || + !pvt->ops->prep_chip_selects || !pvt->ops->determine_memory_type || + !pvt->ops->ecc_enabled || !pvt->ops->determine_edac_ctl_cap || + !pvt->ops->determine_edac_cap || !pvt->ops->setup_mci_misc_attrs || + !pvt->ops->get_cs_mode || !pvt->ops->get_base_mask || + !pvt->ops->dump_misc_regs || !pvt->ops->get_mc_regs || + 
!pvt->ops->populate_csrows) { + edac_dbg(1, "Common helper routines not defined.\n"); + return; + } + + /* ops required for families 17h and later */ + if (pvt->fam >= 0x17 && !pvt->ops->get_umc_err_info) { + edac_dbg(1, "Platform specific helper routines not defined.\n"); + return; + } } static const struct attribute_group *amd64_edac_attr_groups[] = { @@ -4413,7 +4531,15 @@ static int hw_info_get(struct amd64_pvt *pvt) if (ret) return ret; - read_mc_regs(pvt); + pvt->ops->get_mc_regs(pvt); + + pvt->ops->prep_chip_selects(pvt); + + pvt->ops->get_base_mask(pvt); + + pvt->ops->determine_memory_type(pvt); + + pvt->ops->determine_ecc_sym_sz(pvt); return 0; } @@ -4462,9 +4588,9 @@ static int init_one_instance(struct amd64_pvt *pvt) mci->pvt_info = pvt; mci->pdev = &pvt->F3->dev; - setup_mci_misc_attrs(mci); + pvt->ops->setup_mci_misc_attrs(mci); - if (init_csrows(mci)) + if (pvt->ops->populate_csrows(mci)) mci->edac_cap = EDAC_FLAG_NONE; ret = -ENODEV; @@ -4528,7 +4654,7 @@ static int probe_one_instance(unsigned int nid) goto err_enable; } - if (!ecc_enabled(pvt)) { + if (!pvt->ops->ecc_enabled(pvt)) { ret = -ENODEV; if (!ecc_enable_override) diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 4e3f9755bc73..07ff2c6c17c5 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -481,6 +481,19 @@ struct low_ops { struct err_info *err); int (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct, unsigned int cs_mode, int cs_mask_nr); + void (*prep_chip_selects)(struct amd64_pvt *pvt); + void (*determine_memory_type)(struct amd64_pvt *pvt); + void (*determine_ecc_sym_sz)(struct amd64_pvt *pvt); + bool (*ecc_enabled)(struct amd64_pvt *pvt); + void (*determine_edac_ctl_cap)(struct mem_ctl_info *mci, struct amd64_pvt *pvt); + unsigned long (*determine_edac_cap)(struct amd64_pvt *pvt); + int (*get_cs_mode)(int dimm, u8 ctrl, struct amd64_pvt *pvt); + void (*get_base_mask)(struct amd64_pvt *pvt); + void (*dump_misc_regs)(struct amd64_pvt *pvt); + void 
(*get_mc_regs)(struct amd64_pvt *pvt); + void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci); + int (*populate_csrows)(struct mem_ctl_info *mci); + void (*get_umc_err_info)(struct mce *m, struct err_info *err); }; int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,

From patchwork Thu Feb 3 17:49:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734492
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi
Subject: [PATCH v7 06/12] EDAC/amd64: Add AMD heterogeneous family 19h Model 30h-3fh
Date: Thu, 3 Feb 2022 11:49:36 -0600
Message-ID: <20220203174942.31630-7-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

On heterogeneous systems with AMD CPUs, the data fabrics of the GPUs are
connected directly via custom links.

In one such system, Aldebaran GPU nodes are connected to the Family 19h,
Model 30h family of CPU nodes.

Aldebaran GPU support was added to the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20211028130106.15701-6-nchatrad@amd.com

v6->v7:
* Split the model-specific assignments out of patch 5 of the v6 series

 drivers/edac/amd64_edac.c | 14 ++++++++++++++
 drivers/edac/amd64_edac.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index babd25f29845..54af7e38d26c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4454,6 +4454,19 @@ static void per_family_init(struct amd64_pvt *pvt)
 			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0;
 			pvt->f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6;
 			pvt->max_mcs = 2;
+		} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+			if (pvt->mc_node_id >= amd_nb_num()) {
+				pvt->ctl_name = "ALDEBARAN";
+				pvt->f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0;
+				pvt->f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6;
+				pvt->max_mcs = 4;
+				goto end_fam;
+			} else {
+				pvt->ctl_name = "F19h_M30h";
+				pvt->f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0;
+				pvt->f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6;
+				pvt->max_mcs = 8;
+			}
 		} else {
 			pvt->ctl_name = "F19h";
 			pvt->f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0;
@@ -4476,6 +4489,7 @@ static void per_family_init(struct amd64_pvt *pvt)
 		pvt->ops->get_mc_regs = __read_mc_regs_df;
 		pvt->ops->populate_csrows = init_csrows_df;
 		pvt->ops->get_umc_err_info = update_umc_err_info;
+	end_fam:
 		break;

 	default:
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 07ff2c6c17c5..66f7b5d45a37 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -130,6 +130,8 @@
 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F6 0x14b3
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F0 0x166a
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F6 0x1670
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6

 /*
  * Function 1 - Address Map

From patchwork Thu Feb 3 17:49:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734493
Return-Path:
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Naveen Krishna Chatradhi , Muralidhara M K
Subject: [PATCH v7 07/12] EDAC/amd64: Enumerate Aldebaran GPU nodes by adding family ops
Date: Thu, 3 Feb 2022 11:49:37 -0600
Message-ID: <20220203174942.31630-8-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
are connected directly via custom links.

In one such system, where Aldebaran GPU nodes are connected to the
Family 19h, Model 30h family of CPU nodes, the Aldebaran GPUs can report
memory errors via SMCA banks.

Aldebaran GPU support was added to the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes come with built-in HBM2 memory, ECC support is enabled by
default, and the UMCs on GPU nodes differ from the UMCs on CPU nodes.

Define GPU-specific ops routines to extend the amd64_edac module to
enumerate HBM memory, leveraging the existing EDAC and amd64-specific
data structures.

The UMC PHYs on GPU nodes are enumerated as csrows and the UMC channels
connected to HBM banks are enumerated as ranks.

Cc: Yazen Ghannam
Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20210823185437.94417-4-nchatrad@amd.com

v6->v7:
* Added GPU specific ops function definitions, based on the refactor.
v5->v6:
* Added support for the number of GPU northbridges with amd_gpu_nb_num()
v4->v5:
* Removed else condition in per_family_init for 19h family
v3->v4:
* Split "f17_addr_mask_to_cs_size" instead, as was done in the 3rd patch earlier
v2->v3:
* Bifurcated the GPU code from v2
v1->v2:
* Restored line deletions and handled minor comments
* Modified commit message and some of the function comments
* Introduced variable df_inst_id instead of umc_num
v0->v1:
* Modified the commit message
* Changed the edac_cap
* Kept sizes of both cpu and noncpu together
* Return success if the !F3 condition is true and remove unnecessary validation

 drivers/edac/amd64_edac.c | 285 +++++++++++++++++++++++++++++++++-----
 drivers/edac/amd64_edac.h |  21 +++
 2 files changed, 273 insertions(+), 33 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 54af7e38d26c..10efe726a959 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1012,6 +1012,12 @@ static int sys_addr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr)
 /* Protect the PCI config register pairs used for DF indirect access. */
 static DEFINE_MUTEX(df_indirect_mutex);

+/* Total number of northbridges if in case of heterogeneous systems */
+static int amd_total_nb_num(void)
+{
+	return amd_nb_num() + amd_gpu_nb_num();
+}
+
 /*
  * Data Fabric Indirect Access uses FICAA/FICAD.
  *
@@ -1031,7 +1037,7 @@ static int __df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *l
 	u32 ficaa;
 	int err = -ENODEV;

-	if (node >= amd_nb_num())
+	if (node >= amd_total_nb_num())
 		goto out;

 	F4 = node_to_amd_nb(node)->link;
@@ -1732,6 +1738,11 @@ static unsigned long f17_determine_edac_cap(struct amd64_pvt *pvt)
 	return edac_cap;
 }

+static unsigned long gpu_determine_edac_cap(struct amd64_pvt *pvt)
+{
+	return EDAC_FLAG_EC;
+}
+
 static void debug_display_dimm_sizes(struct amd64_pvt *, u8);

 static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
@@ -1814,6 +1825,25 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 	return cs_mode;
 }

+static int gpu_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
+{
+	return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+}
+
+static void gpu_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
+{
+	int size, cs = 0, cs_mode;
+
+	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+	cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+	for_each_chip_select(cs, ctrl, pvt) {
+		size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
+		amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+	}
+}
+
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
 	int dimm, size0, size1, cs0, cs1, cs_mode;
@@ -1835,6 +1865,27 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 	}
 }

+static void gpu_dump_misc_regs(struct amd64_pvt *pvt)
+{
+	struct amd64_umc *umc;
+	u32 i, umc_base;
+
+	for_each_umc(i) {
+		umc_base = gpu_get_umc_base(i, 0);
+		umc = &pvt->umc[i];
+
+		edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+		edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+		edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+		edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+		gpu_debug_display_dimm_sizes(pvt, i);
+	}
+
+	edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
+		 pvt->dhar, dhar_base(pvt));
+}
+
 static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 {
 	struct amd64_umc *umc;
@@ -1973,6 +2024,43 @@ static void default_prep_chip_selects(struct amd64_pvt *pvt)
 	pvt->csels[1].m_cnt = 4;
 }

+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+	int umc;
+
+	for_each_umc(umc) {
+		pvt->csels[umc].b_cnt = 8;
+		pvt->csels[umc].m_cnt = 8;
+	}
+}
+
+static void gpu_read_umc_base_mask(struct amd64_pvt *pvt)
+{
+	u32 base_reg, mask_reg;
+	u32 *base, *mask;
+	int umc, cs;
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			base_reg = gpu_get_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+			base = &pvt->csels[umc].csbases[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+				edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *base, base_reg);
+			}
+
+			mask_reg = gpu_get_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+			mask = &pvt->csels[umc].csmasks[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+				edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *mask, mask_reg);
+			}
+		}
+	}
+}
+
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
 	u32 umc_base_reg, umc_base_reg_sec;
@@ -2172,6 +2260,11 @@ static void determine_memory_type(struct amd64_pvt *pvt)
 	pvt->dram_type = (pvt->dclr0 & BIT(16)) ? MEM_DDR3 : MEM_RDDR3;
 }

+static void gpu_determine_memory_type(struct amd64_pvt *pvt)
+{
+	pvt->dram_type = MEM_HBM2;
+}
+
 /* Get the number of DCT channels the memory controller is using. */
 static int k8_early_channel_count(struct amd64_pvt *pvt)
 {
@@ -2504,6 +2597,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
 	return channels;
 }

+static int gpu_early_channel_count(struct amd64_pvt *pvt)
+{
+	int i, channels = 0;
+
+	/* The memory channels in case of GPUs are fully populated */
+	for_each_umc(i)
+		channels += pvt->csels[i].b_cnt;
+
+	amd64_info("MCT channel count: %d\n", channels);
+
+	return channels;
+}
+
 static int ddr3_cs_size(unsigned i, bool dct_width)
 {
 	unsigned shift = 0;
@@ -2631,11 +2737,46 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
 	return ddr3_cs_size(cs_mode, false);
 }

+static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
+				  int csrow_nr, int dimm)
+{
+	u32 msb, weight, num_zero_bits;
+	u32 addr_mask_deinterleaved;
+	int size = 0;
+
+	/*
+	 * The number of zero bits in the mask is equal to the number of bits
+	 * in a full mask minus the number of bits in the current mask.
+	 *
+	 * The MSB is the number of bits in the full mask because BIT[0] is
+	 * always 0.
+	 *
+	 * In the special 3 Rank interleaving case, a single bit is flipped
+	 * without swapping with the most significant bit. This can be handled
+	 * by keeping the MSB where it is and ignoring the single zero bit.
+	 */
+	msb = fls(addr_mask_orig) - 1;
+	weight = hweight_long(addr_mask_orig);
+	num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
+
+	/* Take the number of zero bits off from the top of the mask. */
+	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+	edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
+	edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
+	size = (addr_mask_deinterleaved >> 2) + 1;
+
+	/* Return size in MBs. */
+	return size >> 10;
+}
+
 static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 				    unsigned int cs_mode, int csrow_nr)
 {
-	u32 addr_mask_orig, addr_mask_deinterleaved;
-	u32 msb, weight, num_zero_bits;
+	u32 addr_mask_orig;
 	int cs_mask_nr = csrow_nr;
 	int dimm, size = 0;

@@ -2680,33 +2821,15 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	else
 		addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];

-	/*
-	 * The number of zero bits in the mask is equal to the number of bits
-	 * in a full mask minus the number of bits in the current mask.
-	 *
-	 * The MSB is the number of bits in the full mask because BIT[0] is
-	 * always 0.
-	 *
-	 * In the special 3 Rank interleaving case, a single bit is flipped
-	 * without swapping with the most significant bit. This can be handled
-	 * by keeping the MSB where it is and ignoring the single zero bit.
-	 */
-	msb = fls(addr_mask_orig) - 1;
-	weight = hweight_long(addr_mask_orig);
-	num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
-
-	/* Take the number of zero bits off from the top of the mask. */
-	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
-
-	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
-	edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
-	edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, cs_mask_nr, dimm);
+}

-	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
-	size = (addr_mask_deinterleaved >> 2) + 1;
+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+				    unsigned int cs_mode, int csrow_nr)
+{
+	u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];

-	/* Return size in MBs. */
-	return size >> 10;
+	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
 }

 static void read_dram_ctl_register(struct amd64_pvt *pvt)
@@ -3703,6 +3826,11 @@ static void f17_determine_ecc_sym_sz(struct amd64_pvt *pvt)
 	}
 }

+/* ECC symbol size is not available on Aldebaran nodes */
+static void gpu_determine_ecc_sym_sz(struct amd64_pvt *pvt)
+{
+}
+
 static void read_top_mem_registers(struct amd64_pvt *pvt)
 {
 	u64 msr_val;
@@ -3724,6 +3852,25 @@ static void read_top_mem_registers(struct amd64_pvt *pvt)
 	}
 }

+static void gpu_read_mc_regs(struct amd64_pvt *pvt)
+{
+	u8 nid = pvt->mc_node_id;
+	struct amd64_umc *umc;
+	u32 i, umc_base;
+
+	/* Read registers from each UMC */
+	for_each_umc(i) {
+		umc_base = gpu_get_umc_base(i, 0);
+		umc = &pvt->umc[i];
+
+		amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+		amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+		amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+	}
+
+	amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
+}
+
 /*
  * Retrieve the hardware registers of the memory controller.
  */
@@ -3850,6 +3997,35 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 	return nr_pages;
 }

+static int gpu_init_csrows(struct mem_ctl_info *mci)
+{
+	struct amd64_pvt *pvt = mci->pvt_info;
+	struct dimm_info *dimm;
+	int empty = 1;
+	u8 umc, cs;
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			if (!csrow_enabled(cs, umc, pvt))
+				continue;
+
+			empty = 0;
+			dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+			edac_dbg(1, "MC node: %d, csrow: %d\n",
+				 pvt->mc_node_id, cs);
+
+			dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+			dimm->mtype = pvt->dram_type;
+			dimm->edac_mode = EDAC_SECDED;
+			dimm->dtype = DEV_X16;
+			dimm->grain = 64;
+		}
+	}
+
+	return empty;
+}
+
 static int init_csrows_df(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
@@ -4190,6 +4366,12 @@ static bool f17_check_ecc_enabled(struct amd64_pvt *pvt)
 	return true;
 }

+/* ECC is enabled by default on GPU nodes */
+static bool gpu_check_ecc_enabled(struct amd64_pvt *pvt)
+{
+	return true;
+}
+
 static inline void
 f1x_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
@@ -4231,6 +4413,12 @@ f17_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 	}
 }

+static inline void
+gpu_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
+{
+	mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+}
+
 static void f1x_setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;
@@ -4251,6 +4439,22 @@ static void f1x_setup_mci_misc_attrs(struct mem_ctl_info *mci)
 	mci->get_sdram_scrub_rate = get_scrub_rate;
 }

+static void gpu_setup_mci_misc_attrs(struct mem_ctl_info *mci)
+{
+	struct amd64_pvt *pvt = mci->pvt_info;
+
+	mci->mtype_cap = MEM_FLAG_HBM2;
+	mci->edac_ctl_cap = EDAC_FLAG_NONE;
+
+	pvt->ops->determine_edac_ctl_cap(mci, pvt);
+
+	mci->edac_cap = pvt->ops->determine_edac_cap(pvt);
+	mci->mod_name = EDAC_MOD_STR;
+	mci->ctl_name = pvt->ctl_name;
+	mci->dev_name = pci_name(pvt->F3);
+	mci->ctl_page_to_phys = NULL;
+}
+
 /*
  * returns a pointer to the family descriptor on success, NULL otherwise.
  */
@@ -4460,6 +4664,20 @@ static void per_family_init(struct amd64_pvt *pvt)
 				pvt->f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0;
 				pvt->f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6;
 				pvt->max_mcs = 4;
+				pvt->ops->early_channel_count = gpu_early_channel_count;
+				pvt->ops->dbam_to_cs = gpu_addr_mask_to_cs_size;
+				pvt->ops->prep_chip_selects = gpu_prep_chip_selects;
+				pvt->ops->determine_memory_type = gpu_determine_memory_type;
+				pvt->ops->determine_ecc_sym_sz = gpu_determine_ecc_sym_sz;
+				pvt->ops->determine_edac_ctl_cap = gpu_determine_edac_ctl_cap;
+				pvt->ops->determine_edac_cap = gpu_determine_edac_cap;
+				pvt->ops->setup_mci_misc_attrs = gpu_setup_mci_misc_attrs;
+				pvt->ops->get_cs_mode = gpu_get_cs_mode;
+				pvt->ops->ecc_enabled = gpu_check_ecc_enabled;
+				pvt->ops->get_base_mask = gpu_read_umc_base_mask;
+				pvt->ops->dump_misc_regs = gpu_dump_misc_regs;
+				pvt->ops->get_mc_regs = gpu_read_mc_regs;
+				pvt->ops->populate_csrows = gpu_init_csrows;
 				goto end_fam;
 			} else {
 				pvt->ctl_name = "F19h_M30h";
@@ -4581,9 +4799,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	if (pvt->channel_count < 0)
 		return ret;

+	/* Define layers for CPU and GPU nodes */
 	ret = -ENOMEM;
 	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
-	layers[0].size = pvt->csels[0].b_cnt;
+	layers[0].size = amd_gpu_nb_num() ? pvt->max_mcs : pvt->csels[0].b_cnt;
 	layers[0].is_virt_csrow = true;
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -4592,7 +4811,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = pvt->max_mcs;
+	layers[1].size = amd_gpu_nb_num() ? pvt->csels[0].b_cnt : pvt->max_mcs;
 	layers[1].is_virt_csrow = false;

 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4786,7 +5005,7 @@ static int __init amd64_edac_init(void)
 	opstate_init();

 	err = -ENOMEM;
-	ecc_stngs = kcalloc(amd_nb_num(), sizeof(ecc_stngs[0]), GFP_KERNEL);
+	ecc_stngs = kcalloc(amd_total_nb_num(), sizeof(ecc_stngs[0]), GFP_KERNEL);
 	if (!ecc_stngs)
 		goto err_free;

@@ -4794,7 +5013,7 @@ static int __init amd64_edac_init(void)
 	if (!msrs)
 		goto err_free;

-	for (i = 0; i < amd_nb_num(); i++) {
+	for (i = 0; i < amd_total_nb_num(); i++) {
 		err = probe_one_instance(i);
 		if (err) {
 			/* unwind properly */
@@ -4852,7 +5071,7 @@ static void __exit amd64_edac_exit(void)
 	else
 		amd_unregister_ecc_decoder(decode_bus_error);

-	for (i = 0; i < amd_nb_num(); i++)
+	for (i = 0; i < amd_total_nb_num(); i++)
 		remove_one_instance(i);

 	kfree(ecc_stngs);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 66f7b5d45a37..71df08a496d2 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -653,3 +653,24 @@ static inline u32 dct_sel_baseaddr(struct amd64_pvt *pvt)
  * - GPU UMCs use 1 chip select. So we say UMC = EDAC CSROW.
  * - GPU UMCs use 8 channels. So we say UMC Channel = EDAC Channel.
  */
+static inline u32 gpu_get_umc_base(u8 umc, u8 channel)
+{
+	/*
+	 * On CPUs, there is one channel per UMC, so UMC numbering equals
+	 * channel numbering. On GPUs, there are eight channels per UMC,
+	 * so the channel numbering is different from UMC numbering.
+	 *
+	 * On CPU nodes channels are selected in 6th nibble
+	 * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+	 *
+	 * On GPU nodes channels are selected in 3rd nibble
+	 * HBM chX[3:0]= [Y ]5X[3:0]000;
+	 * HBM chX[7:4]= [Y+1]5X[3:0]000
+	 */
+	umc *= 2;
+
+	if (channel >= 4)
+		umc++;
+
+	return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}

From patchwork Thu Feb 3 17:49:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734495
Return-Path:
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Naveen Krishna Chatradhi , Muralidhara M K
Subject: [PATCH v7 08/12] EDAC/amd64: Add Family ops to update GPU csrow and channel info
Date: Thu, 3 Feb 2022 11:49:38 -0600
Message-ID: <20220203174942.31630-9-nchatrad@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com>
References: <20220203174942.31630-1-nchatrad@amd.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

GPU node
has 'X' number of PHYs and 'Y' number of channels. This results in 'X*Y' number of instances in the Data Fabric. Therefore, the Data Fabric ID of an instance in a GPU is computed as below: df_inst_id = 'X' * number of channels per PHY + 'Y' On CPUs, the Data Fabric ID of an instance is equal to the UMC number. Since the UMC number and channel are equal in CPU nodes, the channel can be used as the Data Fabric ID of the instance. Cc: Yazen Ghannam Co-developed-by: Muralidhara M K Signed-off-by: Muralidhara M K Signed-off-by: Naveen Krishna Chatradhi --- v1->v7: * New change in v7 drivers/edac/amd64_edac.c | 60 +++++++++++++++++++++++++++++++++++++-- drivers/edac/amd64_edac.h | 2 ++ 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 10efe726a959..241419a0be93 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3653,6 +3653,30 @@ static inline void decode_bus_error(int node_id, struct mce *m) __log_ecc_error(mci, &err, ecc_type); } +/* + * On CPUs, the Data Fabric ID of an instance is equal to the UMC number. + * And since the UMC number and channel are equal in CPU nodes, the channel can be used + * as the Data Fabric ID of the instance. + */ +static int f17_df_inst_id(struct mem_ctl_info *mci, struct amd64_pvt *pvt, + struct err_info *err) +{ + return err->channel; +} + +/* + * A GPU node has 'X' number of PHYs and 'Y' number of channels. + * This results in 'X*Y' number of instances in the Data Fabric. + * Therefore the Data Fabric ID of an instance can be found with the following formula: + * df_inst_id = 'X' * number of channels per PHY + 'Y' + * + */ +static int gpu_df_inst_id(struct mem_ctl_info *mci, struct amd64_pvt *pvt, + struct err_info *err) +{ + return (err->csrow * pvt->channel_count / mci->nr_csrows) + err->channel; +} + /* * To find the UMC channel represented by this bank we need to match on its * instance_id. 
The instance_id of a bank is held in the lower 32 bits of its @@ -3670,12 +3694,38 @@ static void update_umc_err_info(struct mce *m, struct err_info *err) err->csrow = m->synd & 0x7; } +/* + * The CPUs have one channel per UMC, so the UMC number is equivalent to a + * channel number. The GPUs have 8 channels per UMC, so the UMC number no + * longer works as a channel number. + * The channel number within a GPU UMC is given in MCA_IPID[15:12]. + * However, the IDs are split such that two UMC values go to one UMC, and + * the channel numbers are split in two groups of four. + * + * Refer to the comment on get_umc_base_gpu() in amd64_edac.h + * + * For example, + * UMC0 CH[3:0] = 0x0005[3:0]000 + * UMC0 CH[7:4] = 0x0015[3:0]000 + * UMC1 CH[3:0] = 0x0025[3:0]000 + * UMC1 CH[7:4] = 0x0035[3:0]000 + */ +static void gpu_update_umc_err_info(struct mce *m, struct err_info *err) +{ + u8 ch = (m->ipid & GENMASK(31, 0)) >> 20; + u8 phy = ((m->ipid >> 12) & 0xf); + + err->channel = ch % 2 ? phy + 4 : phy; + err->csrow = phy; +} + static void decode_umc_error(int node_id, struct mce *m) { u8 ecc_type = (m->status >> 45) & 0x3; struct mem_ctl_info *mci; struct amd64_pvt *pvt; struct err_info err; + u8 df_inst_id; u64 sys_addr; mci = edac_mc_find(node_id); @@ -3705,7 +3755,9 @@ static void decode_umc_error(int node_id, struct mce *m) pvt->ops->get_umc_err_info(m, &err); - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) { + df_inst_id = pvt->ops->find_umc_inst_id(mci, pvt, &err); + + if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) { err.err_code = ERR_NORM_ADDR; goto log_error; } @@ -4625,6 +4677,7 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->ops->get_mc_regs = __read_mc_regs_df; pvt->ops->populate_csrows = init_csrows_df; pvt->ops->get_umc_err_info = update_umc_err_info; + pvt->ops->find_umc_inst_id = f17_df_inst_id; if (pvt->fam == 0x18) { pvt->ctl_name = "F18h"; @@ -4678,6 +4731,8 @@ static void 
per_family_init(struct amd64_pvt *pvt) pvt->ops->dump_misc_regs = gpu_dump_misc_regs; pvt->ops->get_mc_regs = gpu_read_mc_regs; pvt->ops->populate_csrows = gpu_init_csrows; + pvt->ops->get_umc_err_info = gpu_update_umc_err_info; + pvt->ops->find_umc_inst_id = gpu_df_inst_id; goto end_fam; } else { pvt->ctl_name = "F19h_M30h"; @@ -4707,6 +4762,7 @@ static void per_family_init(struct amd64_pvt *pvt) pvt->ops->get_mc_regs = __read_mc_regs_df; pvt->ops->populate_csrows = init_csrows_df; pvt->ops->get_umc_err_info = update_umc_err_info; + pvt->ops->find_umc_inst_id = f17_df_inst_id; end_fam: break; @@ -4728,7 +4784,7 @@ static void per_family_init(struct amd64_pvt *pvt) } /* ops required for families 17h and later */ - if (pvt->fam >= 0x17 && !pvt->ops->get_umc_err_info) { + if (pvt->fam >= 0x17 && (!pvt->ops->get_umc_err_info || !pvt->ops->find_umc_inst_id)) { edac_dbg(1, "Platform specific helper routines not defined.\n"); return; } diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 71df08a496d2..b6177bd5d5ba 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -496,6 +496,8 @@ struct low_ops { void (*setup_mci_misc_attrs)(struct mem_ctl_info *mci); int (*populate_csrows)(struct mem_ctl_info *mci); void (*get_umc_err_info)(struct mce *m, struct err_info *err); + int (*find_umc_inst_id)(struct mem_ctl_info *mci, struct amd64_pvt *pvt, + struct err_info *err); }; int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset, From patchwork Thu Feb 3 17:49:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12734489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33378C433EF for ; Thu, 3 Feb 2022 17:50:37 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353136AbiBCRug (ORCPT ); Thu, 3 Feb 2022 12:50:36 -0500 Received: from mail-bn7nam10on2062.outbound.protection.outlook.com ([40.107.92.62]:23302 "EHLO NAM10-BN7-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1353112AbiBCRuc (ORCPT ); Thu, 3 Feb 2022 12:50:32 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ZW90GFfIhkuMQNlT+xF1XZsaRjeAdvwCpUcL8143mNvq05eh0T/Gv1Rl7hiLXKr/VLutOD9yEzWiKkGp+0WLOfDQrDyq9zS/1qGXBveXSFlv09ue/ehRmAvx2FLGrtWix2dqN9eLgvrBcTsGZeaaqdlOMNSrRLpyYSYsU29NUE6vC2DpztESYzuMR9XjsHrUc2+bYJpj4rmDWf+ExMGj3A3IS7iIhbzz9VWZwWhfbgghJBESpHD6aY6vex8WlZmX1gYh0uIJnpc9StplP/0LnKTaJlMRJYtt6UfHTZobKAegI/aUOIC8NavuPDKxWMkfvcfB3ZM5olr614nq9RImJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=W/6n+F88rKhX1kcncnU8KQWJs3UA7c+aut25zH9c3y8=; b=NaBBE4F0J47AZw4gQ9fpLU/OV8X9FoYEwQ4f54XJCZbVml0gixMzblxCyVaM6qhhrK4LjhihxkiRLPXShUUEUFthCR7v6Q2Z4IQviSndhGoufLlXTmUcEhQhYh/OoIt4pAdrkTKGj9yphNFK1oAwLM1NIIXUwtr+Yb2/LUnowVEXtCG3LhBDuy4tZo3T++bJOZeYf+Gk2XBgjHJJIqD3xtigwLFazTlWbURjLkjsJb75RiquejAq5sCx6l/5CrTOHS7wLdlo88GEJ8+2ZCG/uVzcQ3C/i5oH7vtCfsuI5gF1NCAd4qfjnWtZt/HHq5UNzwVNUXAO+QOwOtBvNzQwcA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=W/6n+F88rKhX1kcncnU8KQWJs3UA7c+aut25zH9c3y8=; 
b=T1GHHsbzTClN2s8oYVyyvOTxuu9oPXshcBaF9tZVer+ztdZFWERu9pc8YPtEXR3f9E4xoAdtWRowlUU1VYtlrQ5Plj/yTz3GBIKDhHByUqyHoJVLeinhnKrfkWKy6qXTp35TZbWh6Wn1YyLS+uBQdKdnXuzmMO/IMKR+48PYkiA= Received: from MW4PR04CA0305.namprd04.prod.outlook.com (2603:10b6:303:82::10) by CH2PR12MB4808.namprd12.prod.outlook.com (2603:10b6:610:e::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.12; Thu, 3 Feb 2022 17:50:30 +0000 Received: from CO1NAM11FT019.eop-nam11.prod.protection.outlook.com (2603:10b6:303:82:cafe::a4) by MW4PR04CA0305.outlook.office365.com (2603:10b6:303:82::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT019.mail.protection.outlook.com (10.13.175.57) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:30 +0000 Received: from node-bp128-r03d.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Thu, 3 Feb 2022 11:50:27 -0600 From: Naveen Krishna Chatradhi To: , CC: , , , , , Naveen Krishna Chatradhi Subject: [PATCH v7 09/12] EDAC/amd64: Add check for when to add DRAM base and hole Date: Thu, 3 Feb 2022 11:49:39 -0600 Message-ID: <20220203174942.31630-10-nchatrad@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com> 
References: <20220203174942.31630-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 189b493b-fabe-4be0-ebdb-08d9e73dab9b X-MS-TrafficTypeDiagnostic: CH2PR12MB4808:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:7219; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 2erm2kFhVBus9NxfOi36HJY3Ea2BxluSD9f/4ySRAydKczs7CuObMJEjYhJbx+sichwvlncb01y6WmKBtdZWLx2fpeApjzT80m2Ene43qyGcyTEBXGwzt/wjvNUGsy4FiFj3aXePenK63bIGEpDidD+SWuMArWkcVJBnmykolrhwTT4lgQOipQqjlUA10F4KDpwlStwxFfEhkHGpH7UVN+YbSLaRxzGsHqFe9ChOIFaKd8M+N+WwlwZvsmGyHOV87MecRvEpUn93yqwgH09eClws34eerFMITt3yekK6ZUm3DdZZvP4jKM5qnFqU/eeQ4A6DQyISEwepF6kYFSmvP3yYT2FNbQDsGE99HYTd+oTc0P+g3U6bcLILvZmaBJGTB3iKZvS1kfcxN695D71q5hT9v9PKUYNkJpDtC54i68iG7KvQ6l/8xmj/7MEkfE01TImK28sx5uMHW8rgtkyfE3ulMvDDHWqFI184kHTD72Snxu9G4LMz0BY+v1vA4+vVT98LeKxJ/skYNqjfxK/4sQ21EBzpShq8Aoa7EHBvTfI2Ak4UFsxqSZMV7AH6OVmOzZrRTCLc49ONPK3HNbRozdDzvKrrCAnIYE87jf+WguSQaZe38nOIHQNNAnEbHzxFLSanWLchgp0rzLPqTCYL4cueJJabXZww0rZguoSoNAas9GsAr2K263+jlGXemaftfQtmQIVM62OIk60icQd8CYjbT9EbQyuQ35MR788YD38ldQHc+PeeQvdMmXbp7WUeoveDDw66GRJhOMC7YKAxq07n7uo8kOa4TK1LwxwpEN0= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230001)(4636009)(40470700004)(46966006)(36840700001)(2906002)(7696005)(356005)(6666004)(2616005)(83380400001)(47076005)(82310400004)(336012)(36756003)(426003)(508600001)(186003)(1076003)(16526019)(26005)(81166007)(54906003)(36860700001)(316002)(4326008)(70586007)(8936002)(966005)(110136005)(8676002)(70206006)(40460700003)(5660300002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Feb 2022 
17:50:30.6060 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 189b493b-fabe-4be0-ebdb-08d9e73dab9b X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT019.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR12MB4808 Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Yazen Ghannam Some DF versions and interleaving modes require the DRAM base address and hole adjustments to be done before accounting for hashing. Others require this to be done after. Add a check for this behavior. Save a boolean in the ctx struct which can be appropriately set based on DF version or interleaving mode. Signed-off-by: Yazen Ghannam Signed-off-by: Naveen Krishna Chatradhi --- Link: https://lkml.kernel.org/r/20211028175728.121452-32-yazen.ghannam@amd.com v4->v7: * Was patch 31 in v3. v2->v3: * New in v3. 
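
Illustration only, not part of the patch: a minimal user-space sketch of the ordering that the late_hole_remove flag selects. It assumes that the step sitting between the two add_base_and_hole() checks is the dehash step, as the description above suggests. struct demo_ctx, demo_add_base(), demo_dehash() and demo_translate() are hypothetical stand-ins, and the one-bit hash is a toy, not the hardware hash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's struct addr_ctx. */
struct demo_ctx {
	uint64_t ret_addr;	/* address being translated */
	uint64_t base;		/* stand-in for the DRAM base address */
	bool late_hole_remove;	/* true: adjust base/hole after dehashing */
};

/* Stand-in for add_base_and_hole(); this sketch only adds a base. */
static int demo_add_base(struct demo_ctx *ctx)
{
	ctx->ret_addr += ctx->base;
	return 0;
}

/* Stand-in for ctx->dehash_addr(); toy hash: bit 8 ^= bit 16. */
static int demo_dehash(struct demo_ctx *ctx)
{
	uint64_t hash_bit = (ctx->ret_addr >> 16) & 1;

	ctx->ret_addr ^= hash_bit << 8;
	return 0;
}

/* Same shape as the two checks in the patch: exactly one of them runs. */
int demo_translate(struct demo_ctx *ctx)
{
	if (!ctx->late_hole_remove && demo_add_base(ctx))
		return -1;

	if (demo_dehash(ctx))
		return -1;

	if (ctx->late_hole_remove && demo_add_base(ctx))
		return -1;

	return 0;
}
```

With ret_addr = 0 and base = 0x10000, the early ordering hashes an address that already has bit 16 set, while the late ordering hashes the unadjusted address, so the two orderings produce different results. This is why the order has to follow the DF version or interleaving mode.
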
drivers/edac/amd64_edac.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 241419a0be93..0f35c8519555 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1112,6 +1112,7 @@ struct addr_ctx { u8 intlv_num_sockets; u8 cs_id; u8 node_id_shift; + bool late_hole_remove; int (*dehash_addr)(struct addr_ctx *ctx); void (*make_space_for_cs_id)(struct addr_ctx *ctx); void (*insert_cs_id)(struct addr_ctx *ctx); @@ -1676,7 +1677,7 @@ static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr return -EINVAL; } - if (add_base_and_hole(&ctx)) { + if (!ctx.late_hole_remove && add_base_and_hole(&ctx)) { pr_debug("Failed to add DRAM base address and hole"); return -EINVAL; } @@ -1686,6 +1687,11 @@ static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr return -EINVAL; } + if (ctx.late_hole_remove && add_base_and_hole(&ctx)) { + pr_debug("Failed to add DRAM base address and hole"); + return -EINVAL; + } + if (addr_over_limit(&ctx)) { pr_debug("Calculated address is over limit"); return -EINVAL; From patchwork Thu Feb 3 17:49:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12734499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFE43C4167D for ; Thu, 3 Feb 2022 17:51:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235732AbiBCRvC (ORCPT ); Thu, 3 Feb 2022 12:51:02 -0500 Received: from mail-mw2nam08on2076.outbound.protection.outlook.com ([40.107.101.76]:29312 "EHLO NAM04-MW2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1353123AbiBCRue (ORCPT ); Thu, 
3 Feb 2022 12:50:34 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=FLGCiudrrFG2TLk5Uwgiv8qQHJpXTN5KfCor+h0H3ShZv+09zyNIWlM6y4RbreDdusDv/zw4QogpjhA3+L9fLOjas0SpXWbwTQGfQO9oAxDE51OBjv20zvrIEXkLSxildxZ2TDGmcqngEXJRLaZr09XS6qgOBcysnc4mSM7aNMdyxIMg4QDEiy2rX0Gk135h/iJvLcgXxp4wcIHYKLBOg7nffAnJj5oAdQtsP/U95hdOOWZB6P62EMS0gJ5DKZ4FY6zb6DIXJeOby8E9LX3b0uXJnqbQjv3sl4LrOIrQ/TG/EN7HXr5x9Iq3A012K8LYld2mISMyIuWwp11k22FzHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0na8pE85ewegPop5SuJcM8lD8Rd2Om0ZDs9i85errHM=; b=U0/HvqExzsS6KnzVvFNpSWg3zoecFYzjRttqCf5j8KMiiRaOz7Z4SJOnAIOxAFEYm6f3PUIVMZ3q3WKRzkfEmBXE5G5OhnwPmdyv9ycvjhPE/iRnPk1K7BYEc2u+b13ZdOAqL5DYtolP/fC5hsbM4gsW4Im0EbgskfXfjNX3eMdZmgte5c695sFqlZDR/o7CKR9Owa3IGsfYEBEuoLb7s5yH6QGT/EZiX9/TklZxD2ETAPDbMVg9tu/xa/YX+KIwzTXQypJVqwm4i9kbYhEobpSBJqm6BPOd11PyUpd/4gmFPYXzgLuDoVmJbO2p5ObAf82dnmRMzg75MG7nGWJ23Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=0na8pE85ewegPop5SuJcM8lD8Rd2Om0ZDs9i85errHM=; b=EuSDrCAFeRWvd4Liu+Kdjf79YlmmMFoiq6G9xrBLjPd/DVD9AoT4540txgh+NKFAKxQhRGgcFUurlRLNZXb2V20548PD9RyvFQguFWpOhFAkUHkRXCtqLR5MYMxxOHYWpdb2SFPENsEYLVKZQ0oBRzXKMQPw6J7RzZoDO9X5WxM= Received: from MW4PR04CA0301.namprd04.prod.outlook.com (2603:10b6:303:82::6) by BN6PR12MB1747.namprd12.prod.outlook.com (2603:10b6:404:106::20) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.12; Thu, 3 Feb 2022 17:50:31 +0000 Received: from CO1NAM11FT019.eop-nam11.prod.protection.outlook.com (2603:10b6:303:82:cafe::a1) by MW4PR04CA0301.outlook.office365.com (2603:10b6:303:82::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.14 via Frontend Transport; Thu, 3 Feb 2022 17:50:31 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT019.mail.protection.outlook.com (10.13.175.57) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:31 +0000 Received: from node-bp128-r03d.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Thu, 3 Feb 2022 11:50:28 -0600 From: Naveen Krishna Chatradhi To: , CC: , , , , , Naveen Krishna Chatradhi Subject: [PATCH v7 10/12] EDAC/amd64: Save the number of block instances Date: Thu, 3 Feb 2022 11:49:40 -0600 Message-ID: <20220203174942.31630-11-nchatrad@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com> References: <20220203174942.31630-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 0c313758-e2c6-4041-5a77-08d9e73dac0e X-MS-TrafficTypeDiagnostic: 
BN6PR12MB1747:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:3968; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: +18qSuedfKmjPiT5wR2b3SNnRmgCMnXjoTNVb2FXS4CBPje4cClPATsvZu3n+2oC99lJ4ylkT5N5KvOHJzCj1OwpZ6EDp69a6DCz3Ejks5J/Bv7pzOwDYsKA6GucnYbMSAl9GR0tS2OFGudpGHVX3aOoeQrM/sOloqXPInH2aA9OQefTKnZVJookcMDp+qR7neDILVBU8q70Rl9WQKnlaHJvw7O5d4Gx2F3TwbXte2eK4qNYHzcAyDchPD0IFhsjLxSmSsQCSXDVOVsn/Np28DTE2Z0v3lKMl/osaMfYYV1B2O65Ea8PULBDDm8RkLqTYmItKGZa+VY78fNLS84cAMejUV9dj/IKWgcYHizDPB7vM+OjKJzFDaoFG8IhxmOZpb6f55q+1+AxtYI9ftNBIEpFArjRxAYYBSg4s03tvYXAIA8QJwlIwjmqmhfkG/ewh4Zv5c5gauhHli5LUjQZBNzvDbBAAEMLzKgOCP2rvY6yr91XPYBWfK9ofAD7YDWl/WYdRrfbRVyM04vmhMXe2GFbZ6LZqSOGx/naoAn1pxcl569mgb19cz9DAOJFFcJeGe+TkGYjvmFSSv4Aak4q2ei/Tda3jBrtgsVjCwpY+zX9OaKM8JWCchqy0dOw25kuu4/l7ScGHpq4VQdwVtmo0+A6ajfeMMiAHsJ8Isp8N5m3b8ccdGQa4JBa6gVV50KdRjsg/ROAPk/yP7YYHkHDYwR/ErmA18GmfDZGxT1TcTp0wAGSInoZFEoeEYYx7ewJPVKSW1G/1Ue2QuhQ/wKSmVa769OM0XrWPpJAuq68xgQ= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230001)(4636009)(46966006)(36840700001)(40470700004)(2906002)(966005)(5660300002)(40460700003)(54906003)(47076005)(110136005)(316002)(83380400001)(36756003)(4326008)(70586007)(70206006)(82310400004)(356005)(6666004)(7696005)(8936002)(8676002)(508600001)(36860700001)(2616005)(186003)(336012)(426003)(16526019)(26005)(81166007)(1076003)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Feb 2022 17:50:31.3560 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0c313758-e2c6-4041-5a77-08d9e73dac0e X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: 
CO1NAM11FT019.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR12MB1747 Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Yazen Ghannam Cache the number of block instances. This value is needed by future DF versions. Signed-off-by: Yazen Ghannam Signed-off-by: Naveen Krishna Chatradhi --- Link: https://lkml.kernel.org/r/20211028175728.121452-33-yazen.ghannam@amd.com v3->v7: * Was in patch 32 in v3. v2->v3: * New in v3. drivers/edac/amd64_edac.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 0f35c8519555..8f2b5c8be651 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1112,6 +1112,7 @@ struct addr_ctx { u8 intlv_num_sockets; u8 cs_id; u8 node_id_shift; + u8 num_blk_instances; bool late_hole_remove; int (*dehash_addr)(struct addr_ctx *ctx); void (*make_space_for_cs_id)(struct addr_ctx *ctx); @@ -1437,6 +1438,17 @@ struct data_fabric_ops df3_ops = { struct data_fabric_ops *df_ops; +static int get_blk_inst_cnt(struct addr_ctx *ctx) +{ + /* Read D18F0x40 (FabricBlockInstanceCount). 
*/ + if (df_indirect_read_broadcast(0, 0, 0x40, &ctx->tmp)) + return -EINVAL; + + ctx->num_blk_instances = ctx->tmp & 0xFF; + + return 0; +} + static int get_dram_offset_reg(struct addr_ctx *ctx) { /* Read D18F0x1B4 (DramOffset) */ @@ -1647,6 +1659,11 @@ static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr ctx.nid = nid; ctx.inst_id = umc; + if (get_blk_inst_cnt(&ctx)) { + pr_debug("Failed to get Block Instance Count"); + return -EINVAL; + } + if (df_ops->get_masks(&ctx)) { pr_debug("Failed to get masks"); return -EINVAL; From patchwork Thu Feb 3 17:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12734497 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E524FC4167B for ; Thu, 3 Feb 2022 17:51:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241091AbiBCRvA (ORCPT ); Thu, 3 Feb 2022 12:51:00 -0500 Received: from mail-dm6nam12on2059.outbound.protection.outlook.com ([40.107.243.59]:62369 "EHLO NAM12-DM6-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1353086AbiBCRue (ORCPT ); Thu, 3 Feb 2022 12:50:34 -0500 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=AsGo2JMMYLTqsfzoIrshwXFLwHkas6cD1geLu3XS/Rwi9fO1GEX16/rzqMTHbR0M/+l1Zn29nS0Hevr2xBWTFgeNOr5Uot9Yjfb/o/XPosMJ70cfwwR22OjNDD3kBN8h/b+GQJ9pEGbGakaN2y9kDrrEMQruS1gan+Z9Eg1N3KHwYKFd7yUppqOjQ3Cuon8eMW5c/g8Ojcwx+x3PGMl79isUWpTTu49ns+OrUc6z4aQvSLQ3G1VRsB55L4CrWS1DfSjJP5N7GCRYQzfSrO0+5n9Q2NM6KosaMQoQO01g8UvE9GbDB1PVMveozJHlEe3mO5We2HIlcbghJxSKDzWaLA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=z81QHFc5J6IwgSiGu/++ELVqS2eRhOtUhZtPhuq6WWk=; b=IN0ENtaVnHiNz78F2kwRYwqX2dmoCgF5yTaXGBCgkxcLITtBUas11rR35jvom7/iBUYr8vizDkNqad7dw+BCCese3Yu9rc/JIJzoV4jWfysoWCfetYzDSEgVAI6a4ag/SyvSScvY44Duz7NqjUzyTyRZaQTBkIz91dBrg4autDSwoOQo3SLp3UNtLL09e/zRlukA9r01ZA+2WKOjS1nP5XeobIREpp2V02f2c11JSjN3hADwd/k6KIfUGArvj9gBjnIj0pfjdq7E+HOr1p5H8mJCBl4PxNs0KAq47P/I/Ljc3W4Rj4VtBj4oAY2tqS/1qdypbBniTfjTRZfeyhIxBw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=z81QHFc5J6IwgSiGu/++ELVqS2eRhOtUhZtPhuq6WWk=; b=Cr1Z+YxpS8sV1Wrj874ZZHnTXHjtJEP3kPPsohOz5tgDjnBxWsY2WZZ+ht4WLsxSKAjR8wOAQCZ/WPgHa5Sv6Bl1eJy2QnIyl8CCzohud6YXsdt029f6qypqr7cwVFaFybDZk0+VIoMopQZIoHOIRZIEVQ0lk3/+icSk90YAxNA= Received: from MW4PR02CA0005.namprd02.prod.outlook.com (2603:10b6:303:16d::14) by CY4PR1201MB0216.namprd12.prod.outlook.com (2603:10b6:910:18::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4930.20; Thu, 3 Feb 2022 17:50:31 +0000 Received: from CO1NAM11FT007.eop-nam11.prod.protection.outlook.com (2603:10b6:303:16d:cafe::d6) by MW4PR02CA0005.outlook.office365.com (2603:10b6:303:16d::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.11 via Frontend Transport; Thu, 3 Feb 2022 17:50:31 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass 
action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT007.mail.protection.outlook.com (10.13.174.131) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:31 +0000 Received: from node-bp128-r03d.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Thu, 3 Feb 2022 11:50:28 -0600 From: Naveen Krishna Chatradhi To: , CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi Subject: [PATCH v7 11/12] EDAC/amd64: Add address translation support for DF3.5 Date: Thu, 3 Feb 2022 11:49:41 -0600 Message-ID: <20220203174942.31630-12-nchatrad@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com> References: <20220203174942.31630-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: c4457a25-27b7-4463-7d19-08d9e73dac23 X-MS-TrafficTypeDiagnostic: CY4PR1201MB0216:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:6790; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
NBUdOf5Qvd5SEN4nCT1DKoSmTZGmVeqbdWh0Hufy393fuO844m02XKqbkWvhybUjPnjN0SlvO8r5gZdIWqMHnwdDjLnVvZ4z9iyggVMjyWQ0grHjvH1tWsgpgUSIu+zO9SMNLhEUXCyuOKzASJiZmtbBWSTTTZpWG+PYKRYlfGXXGf04693WT+B6bCcajENWAeFx/DrgyZ3aD7LgKFqfh5yv6ktbmCl8K7XFM4RaiBrVll1Wm3JBv/gjOKeWMyF94WiLu1Pv3MyUcADjXEjCCcQQwTUeoY0xqbfSizmdMLHyxilRTDxONcNW44HCSUg61XRQ6gVx1L44V2hWMwEUexCaWMyMJI9o6BWOFrdSGnwo7xNFrmuEmOr51cGcuHQKk1MZf6V9N7pnF5qSFjnHKpOECRxvFBHmzv3oC6PD6uCfnoBuhwCbbmIhrUqiMRDgSxgNkhOhP11JIzjBJkxygafp+mrcagJeiWLt71BXSJAYiN5EOcagOv0GdXPQG3pdtky2uRTRCXluXLcBjoIKZQabQmDLc5Z9Xdl9vgUwe/spMSbutY4r3R69sRK+NsZX3OirlTs2AZ2fPCBJVN9h42Cu91qi38SaxCZkrAAxZwaPg4kXcCislJuA6Rql4Uk+42lnFSvWTc/vBBB3igRYeIaKf7ruclUlvb0RWkCrU59Bpjgl/UK/xzQoAsMGmw00FDAmDqZyxqOmEPJqEKaH2t73CFZOvzw526CwDyGdazwqCYu2Z7A8Cw0DhDa1iSKyJl1LXofFNfb7lTzOyruGnXZGMsI58ufAQ8Um9WyV7EYEa+/pkmq+seLFHUUSumEZdhBOt36PlNHErkXVFwz3mQ== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230001)(4636009)(46966006)(36840700001)(40470700004)(36860700001)(5660300002)(81166007)(83380400001)(82310400004)(2906002)(356005)(40460700003)(966005)(6666004)(70206006)(16526019)(336012)(186003)(426003)(1076003)(26005)(70586007)(47076005)(2616005)(316002)(36756003)(110136005)(8936002)(7696005)(508600001)(4326008)(8676002)(54906003)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Feb 2022 17:50:31.4945 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: c4457a25-27b7-4463-7d19-08d9e73dac23 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT007.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem 
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR1201MB0216
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

Add support for address translation on Data Fabric version 3.5.
Add new data fabric ops and interleaving modes. Also, adjust how the
DRAM address maps are found early in the translation for certain cases.

Signed-off-by: Muralidhara M K
Co-developed-by: Yazen Ghannam
Signed-off-by: Yazen Ghannam
Signed-off-by: Naveen Krishna Chatradhi
---
Link: https://lkml.kernel.org/r/20211028175728.121452-34-yazen.ghannam@amd.com

v3->v7:
* Was in patch 33 in v3.

v2->v3:
* New in v3. Original version at link above.
* Squashed the following into this patch:
  https://lkml.kernel.org/r/20210630152828.162659-8-nchatrad@amd.com
* Dropped "df_regs" use.
* Set "df_ops" during module init.
* Dropped hard-coded Node ID values.

 drivers/edac/amd64_edac.c | 216 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 209 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 8f2b5c8be651..6e0d617fd95f 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1004,6 +1004,7 @@ static int sys_addr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr)
 /*
  * Glossary of acronyms used in address translation for Zen-based systems
  *
+ * CCM = Cache Coherent Master
  * COD = Cluster-on-Die
  * CS = Coherent Slave
  * DF = Data Fabric
@@ -1084,9 +1085,14 @@ enum intlv_modes {
 	NOHASH_2CH = 0x01,
 	NOHASH_4CH = 0x03,
 	NOHASH_8CH = 0x05,
+	NOHASH_16CH = 0x07,
+	NOHASH_32CH = 0x08,
 	HASH_COD4_2CH = 0x0C,
 	HASH_COD2_4CH = 0x0D,
 	HASH_COD1_8CH = 0x0E,
+	HASH_8CH = 0x1C,
+	HASH_16CH = 0x1D,
+	HASH_32CH = 0x1E,
 	DF2_HASH_2CH = 0x21,
 };
 
@@ -1100,6 +1106,7 @@ struct addr_ctx {
 	u32 reg_limit_addr;
 	u32 reg_fab_id_mask0;
 	u32 reg_fab_id_mask1;
+	u32 reg_fab_id_mask2;
 	u16 cs_fabric_id;
 	u16 die_id_mask;
 	u16 socket_id_mask;
@@ -1436,6 +1443,141 @@ struct data_fabric_ops df3_ops = {
 	.get_component_id_mask = get_component_id_mask_df3,
 };
 
+static int dehash_addr_df35(struct addr_ctx *ctx)
+{
+	u8 hashed_bit, intlv_ctl_64k, intlv_ctl_2M, intlv_ctl_1G;
+	u8 num_intlv_bits = ctx->intlv_num_chan;
+	u32 i;
+
+	/* Read D18F0x3F8 (DfGlobalCtrl). */
+	if (df_indirect_read_broadcast(0, 0, 0x3F8, &ctx->tmp))
+		return -EINVAL;
+
+	intlv_ctl_64k = !!((ctx->tmp >> 20) & 0x1);
+	intlv_ctl_2M = !!((ctx->tmp >> 21) & 0x1);
+	intlv_ctl_1G = !!((ctx->tmp >> 22) & 0x1);
+
+	/*
+	 * CSSelect[0] = XOR of addr{8, 16, 21, 30};
+	 * CSSelect[1] = XOR of addr{9, 17, 22, 31};
+	 * CSSelect[2] = XOR of addr{10, 18, 23, 32};
+	 * CSSelect[3] = XOR of addr{11, 19, 24, 33}; - 16 and 32 channel only
+	 * CSSelect[4] = XOR of addr{12, 20, 25, 34}; - 32 channel only
+	 */
+	for (i = 0; i < num_intlv_bits; i++) {
+		hashed_bit = ((ctx->ret_addr >> (8 + i)) ^
+			      ((ctx->ret_addr >> (16 + i)) & intlv_ctl_64k) ^
+			      ((ctx->ret_addr >> (21 + i)) & intlv_ctl_2M) ^
+			      ((ctx->ret_addr >> (30 + i)) & intlv_ctl_1G));
+
+		hashed_bit &= BIT(0);
+		if (hashed_bit != ((ctx->ret_addr >> (8 + i)) & BIT(0)))
+			ctx->ret_addr ^= BIT(8 + i);
+	}
+
+	return 0;
+}
+
+static int get_intlv_mode_df35(struct addr_ctx *ctx)
+{
+	ctx->intlv_mode = (ctx->reg_base_addr >> 2) & 0x1F;
+
+	if (ctx->intlv_mode == HASH_COD4_2CH ||
+	    ctx->intlv_mode == HASH_COD2_4CH ||
+	    ctx->intlv_mode == HASH_COD1_8CH) {
+		ctx->make_space_for_cs_id = make_space_for_cs_id_cod_hash;
+		ctx->insert_cs_id = insert_cs_id_cod_hash;
+		ctx->dehash_addr = dehash_addr_df3;
+	} else {
+		ctx->make_space_for_cs_id = make_space_for_cs_id_simple;
+		ctx->insert_cs_id = insert_cs_id_simple;
+		ctx->dehash_addr = dehash_addr_df35;
+	}
+
+	return 0;
+}
+
+static void get_intlv_num_dies_df35(struct addr_ctx *ctx)
+{
+	ctx->intlv_num_dies = (ctx->reg_base_addr >> 7) & 0x1;
+}
+
+static u8 get_die_id_shift_df35(struct addr_ctx *ctx)
+{
+	return ctx->node_id_shift;
+}
+
+static u8 get_socket_id_shift_df35(struct addr_ctx *ctx)
+{
+	return (ctx->reg_fab_id_mask1 >> 8) & 0xF;
+}
+
+static int get_masks_df35(struct addr_ctx *ctx)
+{
+	/* Read D18F1x150 (SystemFabricIdMask0). */
+	if (df_indirect_read_broadcast(0, 1, 0x150, &ctx->reg_fab_id_mask0))
+		return -EINVAL;
+
+	/* Read D18F1x154 (SystemFabricIdMask1) */
+	if (df_indirect_read_broadcast(0, 1, 0x154, &ctx->reg_fab_id_mask1))
+		return -EINVAL;
+
+	/* Read D18F1x158 (SystemFabricIdMask2) */
+	if (df_indirect_read_broadcast(0, 1, 0x158, &ctx->reg_fab_id_mask2))
+		return -EINVAL;
+
+	ctx->node_id_shift = ctx->reg_fab_id_mask1 & 0xF;
+
+	ctx->die_id_mask = ctx->reg_fab_id_mask2 & 0xFFFF;
+
+	ctx->socket_id_mask = (ctx->reg_fab_id_mask2 >> 16) & 0xFFFF;
+
+	return 0;
+}
+
+static u16 get_dst_fabric_id_df35(struct addr_ctx *ctx)
+{
+	return ctx->reg_limit_addr & 0xFFF;
+}
+
+static int get_cs_fabric_id_df35(struct addr_ctx *ctx)
+{
+	u16 nid = ctx->nid;
+
+	/* Special handling for GPU nodes.*/
+	if (nid >= amd_nb_num()) {
+		if (get_umc_to_cs_mapping(ctx))
+			return -EINVAL;
+
+		/* Need to convert back to the hardware-provided Node ID. */
+		nid -= amd_nb_num();
+		nid += amd_gpu_node_start_id();
+	}
+
+	ctx->cs_fabric_id = ctx->inst_id | (nid << ctx->node_id_shift);
+
+	return 0;
+}
+
+static u16 get_component_id_mask_df35(struct addr_ctx *ctx)
+{
+	return ctx->reg_fab_id_mask0 & 0xFFFF;
+}
+
+struct data_fabric_ops df3point5_ops = {
+	.get_hi_addr_offset = get_hi_addr_offset_df3,
+	.get_intlv_mode = get_intlv_mode_df35,
+	.get_intlv_addr_sel = get_intlv_addr_sel_df3,
+	.get_intlv_num_dies = get_intlv_num_dies_df35,
+	.get_intlv_num_sockets = get_intlv_num_sockets_df3,
+	.get_masks = get_masks_df35,
+	.get_die_id_shift = get_die_id_shift_df35,
+	.get_socket_id_shift = get_socket_id_shift_df35,
+	.get_dst_fabric_id = get_dst_fabric_id_df35,
+	.get_cs_fabric_id = get_cs_fabric_id_df35,
+	.get_component_id_mask = get_component_id_mask_df35,
+};
+
 struct data_fabric_ops *df_ops;
 
 static int get_blk_inst_cnt(struct addr_ctx *ctx)
@@ -1534,8 +1676,17 @@ static void get_intlv_num_chan(struct addr_ctx *ctx)
 		break;
 	case NOHASH_8CH:
 	case HASH_COD1_8CH:
+	case HASH_8CH:
 		ctx->intlv_num_chan = 3;
 		break;
+	case NOHASH_16CH:
+	case HASH_16CH:
+		ctx->intlv_num_chan = 4;
+		break;
+	case NOHASH_32CH:
+	case HASH_32CH:
+		ctx->intlv_num_chan = 5;
+		break;
 	default:
 		/* Valid interleaving modes where checked earlier. */
 		break;
@@ -1642,6 +1793,44 @@ static int addr_over_limit(struct addr_ctx *ctx)
 	return 0;
 }
 
+static int find_ccm_instance_id(struct addr_ctx *ctx)
+{
+	u32 temp;
+
+	for (ctx->inst_id = 0; ctx->inst_id < ctx->num_blk_instances; ctx->inst_id++) {
+		/* Read D18F0x44 (FabricBlockInstanceInformation0). */
+		if (df_indirect_read_instance(0, 0, 0x44, ctx->inst_id, &temp))
+			return -EINVAL;
+
+		if (temp == 0)
+			continue;
+
+		if ((temp & 0xF) == 0)
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
+#define DF_NUM_DRAM_MAPS_AVAILABLE 16
+static int find_map_reg_by_dstfabricid(struct addr_ctx *ctx)
+{
+	u16 node_id_mask = (ctx->reg_fab_id_mask0 >> 16) & 0xFFFF;
+	u16 dst_fabric_id;
+
+	for (ctx->map_num = 0; ctx->map_num < DF_NUM_DRAM_MAPS_AVAILABLE ; ctx->map_num++) {
+		if (get_dram_addr_map(ctx))
+			continue;
+
+		dst_fabric_id = df_ops->get_dst_fabric_id(ctx);
+
+		if ((dst_fabric_id & node_id_mask) == (ctx->cs_fabric_id & node_id_mask))
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
 static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 {
 	struct addr_ctx ctx;
@@ -1659,6 +1848,9 @@ static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr
 	ctx.nid = nid;
 	ctx.inst_id = umc;
 
+	if (df_ops == &df3point5_ops)
+		ctx.late_hole_remove = true;
+
 	if (get_blk_inst_cnt(&ctx)) {
 		pr_debug("Failed to get Block Instance Count");
 		return -EINVAL;
@@ -1674,14 +1866,22 @@ static int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr
 		return -EINVAL;
 	}
 
-	if (remove_dram_offset(&ctx)) {
-		pr_debug("Failed to remove DRAM offset");
-		return -EINVAL;
-	}
+	if (ctx.nid >= amd_nb_num()) {
+		if (find_ccm_instance_id(&ctx))
+			return -EINVAL;
 
-	if (get_dram_addr_map(&ctx)) {
-		pr_debug("Failed to get DRAM address map");
-		return -EINVAL;
+		if (find_map_reg_by_dstfabricid(&ctx))
+			return -EINVAL;
+	} else {
+		if (remove_dram_offset(&ctx)) {
+			pr_debug("Failed to remove DRAM offset");
+			return -EINVAL;
+		}
+
+		if (get_dram_addr_map(&ctx)) {
+			pr_debug("Failed to get DRAM address map");
+			return -EINVAL;
+		}
 	}
 
 	if (df_ops->get_intlv_mode(&ctx)) {
@@ -4756,12 +4956,14 @@ static void per_family_init(struct amd64_pvt *pvt)
 				pvt->ops->populate_csrows = gpu_init_csrows;
 				pvt->ops->get_umc_err_info = gpu_update_umc_err_info;
 				pvt->ops->find_umc_inst_id = gpu_df_inst_id;
+				df_ops = &df3point5_ops;
 				goto end_fam;
 			} else {
 				pvt->ctl_name = "F19h_M30h";
 				pvt->f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0;
 				pvt->f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6;
 				pvt->max_mcs = 8;
+				df_ops = &df3point5_ops;
 			}
 		} else {
 			pvt->ctl_name = "F19h";

From patchwork Thu Feb 3 17:49:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12734498
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 78E91C43217
	for ; Thu, 3 Feb 2022 17:51:04 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1353144AbiBCRvD (ORCPT ); Thu, 3 Feb 2022 12:51:03 -0500
Received: from mail-bn8nam12on2083.outbound.protection.outlook.com ([40.107.237.83]:9307
	"EHLO NAM12-BN8-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1353130AbiBCRug (ORCPT );
	Thu, 3 Feb 2022 12:50:36 -0500
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
	b=a15spsYnkd3GpKAN6Gf0iCp2oZNSN5/QJo6oIe7xmfoO3XwdrnSJsnyjsVIWsxJ5AJA1TwYiLQSiKKNE3VOLfyyiKJZNl0FeIECjoleq4JqcXnpSRjwGrbWNfJOM0UHuWRh3R/rvn/lG36z4zUYkAQZ35zsdHg0/jmGxbXkvC5DpHjScw0KE7t85Azk17zfDqv6JjvElKyQsphi4ONj+G6opxxG6RuN4kvyz2jYzA56y/DuFWLGG8V3V1IhOyudObhegu/bPRRF0fp1pSoWpseTzD4PSBZjpZslzZk2toZbKTYRAlRxSf1DEeLyvVcpxmKNMrhw8e33+MahVxKS81A==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901;
	h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
	bh=I8q72zyNt/6inybY3OwfW6Uw+5CoELZtmwbgqCYs0Nw=;
b=hyC6xAJBaZezBMuyneqUgN83jbCRtwTdpB2oe2etF8dFuo98erdo3WjuDJSfkEGulJCTxdpQ9Qs1G1ib/7lBI1d4T7z1Mots6rssQkH9DGPsFWzxmrlrp4Z0u1qW/auzs6HMeQp93eLmtDykKPMe+i6ERrO3QXv/hR3tYigEiyh53f+hyxhtI/2mUM3cziasqmFsT1scCjooZ19NzxHGEVlESIch8Lh2/teqb85bY7PdLPh45R5vo8mFaL2zssvFdje2YPve4WSQmuyBKOJzobYuyXo95NwXfaUhAxx5DjLVyY4kaEyNy+JcGE1GPWdvhC9svj6Vw2PtoycYdbFW/A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=I8q72zyNt/6inybY3OwfW6Uw+5CoELZtmwbgqCYs0Nw=; b=Vrm6m7Bzlffd5NvjypPRHbTPuLQvx7rthDEgHIaTM8ipuTgbVNjRqZiSI6+laQFaEToIvg29ZBOhuWibb3rag1PV+a5UwLpJVNHVoHuWE1A8QlED3G4YCiH6hUmqo7W8zGtB8vC4n0pjJIiH+HBRV3XbXUHRT1ZGSiDDIM7ftmU= Received: from MW4PR04CA0326.namprd04.prod.outlook.com (2603:10b6:303:82::31) by DM6PR12MB2988.namprd12.prod.outlook.com (2603:10b6:5:3d::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4930.18; Thu, 3 Feb 2022 17:50:32 +0000 Received: from CO1NAM11FT019.eop-nam11.prod.protection.outlook.com (2603:10b6:303:82:cafe::39) by MW4PR04CA0326.outlook.office365.com (2603:10b6:303:82::31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:32 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from 
SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT019.mail.protection.outlook.com (10.13.175.57) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4951.12 via Frontend Transport; Thu, 3 Feb 2022 17:50:32 +0000 Received: from node-bp128-r03d.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Thu, 3 Feb 2022 11:50:29 -0600 From: Naveen Krishna Chatradhi To: , CC: , , , , , Naveen Krishna Chatradhi Subject: [PATCH v7 12/12] EDAC/amd64: Add fixed UMC to CS mapping Date: Thu, 3 Feb 2022 11:49:42 -0600 Message-ID: <20220203174942.31630-13-nchatrad@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220203174942.31630-1-nchatrad@amd.com> References: <20220203174942.31630-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: e317c230-d95b-49b5-2b6e-08d9e73dac83 X-MS-TrafficTypeDiagnostic: DM6PR12MB2988:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:561; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Yazen Ghannam

The GPU memory address mapping entries for Aldebaran make it possible
to determine on which channel an error occurred.

Aldebaran has 2 dies, which are enumerated alternately:
* die0s are enumerated as nodes 2, 4, 6 and 8
* die1s are enumerated as nodes 1, 3, 5 and 7

Signed-off-by: Yazen Ghannam
Signed-off-by: Naveen Krishna Chatradhi
---
Link:
v3->v7:
* Split and fixed UMC to CS mapping from patch 33 in v3.
  https://patchwork.kernel.org/project/linux-edac/patch/20211028175728.121452-34-yazen.ghannam@amd.com/

 drivers/edac/amd64_edac.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 6e0d617fd95f..e0f9f3a4fff8 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1540,6 +1540,36 @@ static u16 get_dst_fabric_id_df35(struct addr_ctx *ctx)
 	return ctx->reg_limit_addr & 0xFFF;
 }
 
+/* UMC to CS mapping for Aldebaran die[0]s */
+u8 umc_to_cs_mapping_aldebaran_die0[] = { 28, 20, 24, 16, 12, 4, 8, 0,
+					  6, 30, 2, 26, 22, 14, 18, 10,
+					  19, 11, 15, 7, 3, 27, 31, 23,
+					  9, 1, 5, 29, 25, 17, 21, 13};
+
+/* UMC to CS mapping for Aldebaran die[1]s */
+u8 umc_to_cs_mapping_aldebaran_die1[] = { 19, 11, 15, 7, 3, 27, 31, 23,
+					  9, 1, 5, 29, 25, 17, 21, 13,
+					  28, 20, 24, 16, 12, 4, 8, 0,
+					  6, 30, 2, 26, 22, 14, 18, 10};
+
+int get_umc_to_cs_mapping(struct addr_ctx *ctx)
+{
+	if (ctx->inst_id >= sizeof(umc_to_cs_mapping_aldebaran_die0))
+		return -EINVAL;
+
+	/*
+	 * Aldebaran has 2 dies, which are enumerated alternately:
+	 * die0s are enumerated as nodes 2, 4, 6 and 8
+	 * die1s are enumerated as nodes 1, 3, 5 and 7
+	 */
+	if (ctx->nid % 2)
+		ctx->inst_id = umc_to_cs_mapping_aldebaran_die1[ctx->inst_id];
+	else
+		ctx->inst_id = umc_to_cs_mapping_aldebaran_die0[ctx->inst_id];
+
+	return 0;
+}
+
 static int get_cs_fabric_id_df35(struct addr_ctx *ctx)
 {
 	u16 nid = ctx->nid;
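
For illustration only, the lookup that get_umc_to_cs_mapping() performs can
be tried outside the kernel with a small user-space sketch. The two tables
are copied from the patch above; the helper name umc_to_cs(), its signature,
the -1 error return and the local ARRAY_SIZE definition are assumptions made
for this example and are not part of the patch or of the EDAC driver.

```c
#include <assert.h>	/* only needed for the usage checks mentioned below */
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel macro of the same name. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* UMC to CS mapping for Aldebaran die[0]s (copied from the patch). */
static const uint8_t umc_to_cs_die0[] = { 28, 20, 24, 16, 12,  4,  8,  0,
					   6, 30,  2, 26, 22, 14, 18, 10,
					  19, 11, 15,  7,  3, 27, 31, 23,
					   9,  1,  5, 29, 25, 17, 21, 13 };

/* UMC to CS mapping for Aldebaran die[1]s (copied from the patch). */
static const uint8_t umc_to_cs_die1[] = { 19, 11, 15,  7,  3, 27, 31, 23,
					   9,  1,  5, 29, 25, 17, 21, 13,
					  28, 20, 24, 16, 12,  4,  8,  0,
					   6, 30,  2, 26, 22, 14, 18, 10 };

/*
 * Hypothetical helper: translate a UMC instance number into a CS instance
 * ID for the given node. Odd node IDs select the die1 table and even node
 * IDs select the die0 table, following the enumeration described in the
 * commit message. Returns 0 and stores the result in *cs_id, or -1 if
 * umc is outside the table.
 */
static int umc_to_cs(uint16_t nid, uint8_t umc, uint8_t *cs_id)
{
	if (umc >= ARRAY_SIZE(umc_to_cs_die0))
		return -1;

	*cs_id = (nid % 2) ? umc_to_cs_die1[umc] : umc_to_cs_die0[umc];

	return 0;
}
```

With these tables, umc_to_cs(2, 0, &cs) stores 28 and umc_to_cs(1, 0, &cs)
stores 19, while any umc value of 32 or more is rejected. ARRAY_SIZE is used
here instead of sizeof so that the bounds check stays correct if the element
type ever becomes wider than one byte; for u8 arrays the two are equivalent.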