From patchwork Fri Aug 6 07:43:48 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12422787
From: Naveen Krishna Chatradhi
To: , CC: , , Muralidhara M K , Naveen Krishna Chatradhi
Subject: [PATCH v2 1/3] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Fri, 6 Aug 2021 13:13:48 +0530
Message-ID: <20210806074350.114614-2-nchatrad@amd.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210806074350.114614-1-nchatrad@amd.com>
References: <20210630152828.162659-1-nchatrad@amd.com> <20210806074350.114614-1-nchatrad@amd.com>
Precedence: bulk
X-Mailing-List:
 linux-edac@vger.kernel.org

From: Muralidhara M K

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

This patch adds necessary code to manage the Aldebaran nodes along with
the CPU nodes.

The GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have an "AMD Node ID"
value of 8 (the second GPU node has 9, etc.). Each Aldebaran GPU package
has 2 Data Fabrics, which are enumerated as 2 nodes. With this
implementation detail, the Data Fabric on the GPU nodes can be accessed
the same way as the Data Fabric on CPU nodes.

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Reviewed-by: Yazen Ghannam
---
Changes since v1:
1. Modified the commit message and comments in the code
2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

 arch/x86/include/asm/amd_nb.h | 10 ++++++
 arch/x86/kernel/amd_nb.c      | 63 ++++++++++++++++++++++++++++++++---
 include/linux/pci_ids.h       |  1 +
 3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 00d1a400b7a1..f15247422992 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -79,6 +79,16 @@ struct amd_northbridge_info {
 
 #ifdef CONFIG_AMD_NB
 
+/*
+ * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
+ * are connected directly via custom links, like is done with
+ * 2 socket CPU systems and also within a socket for Multi-chip Module
+ * (MCM) CPUs like Naples.
+ * The first GPU node (non-CPU) is assumed to have an "AMD Node ID" value
+ * of 8 (the second GPU node has 9, etc.).
+ */
+#define NONCPU_NODE_INDEX 8
+
 u16 amd_nb_num(void);
 bool amd_nb_has_feature(unsigned int feature);
 struct amd_northbridge *node_to_amd_nb(int node);
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 5884dfa619ff..5597135a18b5 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,6 +26,8 @@
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
 #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
 
 /* Protect the PCI config register pairs used for SMN. */
 static DEFINE_MUTEX(smn_mutex);
@@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
 	{}
 };
 
+static const struct pci_device_id amd_noncpu_root_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+	{}
+};
+
+static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+	{}
+};
+
+static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+	{}
+};
+
 const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
 	{ 0x00, 0x18, 0x20 },
 	{ 0xff, 0x00, 0x20 },
@@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
 	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
 	const struct pci_device_id *link_ids = amd_nb_link_ids;
 	const struct pci_device_id *root_ids = amd_root_ids;
+
+	const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
+	const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
+	const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
+
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
 	u16 roots_per_misc = 0;
-	u16 misc_count = 0;
-	u16 root_count = 0;
+	u16 misc_count = 0, misc_count_noncpu = 0;
+	u16 root_count = 0, root_count_noncpu = 0;
 	u16 i, j;
 
 	if (amd_northbridges.num)
@@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
 	if (!misc_count)
 		return -ENODEV;
 
+	while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
+		misc_count_noncpu++;
+
 	root = NULL;
 	while ((root = next_northbridge(root, root_ids)) != NULL)
 		root_count++;
 
+	while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
+		root_count_noncpu++;
+
 	if (root_count) {
 		roots_per_misc = root_count / misc_count;
 
@@ -222,15 +250,28 @@ int amd_cache_northbridges(void)
 		}
 	}
 
-	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+	if (misc_count_noncpu) {
+		/*
+		 * The first non-CPU Node ID starts at 8 even if there are fewer
+		 * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
+		 * indexing scheme, allocate the number of GPU nodes plus 8.
+		 * Some allocated amd_northbridge structures will go unused when
+		 * the number of CPU nodes is less than 8, but this tradeoff is to
+		 * keep things relatively simple.
+		 */
+		amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
+	} else {
+		amd_northbridges.num = misc_count;
+	}
+
+	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
-	for (i = 0; i < amd_northbridges.num; i++) {
+	for (i = 0; i < misc_count; i++) {
 		node_to_amd_nb(i)->root = root =
 			next_northbridge(root, root_ids);
 		node_to_amd_nb(i)->misc = misc =
@@ -251,6 +292,18 @@ int amd_cache_northbridges(void)
 			root = next_northbridge(root, root_ids);
 	}
 
+	if (misc_count_noncpu) {
+		link = misc = root = NULL;
+		for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
+			node_to_amd_nb(i)->root = root =
+				next_northbridge(root, noncpu_root_ids);
+			node_to_amd_nb(i)->misc = misc =
+				next_northbridge(misc, noncpu_misc_ids);
+			node_to_amd_nb(i)->link = link =
+				next_northbridge(link, noncpu_link_ids);
+		}
+	}
+
 	if (amd_gart_present())
 		amd_northbridges.flags |= AMD_NB_GART;
 
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4bac1831de80..d9aae90dfce9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -554,6 +554,7 @@
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
 #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
 #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703

From patchwork Fri Aug 6 07:43:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12422789
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=FNXaDEJVe06teuCfrva1s54UKQtwRXub1pHF3ZsuJD/COtyR/xGtju8lpEpBgAd0zhEGKP/L89QXxGjOMQA8bnrdcsU8p+Pz+UfxHgJ4eIenhoJ06sKp/AlAz3bVbH9G55eExFWN9lAgpMJ44cN765HQb6tIPHnFz/bhku3iQjBMbY+XCV6bUlZtoHf6XsmNf/SZfeGZzbVyHLEIAgRFTrBjdRN6yO/oIN7NQ3/ByxKufua+PUy9R4UubSbomuwuvHM0p6mVd5KA5koNVEUUurxnmO0s7i6WEPBgyn8VcFteiAN+KSF88YXqWmgLSfNBDbeRLxqRBQnvFoCAEVyXxA==
From: Naveen Krishna Chatradhi
To: , CC: , , "Naveen Krishna Chatradhi" , Muralidhara M K
Subject: [PATCH v2 2/3] EDAC/mce_amd: Extract node id from InstanceHi in IPID
Date: Fri, 6 Aug 2021 13:13:49 +0530
Message-ID: <20210806074350.114614-3-nchatrad@amd.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210806074350.114614-1-nchatrad@amd.com>
References: <20210630152828.162659-1-nchatrad@amd.com> <20210806074350.114614-1-nchatrad@amd.com>
X-MS-Exchange-Transport-CrossTenantHeadersStamped:
 BYAPR12MB3525
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

On AMD systems with SMCA banks on NONCPU nodes, the node id information
is available in MCA_IPID[47:44](InstanceIdHi).

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Changes since v1:
1. Modified the commit message
2. rearranged the conditions before calling decode_dram_ecc()

 drivers/edac/mce_amd.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 27d56920b469..318b7fb715ff 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1072,8 +1072,23 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 is used on the noncpu nodes, extract
+			 * the node id from MCA_IPID[47:44](InstanceIdHi)
+			 */
+			node_id = ((m->ipid >> 44) & 0xF);
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }
 
 static inline void amd_decode_err_code(u16 ec)

From patchwork Fri Aug 6 07:43:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12422791
Received: from
 mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35AA7C4338F for ; Fri, 6 Aug 2021 07:04:25 +0000 (UTC)
From: Naveen Krishna Chatradhi
To: , CC: , , "Naveen Krishna Chatradhi" , Muralidhara M K
Subject: [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes
Date: Fri, 6 Aug 2021 13:13:50 +0530
Message-ID: <20210806074350.114614-4-nchatrad@amd.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210806074350.114614-1-nchatrad@amd.com>
References: <20210630152828.162659-1-nchatrad@amd.com> <20210806074350.114614-1-nchatrad@amd.com>
X-Forefront-Antispam-Report:
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(4636009)(376002)(346002)(396003)(39860400002)(136003)(46966006)(36840700001)(4326008)(5660300002)(30864003)(70206006)(336012)(6666004)(7696005)(70586007)(1076003)(478600001)(426003)(26005)(82740400003)(8936002)(36756003)(54906003)(2906002)(81166007)(2616005)(83380400001)(186003)(356005)(47076005)(36860700001)(8676002)(110136005)(16526019)(82310400003)(316002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Aug 2021 07:04:21.2488 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 640ff3c2-b860-4c78-257d-08d958a86a75 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT021.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR12MB2344 Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org On newer heterogeneous systems from AMD with GPU nodes interfaced with HBM2 memory are connected to the CPUs via custom links. This patch modifies the amd64_edac module to handle the HBM memory enumeration leveraging the existing edac and the amd64 specific data structures. This patch does the following for non-cpu nodes: 1. Define PCI IDs and ops for Aldeberarn GPUs in family_types array. 2. The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels connected to HBMs are enumerated as ranks. 3. Define a function to find the UMCv2 channel number 4. Define a function to calculate base address of the UMCv2 registers 5. Add debug information for UMCv2 channel registers. 
Signed-off-by: Muralidhara M K Signed-off-by: Naveen Krishna Chatradhi --- Changes since v1: 1. Modifed the commit message 2. Change the edac_cap 3. kept sizes of both cpu and noncpu together 4. return success if the !F3 condition true and remove unnecessary validation 5. declared is_noncpu as bool 6. modified the condition from channel/4 to channel>=4 7. Rearranged debug information for noncpu umcch registers drivers/edac/amd64_edac.c | 202 +++++++++++++++++++++++++++++++++----- drivers/edac/amd64_edac.h | 27 +++++ 2 files changed, 202 insertions(+), 27 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index b03c33240238..2dd77a828394 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1979,6 +1979,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt) if (umc_en_mask == dimm_ecc_en_mask) edac_cap = EDAC_FLAG_SECDED; + + if (pvt->is_noncpu) + edac_cap = EDAC_FLAG_SECDED; } else { bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F) ? 
19 @@ -2037,6 +2040,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) { int cs_mode = 0; + if (pvt->is_noncpu) + return CS_EVEN_PRIMARY | CS_ODD_PRIMARY; + if (csrow_enabled(2 * dimm, ctrl, pvt)) cs_mode |= CS_EVEN_PRIMARY; @@ -2056,6 +2062,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl) edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl); + if (pvt->is_noncpu) { + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt); + for_each_chip_select(cs0, ctrl, pvt) { + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0); + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0); + } + return; + } + for (dimm = 0; dimm < 2; dimm++) { cs0 = dimm * 2; cs1 = dimm * 2 + 1; @@ -2080,10 +2095,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt) umc_base = get_umc_base(i); umc = &pvt->umc[i]; - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg); + if (!pvt->is_noncpu) + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg); edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg); edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl); edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl); + if (pvt->is_noncpu) { + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i); + goto dimm_size; + } amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp); edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp); @@ -2108,6 +2128,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt) i, 1 << ((tmp >> 4) & 0x3)); } + dimm_size: debug_display_dimm_sizes_df(pvt, i); } @@ -2175,10 +2196,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt) pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; } else if (pvt->fam >= 0x17) { int umc; - for_each_umc(umc) { - pvt->csels[umc].b_cnt = 4; - pvt->csels[umc].m_cnt = 2; + if (pvt->is_noncpu) { + pvt->csels[umc].b_cnt = 8; + pvt->csels[umc].m_cnt = 8; + } else { + pvt->csels[umc].b_cnt = 4; + pvt->csels[umc].m_cnt = 2; + } } } else { @@ -2187,6 +2212,31 @@ static void 
prep_chip_selects(struct amd64_pvt *pvt)
 	}
 }

+static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
+{
+	u32 base_reg, mask_reg;
+	u32 *base, *mask;
+	int umc, cs;
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+			base = &pvt->csels[umc].csbases[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+				edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *base, base_reg);
+
+			mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+			mask = &pvt->csels[umc].csmasks[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+				edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *mask, mask_reg);
+		}
+	}
+}
+
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
 	u32 umc_base_reg, umc_base_reg_sec;
@@ -2247,8 +2297,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)

 	prep_chip_selects(pvt);

-	if (pvt->umc)
-		return read_umc_base_mask(pvt);
+	if (pvt->umc) {
+		if (pvt->is_noncpu)
+			return read_noncpu_umc_base_mask(pvt);
+		else
+			return read_umc_base_mask(pvt);
+	}

 	for_each_chip_select(cs, 0, pvt) {
 		int reg0 = DCSB0 + (cs * 4);
@@ -2294,6 +2348,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
 	u32 dram_ctrl, dcsm;

 	if (pvt->umc) {
+		if (pvt->is_noncpu) {
+			pvt->dram_type = MEM_HBM2;
+			return;
+		}
 		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
 			pvt->dram_type = MEM_LRDDR4;
 		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
@@ -2683,7 +2741,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)

 	/* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
 	for_each_umc(i)
-		channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
+		if (pvt->is_noncpu)
+			channels += pvt->csels[i].b_cnt;
+		else
+			channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);

 	amd64_info("MCT channel count: %d\n", channels);

@@ -2824,6 +2885,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	u32 msb, weight, num_zero_bits;
 	int 
dimm, size = 0;

+	if (pvt->is_noncpu) {
+		addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+		/* The memory channels in case of GPUs are fully populated */
+		goto skip_noncpu;
+	}
+
 	/* No Chip Selects are enabled. */
 	if (!cs_mode)
 		return size;
@@ -2849,6 +2916,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	else
 		addr_mask_orig = pvt->csels[umc].csmasks[dimm];

+ skip_noncpu:
 	/*
 	 * The number of zero bits in the mask is equal to the number of bits
 	 * in a full mask minus the number of bits in the current mask.
@@ -3594,6 +3662,16 @@ static struct amd64_family_type family_types[] = {
 			.dbam_to_cs = f17_addr_mask_to_cs_size,
 		}
 	},
+	[ALDEBARAN_GPUS] = {
+		.ctl_name = "ALDEBARAN",
+		.f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+		.f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+		.max_mcs = 4,
+		.ops = {
+			.early_channel_count = f17_early_channel_count,
+			.dbam_to_cs = f17_addr_mask_to_cs_size,
+		}
+	},
 };

 /*
@@ -3849,6 +3927,19 @@ static int find_umc_channel(struct mce *m)
 	return (m->ipid & GENMASK(31, 0)) >> 20;
 }

+/*
+ * The HBM memory managed by the UMCCH of the noncpu node
+ * can be calculated based on the [15:12]bits of IPID as follows
+ */
+static int find_umc_channel_noncpu(struct mce *m)
+{
+	u8 umc, ch;
+
+	umc = find_umc_channel(m);
+	ch = ((m->ipid >> 12) & 0xf);
+	return umc % 2 ? 
(ch + 4) : ch;
+}
+
 static void decode_umc_error(int node_id, struct mce *m)
 {
 	u8 ecc_type = (m->status >> 45) & 0x3;
@@ -3856,6 +3947,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 	struct amd64_pvt *pvt;
 	struct err_info err;
 	u64 sys_addr = m->addr;
+	u8 umc_num;

 	mci = edac_mc_find(node_id);
 	if (!mci)
@@ -3868,7 +3960,17 @@ static void decode_umc_error(int node_id, struct mce *m)
 	if (m->status & MCI_STATUS_DEFERRED)
 		ecc_type = 3;

-	err.channel = find_umc_channel(m);
+	if (pvt->is_noncpu) {
+		/* The UMCPHY is reported as csrow in case of noncpu nodes */
+		err.csrow = find_umc_channel(m) / 2;
+		/* UMCCH is managing the HBM memory */
+		err.channel = find_umc_channel_noncpu(m);
+		umc_num = err.csrow * 8 + err.channel;
+	} else {
+		err.channel = find_umc_channel(m);
+		err.csrow = m->synd & 0x7;
+		umc_num = err.channel;
+	}

 	if (!(m->status & MCI_STATUS_SYNDV)) {
 		err.err_code = ERR_SYND;
@@ -3884,9 +3986,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 		err.err_code = ERR_CHANNEL;
 	}

-	err.csrow = m->synd & 0x7;
-
-	if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
+	if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
 		err.err_code = ERR_NORM_ADDR;
 		goto log_error;
 	}
@@ -4013,15 +4113,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)

 	/* Read registers from each UMC */
 	for_each_umc(i) {
+		if (pvt->is_noncpu)
+			umc_base = get_noncpu_umc_base(i, 0);
+		else
+			umc_base = get_umc_base(i);

-		umc_base = get_umc_base(i);
 		umc = &pvt->umc[i];
-
-		amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
 		amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
 		amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
-		amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+
+		if (!pvt->is_noncpu) {
+			amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
+			amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+
		}
 	}
 }

@@ -4103,7 +4208,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
 	determine_memory_type(pvt);
 	edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

-	determine_ecc_sym_sz(pvt);
+	/* ECC symbol size is not available on NONCPU nodes */
+	if (!pvt->is_noncpu)
+		determine_ecc_sym_sz(pvt);
 }

 /*
@@ -4191,15 +4298,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
 				continue;

 			empty = 0;
-			dimm = mci->csrows[cs]->channels[umc]->dimm;
+			if (pvt->is_noncpu) {
+				dimm = mci->csrows[umc]->channels[cs]->dimm;
+				dimm->edac_mode = EDAC_SECDED;
+				dimm->dtype = DEV_X16;
+			} else {
+				dimm = mci->csrows[cs]->channels[umc]->dimm;
+				dimm->edac_mode = edac_mode;
+				dimm->dtype = dev_type;
+			}

 			edac_dbg(1, "MC node: %d, csrow: %d\n",
 					pvt->mc_node_id, cs);

 			dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
 			dimm->mtype = pvt->dram_type;
-			dimm->edac_mode = edac_mode;
-			dimm->dtype = dev_type;
 			dimm->grain = 64;
 		}
 	}
@@ -4464,7 +4577,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)

 			umc_en_mask |= BIT(i);

-			if (umc->umc_cap_hi & UMC_ECC_ENABLED)
+			/* ECC is enabled by default on NONCPU nodes */
+			if (pvt->is_noncpu ||
+			    (umc->umc_cap_hi & UMC_ECC_ENABLED))
 				ecc_en_mask |= BIT(i);
 		}

@@ -4500,6 +4615,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
 	u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;

+	if (pvt->is_noncpu) {
+		mci->edac_ctl_cap |= EDAC_SECDED;
+		return;
+	}
+
 	for_each_umc(i) {
 		if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
 			ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
@@ -4530,7 +4650,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
 	struct amd64_pvt *pvt = mci->pvt_info;

-	mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+	if (pvt->is_noncpu)
+		mci->mtype_cap = MEM_FLAG_HBM2;
+	else
+		mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;

 	if (pvt->umc) {
@@ -4635,11 +4759,24 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 			fam_type = 
&family_types[F17_M70H_CPUS];
 			pvt->ops = &family_types[F17_M70H_CPUS].ops;
 			fam_type->ctl_name = "F19h_M20h";
-			break;
+		} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+			if (pvt->is_noncpu) {
+				int tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
+
+				fam_type = &family_types[ALDEBARAN_GPUS];
+				pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+				sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
+				fam_type->ctl_name = pvt->buf;
+			} else {
+				fam_type = &family_types[F19_CPUS];
+				pvt->ops = &family_types[F19_CPUS].ops;
+				fam_type->ctl_name = "F19h_M30h";
+			}
+		} else {
+			fam_type = &family_types[F19_CPUS];
+			pvt->ops = &family_types[F19_CPUS].ops;
+			family_types[F19_CPUS].ctl_name = "F19h";
 		}
-		fam_type = &family_types[F19_CPUS];
-		pvt->ops = &family_types[F19_CPUS].ops;
-		family_types[F19_CPUS].ctl_name = "F19h";
 		break;

 	default:
@@ -4707,9 +4844,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	if (pvt->channel_count < 0)
 		return ret;

+	/* Define layers for CPU and NONCPU nodes */
 	ret = -ENOMEM;
 	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
-	layers[0].size = pvt->csels[0].b_cnt;
+	layers[0].size = pvt->is_noncpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
 	layers[0].is_virt_csrow = true;
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -4718,7 +4856,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = fam_type->max_mcs;
+	layers[1].size = pvt->is_noncpu ? 
pvt->csels[0].b_cnt : fam_type->max_mcs;
 	layers[1].is_virt_csrow = false;

 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4763,6 +4901,9 @@ static int probe_one_instance(unsigned int nid)
 	struct ecc_settings *s;
 	int ret;

+	if (!F3)
+		return 0;
+
 	ret = -ENOMEM;
 	s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
 	if (!s)
@@ -4774,6 +4915,9 @@ static int probe_one_instance(unsigned int nid)
 	if (!pvt)
 		goto err_settings;

+	if (nid >= NONCPU_NODE_INDEX)
+		pvt->is_noncpu = true;
+
 	pvt->mc_node_id = nid;
 	pvt->F3 = F3;

@@ -4847,6 +4991,10 @@ static void remove_one_instance(unsigned int nid)
 	struct mem_ctl_info *mci;
 	struct amd64_pvt *pvt;

+	/* Nothing to remove for the space holder entries */
+	if (!F3)
+		return;
+
 	/* Remove from EDAC CORE tracking list */
 	mci = edac_mc_del_mc(&F3->dev);
 	if (!mci)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..c5532a6f0c34 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
 #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
 #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6

 /*
  * Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
 	F17_M60H_CPUS,
 	F17_M70H_CPUS,
 	F19_CPUS,
+	ALDEBARAN_GPUS,
 	NUM_FAMILIES,
 };

@@ -389,6 +392,9 @@ struct amd64_pvt {
 	enum mem_type dram_type;

 	struct amd64_umc *umc; /* UMC registers */
+	char buf[20];
+
+	bool is_noncpu;
 };

 enum err_codes {
@@ -410,6 +416,27 @@ struct err_info {
 	u32 offset;
 };

+static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
+{
+	/*
+	 * On the NONCPU nodes, base address is calculated based on
+	 * UMC channel and the HBM channel.
+	 *
+	 * UMC channels are selected in 6th nibble
+	 * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+	 *
+	 * HBM channels are selected in 3rd nibble
+	 * HBM chX[3:0]= [Y ]5X[3:0]000;
+	 * HBM chX[7:4]= [Y+1]5X[3:0]000
+	 */
+	umc *= 2;
+
+	if (channel >= 4)
+		umc++;
+
+	return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
 static inline u32 get_umc_base(u8 channel)
 {
 	/* chY: 0xY50000 */