From patchwork Fri May 19 00:04:55 2023
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13247553
From: alison.schofield@intel.com
To: "Rafael J. Wysocki", Len Brown, Dan Williams, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Andy Lutomirski, Peter Zijlstra, Andrew Morton, Jonathan Cameron,
    Dave Jiang
Cc: Alison Schofield, x86@kernel.org, linux-cxl@vger.kernel.org,
    linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] x86/numa: Introduce numa_fill_memblks()
Date: Thu, 18 May 2023 17:04:55 -0700
X-Mailer: git-send-email 2.39.2
List-ID: 
X-Mailing-List: linux-cxl@vger.kernel.org

From: Alison Schofield

numa_fill_memblks() fills in the gaps in numa_meminfo memblks over an
HPA address range. The initial use case is the ACPI driver, which needs
to extend SRAT-defined proximity domains to an entire CXL CFMWS
Window[1]. The ACPI driver expects to use numa_fill_memblks() while
parsing the CFMWS.

Extending the memblks created during SRAT parsing to cover the entire
CFMWS Window is desirable because everything in a CFMWS Window is
expected to be of a similar performance class.

Requires CONFIG_NUMA_KEEP_MEMINFO.

[1] A CXL CFMWS Window represents a contiguous CXL memory resource,
aka an HPA range. The CFMWS (CXL Fixed Memory Window Structure) is
part of the ACPI CEDT (CXL Early Discovery Table).
Signed-off-by: Alison Schofield
---
 arch/x86/include/asm/sparsemem.h |  2 +
 arch/x86/mm/numa.c               | 82 ++++++++++++++++++++++++++++++++
 include/linux/numa.h             |  7 +++
 3 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 64df897c0ee3..1be13b2dfe8b 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -37,6 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
 #define phys_to_target_node phys_to_target_node
 extern int memory_add_physaddr_to_nid(u64 start);
 #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+extern int numa_fill_memblks(u64 start, u64 end);
+#define numa_fill_memblks numa_fill_memblks
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2aadb2019b4f..6c8f9cff71da 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -11,6 +11,7 @@
 #include <linux/nodemask.h>
 #include <linux/sched.h>
 #include <linux/topology.h>
+#include <linux/sort.h>
 
 #include <asm/e820/api.h>
 #include <asm/proc-fns.h>
@@ -961,4 +962,85 @@ int memory_add_physaddr_to_nid(u64 start)
 	return nid;
 }
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
+
+static int __init cmp_memblk(const void *a, const void *b)
+{
+	const struct numa_memblk *ma = *(const struct numa_memblk **)a;
+	const struct numa_memblk *mb = *(const struct numa_memblk **)b;
+
+	if (ma->start != mb->start)
+		return (ma->start < mb->start) ? -1 : 1;
+
+	if (ma->end != mb->end)
+		return (ma->end < mb->end) ? -1 : 1;
+
+	return 0;
+}
+
+static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
+
+/**
+ * numa_fill_memblks - Fill gaps in numa_meminfo memblks
+ * @start: address to begin fill
+ * @end: address to end fill
+ *
+ * Find and extend numa_meminfo memblks to cover the @start/@end
+ * HPA address range, following these rules:
+ * 1. The first memblk must start at @start
+ * 2. The last memblk must end at @end
+ * 3. Fill the gaps between memblks by extending numa_memblk.end
+ * Result: All addresses in start/end range are included in
+ * numa_meminfo.
+ *
+ * RETURNS:
+ * 0 : Success. numa_meminfo fully describes start/end
+ * NUMA_NO_MEMBLK : No memblk exists in start/end range
+ */
+
+int __init numa_fill_memblks(u64 start, u64 end)
+{
+	struct numa_meminfo *mi = &numa_meminfo;
+	struct numa_memblk **blk = &numa_memblk_list[0];
+	int count = 0;
+
+	for (int i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *bi = &mi->blk[i];
+
+		if (start <= bi->start && end >= bi->end) {
+			blk[count] = &mi->blk[i];
+			count++;
+		}
+	}
+	if (!count)
+		return NUMA_NO_MEMBLK;
+
+	if (count == 1) {
+		blk[0]->start = start;
+		blk[0]->end = end;
+		return 0;
+	}
+
+	sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
+	blk[0]->start = start;
+	blk[count - 1]->end = end;
+
+	for (int i = 0, j = 1; j < count; i++, j++) {
+		/* Overlaps OK. sort() put the lesser end first */
+		if (blk[i]->start == blk[j]->start)
+			continue;
+
+		/* No gap */
+		if (blk[i]->end == blk[j]->start)
+			continue;
+
+		/* Fill the gap */
+		if (blk[i]->end < blk[j]->start) {
+			blk[i]->end = blk[j]->start;
+			continue;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(numa_fill_memblks);
+
 #endif
diff --git a/include/linux/numa.h b/include/linux/numa.h
index 59df211d051f..0f512c0aba54 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -12,6 +12,7 @@
 
 #define MAX_NUMNODES	(1 << NODES_SHIFT)
 #define NUMA_NO_NODE	(-1)
+#define NUMA_NO_MEMBLK	(-1)
 
 /* optionally keep NUMA memory info available post init */
 #ifdef CONFIG_NUMA_KEEP_MEMINFO
@@ -43,6 +44,12 @@ static inline int phys_to_target_node(u64 start)
 	return 0;
 }
 #endif
+#ifndef numa_fill_memblks
+static inline int __init numa_fill_memblks(u64 start, u64 end)
+{
+	return NUMA_NO_MEMBLK;
+}
+#endif
 #else /* !CONFIG_NUMA */
 static inline int numa_map_to_online_node(int node)
 {

From patchwork Fri May 19 00:04:56 2023
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13247554
From: alison.schofield@intel.com
To: "Rafael J. Wysocki", Len Brown, Dan Williams, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Andy Lutomirski, Peter Zijlstra, Andrew Morton, Jonathan Cameron,
    Dave Jiang
Cc: Alison Schofield, x86@kernel.org, linux-cxl@vger.kernel.org,
    linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window
Date: Thu, 18 May 2023 17:04:56 -0700
X-Mailer: git-send-email 2.39.2
List-ID: 
X-Mailing-List: linux-cxl@vger.kernel.org

From: Alison Schofield

Commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS
not in SRAT") did not account for the case where the BIOS only
partially describes a CFMWS Window in the SRAT. That means the omitted
address ranges of a partially described CFMWS Window do not get
assigned to a NUMA node.

Replace the call to phys_to_target_node() with numa_fill_memblks().
numa_fill_memblks() searches an HPA range for existing memblk(s) and
extends those memblk(s) to fill the entire CFMWS Window.

Extending the existing memblks is a simple strategy that reuses
SRAT-defined proximity domains from part of a window to fill out the
entire window, based on the knowledge* that all of a CFMWS window is
of a similar performance class.

*Note that this heuristic will evolve when CFMWS Windows present a
wider range of characteristics. The extension of the proximity domain,
implemented here, is likely a step in developing a more sophisticated
performance profile in the future.

There is no change in behavior when the SRAT does not describe the
CFMWS Window at all. In that case, a new NUMA node with a single
memblk covering the entire CFMWS Window is created.
Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
Signed-off-by: Alison Schofield
---
 drivers/acpi/numa/srat.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 1f4fc5f8a819..12f330b0eac0 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -310,11 +310,16 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	start = cfmws->base_hpa;
 	end = cfmws->base_hpa + cfmws->window_size;
 
-	/* Skip if the SRAT already described the NUMA details for this HPA */
-	node = phys_to_target_node(start);
-	if (node != NUMA_NO_NODE)
+	/*
+	 * The SRAT may have already described NUMA details for all,
+	 * or a portion of, this CFMWS HPA range. Extend the memblks
+	 * found for any portion of the window to cover the entire
+	 * window.
+	 */
+	if (!numa_fill_memblks(start, end))
 		return 0;
 
+	/* No SRAT description. Create a new node. */
 	node = acpi_map_pxm_to_node(*fake_pxm);
 	if (node == NUMA_NO_NODE) {