From: Tvrtko Ursulin
To: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, kernel-dev@igalia.com, Tvrtko Ursulin, Catalin Marinas, Will Deacon, Greg Kroah-Hartman
Subject: [PATCH 0/2] NUMA emulation for arm64
Date: Tue, 25 Jun 2024 13:58:01 +0100
Message-ID: <20240625125803.38038-1-tursulin@igalia.com>
X-Patchwork-Id: 13711109

This series adds a very simple NUMA emulation implementation and enables
selecting it on arm64 platforms.

The obvious question is why?
Short answer - it can bring a significant performance uplift on Raspberry
Pi 5.

Longer answer is that splitting the physical RAM into chunks, and utilising
an allocation policy such as interleaving, can enable the BCM2712 memory
controller to better utilise parallelism in physical memory chip
organisation.

In more concrete numbers, testing with Geekbench 6 shows that splitting into
four emulated NUMA nodes can uplift the single core score of the benchmark
by around 6%, and the multi-core score by around 18%.

The code is quite simple. The new functionality can be enabled using the new
NUMA_EMULATION Kconfig option and then at runtime using the existing (shared
with other platforms) numa=fake= kernel boot argument.

Note, however, that the default allocation policy is not interleaving, and
further steps are required to "unlock" the performance uplift. The simplest
method is probably to launch test programs via the
"numactl --interleave=all COMMAND" wrapper, but it is also possible to
change the system-wide policy via systemd configuration.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"

Maíra Canal (2):
  numa: Add simple generic NUMA emulation
  arm64/numa: Add NUMA emulation for ARM64

 arch/arm64/Kconfig            | 10 ++++++
 drivers/base/Kconfig          |  7 ++++
 drivers/base/Makefile         |  1 +
 drivers/base/arch_numa.c      |  6 ++++
 drivers/base/numa_emulation.c | 67 +++++++++++++++++++++++++++++++++++
 drivers/base/numa_emulation.h | 21 +++++++++++
 6 files changed, 112 insertions(+)
 create mode 100644 drivers/base/numa_emulation.c
 create mode 100644 drivers/base/numa_emulation.h
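
For readers unfamiliar with the idea, the following is a minimal conceptual
sketch of what "splitting the physical RAM into chunks" plus an interleaving
policy amounts to. It is not the code from this series: the function names,
the 4 KiB page size, the 8 GiB range and the plain round-robin node choice
are all assumptions made purely for illustration.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_range(start: int, end: int, nr_nodes: int) -> List[Tuple[int, int]]:
    """Split the page-aligned range [start, end) into nr_nodes contiguous chunks.

    Hypothetical helper: models dividing one physical memory range into
    equally sized emulated nodes. The last node absorbs any remainder pages.
    """
    if nr_nodes < 1:
        raise ValueError("nr_nodes must be at least 1")
    total_pages = (end - start) // PAGE_SIZE
    if total_pages < nr_nodes:
        raise ValueError("range too small for the requested node count")
    per_node = total_pages // nr_nodes
    nodes = []
    for i in range(nr_nodes):
        lo = start + i * per_node * PAGE_SIZE
        hi = end if i == nr_nodes - 1 else lo + per_node * PAGE_SIZE
        nodes.append((lo, hi))
    return nodes


def interleave(nr_allocs: int, nr_nodes: int) -> List[int]:
    """Node index used for each successive page allocation, round-robin."""
    return [i % nr_nodes for i in range(nr_allocs)]


if __name__ == "__main__":
    gib = 1024 ** 3
    # Assumed example: 8 GiB of RAM split into four emulated nodes.
    for node, (lo, hi) in enumerate(split_range(0, 8 * gib, 4)):
        print(f"node {node}: {lo:#014x}-{hi:#014x}")
    print("allocation order:", interleave(8, 4))
```

With interleaving, consecutive page allocations land in different chunks of
physical memory rather than filling one chunk first, which is the access
pattern the cover letter credits for the uplift.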