From patchwork Mon Mar 10 16:23:31 2025
X-Patchwork-Submitter: Alireza Sanaee
X-Patchwork-Id: 14010399
Subject: [PATCH v8 0/6] Specifying cache topology on ARM
Date: Mon, 10 Mar 2025 16:23:31 +0000
Message-ID: <20250310162337.844-1-alireza.sanaee@huawei.com>
X-Mailer: git-send-email 2.34.1
From: Alireza Sanaee
X-BeenThere: qemu-devel@nongnu.org

Specifying the cache layout in virtual machines lets applications and
operating systems fetch accurate information about the cache structure
and make appropriate adjustments. Enforcing correct sharing information
can lead to better optimizations. This patch set enables the cache layout
to be specified through a command-line parameter, building on a patch set
by Intel [1,2,3]. The device tree and the ACPI PPTT table are populated
based on the user-provided information and the CPU topology.
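As a rough illustration of how the command-line parameter is put
together, the sketch below assembles a -machine value from a list of
(cache, topology) pairs. The smp-cache.N.cache and smp-cache.N.topology
property names are the ones used in this letter; the helper name
smp_cache_machine_option is hypothetical and is not part of QEMU.

```python
# Hypothetical helper, not part of QEMU: it only assembles the option string.
def smp_cache_machine_option(layout, machine="virt"):
    """Build a -machine value from an ordered list of (cache, topology) pairs."""
    parts = [machine]
    for index, (cache, topology) in enumerate(layout):
        # Each cache gets its own numbered smp-cache entry.
        parts.append(f"smp-cache.{index}.cache={cache}")
        parts.append(f"smp-cache.{index}.topology={topology}")
    return ",".join(parts)


layout = [("l1i", "core"), ("l1d", "core"), ("l2", "cluster"), ("l3", "socket")]
option = smp_cache_machine_option(layout)
print("-machine", option)
```

The resulting string can be passed to qemu-system-aarch64 after -machine.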
Example:

         +--------------+                    +--------------+
         |   Socket 0   |                    |   Socket 1   |
         |  (L3 Cache)  |                    |  (L3 Cache)  |
         +-------+------+                    +-------+------+
                 |                                   |
         +-------+------+                    +-------+------+
         |  Cluster 0   |                    |  Cluster 0   |
         |  (L2 Cache)  |                    |  (L2 Cache)  |
         +-------+------+                    +-------+------+
                 |                                   |
+-------+------+  +-------+------+  +-------+------+  +-------+------+
|    Core 0    |  |    Core 1    |  |    Core 0    |  |    Core 1    |
|  (L1i, L1d)  |  |  (L1i, L1d)  |  |  (L1i, L1d)  |  |  (L1i, L1d)  |
+-------+------+  +-------+------+  +-------+------+  +-------+------+
        |                 |                 |                 |
   +--------+        +--------+        +--------+        +--------+
   |Thread 0|        |Thread 1|        |Thread 1|        |Thread 0|
   +--------+        +--------+        +--------+        +--------+
   |Thread 1|        |Thread 0|        |Thread 0|        |Thread 1|
   +--------+        +--------+        +--------+        +--------+

The following command will represent the system relying on **ACPI PPTT
tables**.

    ./qemu-system-aarch64 \
      -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
      -cpu max \
      -m 2048 \
      -smp sockets=2,clusters=1,cores=2,threads=2 \
      -kernel ./Image.gz \
      -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
      -initrd rootfs.cpio.gz \
      -bios ./edk2-aarch64-code.fd \
      -nographic

The following command will represent the system relying on **the device
tree**.

    ./qemu-system-aarch64 \
      -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
      -cpu max \
      -m 2048 \
      -smp sockets=2,clusters=1,cores=2,threads=2 \
      -kernel ./Image.gz \
      -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
      -initrd rootfs.cpio.gz \
      -nographic

Failure cases:
    1) There are scenarios where caches exist in the system's registers
    but are left unspecified by the user. In this case QEMU returns
    failure.
    2) SMT threads cannot share caches, which is not a very common
    arrangement. More discussion here [4].

Currently only three levels of caches can be specified from the command
line. However, increasing that limit does not require significant
changes. Further, this patch assumes that L2 and L3 are unified caches
and does not allow l(2/3)(i/d). The level terminology is
thread/core/cluster/socket right now.

Hierarchy assumed in this patch:
Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;

TODO:
  1) Make the code work with arbitrary levels.
  2) Separate data and instruction caches at L2 and L3.
  3) Additional cache controls, e.g. the size of L3 may not want to just
  match the underlying system, because only some of the associated host
  CPUs may be bound to this VM.

[1] https://lore.kernel.org/kvm/20240908125920.1160236-1-zhao1.liu@intel.com/
[2] https://lore.kernel.org/qemu-devel/20241101083331.340178-1-zhao1.liu@intel.com/
[3] https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1.liu@intel.com/
[4] https://lore.kernel.org/devicetree-spec/20250203120527.3534-1-alireza.sanaee@huawei.com/

Change Log:
  v7->v8:
   * rebase: Merge tag 'pull-nbd-2024-08-26' of https://repo.or.cz/qemu/ericb into staging
   * I mis-included a file in patch #4 and I removed it in this one.

  v6->v7:
   * Intel stuff got pulled up, so rebase.
   * Added some discussions on device tree.

  v5->v6:
   * Minor bug fix.
   * Rebase based on new Intel patchset.
     - https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1.liu@intel.com/

  v4->v5:
   * Added Reviewed-by tags.
   * Applied some comments.

  v3->v4:
   * Device tree added.
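
The level relation and cache naming assumptions stated earlier in this
letter can be sketched as a small check. This is an illustration only,
not code from the series: the names TOPOLOGY_LEVEL, SUPPORTED_CACHES and
check_layout are hypothetical, and numbering the thread level as 0 is an
assumption (the letter only fixes the differences between levels).

```python
# Hypothetical checker, not QEMU code. It encodes only rules stated in this letter.
# Assumption: thread is numbered 0; the letter only gives relative offsets.
TOPOLOGY_LEVEL = {"thread": 0, "core": 1, "cluster": 2, "socket": 3}
# L2 and L3 are assumed unified, so names such as l2i or l3d are not accepted.
SUPPORTED_CACHES = ("l1i", "l1d", "l2", "l3")


def check_layout(layout):
    """Return a list of problems found in a {cache: topology} mapping."""
    problems = []
    for cache, topology in layout.items():
        if cache not in SUPPORTED_CACHES:
            problems.append(f"{cache}: unsupported cache name")
        if topology not in TOPOLOGY_LEVEL:
            problems.append(f"{cache}: unknown topology level '{topology}'")
    return problems


ok = check_layout({"l1i": "core", "l1d": "core", "l2": "cluster", "l3": "socket"})
bad = check_layout({"l2d": "cluster", "l3": "die"})
print(ok)
print(bad)
```

An empty list means the layout uses only the cache names and topology
levels described above; it says nothing about whether QEMU itself would
accept the configuration.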
Depends-on: Building PPTT with root node and identical implementation flag
Depends-on: Msg-id: 20250306023342.508-1-alireza.sanaee@huawei.com

Alireza Sanaee (6):
  target/arm/tcg: increase cache level for cpu=max
  arm/virt.c: add cache hierarchy to device tree
  bios-tables-test: prepare to change ARM ACPI virt PPTT
  hw/acpi/aml-build.c: add cache hierarchy to pptt table
  tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
  Update the ACPI tables according to the acpi aml_build change, also
    empty bios-tables-test-allowed-diff.h.

 hw/acpi/aml-build.c                        | 205 +++++++++++-
 hw/arm/virt-acpi-build.c                   |   8 +-
 hw/arm/virt.c                              | 350 +++++++++++++++++++++
 hw/cpu/core.c                              |  92 ++++++
 hw/loongarch/virt-acpi-build.c             |   2 +-
 include/hw/acpi/aml-build.h                |   4 +-
 include/hw/arm/virt.h                      |   4 +
 include/hw/cpu/core.h                      |  27 ++
 target/arm/tcg/cpu64.c                     |  13 +
 tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 540 bytes
 tests/qtest/bios-tables-test.c             |   4 +
 11 files changed, 701 insertions(+), 8 deletions(-)