From patchwork Thu Sep 12 13:38:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alireza Sanaee X-Patchwork-Id: 13802177 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2EB09EED608 for ; Thu, 12 Sep 2024 13:39:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sok2H-0001Vv-RY; Thu, 12 Sep 2024 09:39:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sok2G-0001UX-0i; Thu, 12 Sep 2024 09:39:00 -0400 Received: from frasgout.his.huawei.com ([185.176.79.56]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sok2D-0008Pk-Ik; Thu, 12 Sep 2024 09:38:59 -0400 Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4X4JLV1hn9z6K6lM; Thu, 12 Sep 2024 21:33:54 +0800 (CST) Received: from frapeml500003.china.huawei.com (unknown [7.182.85.28]) by mail.maildlp.com (Postfix) with ESMTPS id F149A1400C9; Thu, 12 Sep 2024 21:38:44 +0800 (CST) Received: from a2303103017.china.huawei.com (10.48.147.42) by frapeml500003.china.huawei.com (7.182.85.28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 12 Sep 2024 15:38:43 +0200 To: , CC: , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 0/5] Specifying cache topology on ARM Date: Thu, 12 Sep 2024 14:38:24 +0100 Message-ID: <20240912133829.400-1-alireza.sanaee@huawei.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Originating-IP: [10.48.147.42] X-ClientProxiedBy: lhrpeml100003.china.huawei.com (7.191.160.210) To frapeml500003.china.huawei.com (7.182.85.28) Received-SPF: pass client-ip=185.176.79.56; envelope-from=alireza.sanaee@huawei.com; helo=frasgout.his.huawei.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Alireza Sanaee X-Patchwork-Original-From: Alireza Sanaee via From: Alireza Sanaee Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Specifying the cache layout in virtual machines is useful for applications and operating systems to fetch accurate information about the cache structure and make appropriate adjustments. Enforcing correct sharing information can lead to better optimizations. This patch enables the specification of cache layout through a command line parameter, building on a patch set by Intel [1,2]. It uses this set as a foundation. The ACPI/PPTT table is populated based on user-provided information and CPU topology. Example: +----------------+ +----------------+ | Socket 0 | | Socket 1 | | (L3 Cache) | | (L3 Cache) | +--------+-------+ +--------+-------+ | | +--------+--------+ +--------+--------+ | Cluster 0 | | Cluster 0 | | (L2 Cache) | | (L2 Cache) | +--------+--------+ +--------+--------+ | | +--------+--------+ +--------+--------+ +--------+--------+ +--------+----+ | Core 0 | | Core 1 | | Core 0 | | Core 1 | | (L1i, L1d) | | (L1i, L1d) | | (L1i, L1d) | | (L1i, L1d)| +--------+--------+ +--------+--------+ +--------+--------+ +--------+----+ | | | | +--------+ +--------+ +--------+ +--------+ |Thread 0| |Thread 1| |Thread 1| |Thread 0| +--------+ +--------+ +--------+ +--------+ |Thread 1| |Thread 0| |Thread 0| |Thread 1| +--------+ +--------+ +--------+ +--------+ The following command will represent the system. ./qemu-system-aarch64 \ -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluseter,smp-cache.3.cache=l3,smp-cache.3.topology=socket \ -cpu max \ -m 2048 \ -smp sockets=2,clusters=1,cores=2,threads=2 \ -kernel ./Image.gz \ -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \ -initrd rootfs.cpio.gz \ -bios ./edk2-aarch64-code.fd \ -nographic Failure cases: 1) there are cases where QEMU might not have any clusters selected in the -smp option, while user specifies caches to be shared at cluster level. In this situations, qemu returns error. 2) There are other scenarios where caches exist in systems' registers but not left unspecified by users. In this case qemu returns failure. Currently only three levels of caches are supported to be specified from the command line. However, increasing the value does not require significant changes. Further, this patch assumes l2 and l3 unified caches and does not allow l(2/3)(i/d). The level terminology is thread/core/cluster/socket right now. Here is the hierarchy assumed in this patch: Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3; [1] https://lore.kernel.org/kvm/20240908125920.1160236-1-zhao1.liu@intel.com/ [2] https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1.liu@intel.com/ TODO: 1) Making the code to work with arbitrary levels 2) Separated data and instruction cache at L2 and L3. 3) Allow for different Data or Instruction only at a particular level. 4) Additional cache controls. e.g. size of L3 may not want to just match the underlying system, because only some of the associated host CPUs may be bound to this VM. 5) Add device tree related code to generate info related to caches. Depends-on: target/arm/tcg: refine cache descriptions with a wrapper Depends-on: Msg-id: 20240903144550.280-1-alireza.sanaee@huawei.com Alireza Sanaee (5): bios-tables-test: prepare to change ARM ACPI virt PPTT i386/cpu: add IsDefined flag to smp-cache property target/arm/tcg: increase cache level for cpu=max hw/acpi: add cache hierarchy node to pptt table tests/acpi/arm/virt/PPTT: update golden masters for PPTT update hw/acpi/aml-build.c | 312 +++++++++++++++++- hw/arm/virt-acpi-build.c | 137 +++++++- hw/arm/virt.c | 5 + hw/core/machine-smp.c | 2 + hw/loongarch/acpi-build.c | 3 +- include/hw/acpi/aml-build.h | 20 +- include/hw/boards.h | 1 + target/arm/tcg/cpu64.c | 13 + tests/data/acpi/aarch64/virt/PPTT | Bin 76 -> 76 bytes .../data/acpi/aarch64/virt/PPTT.acpihmatvirt | Bin 156 -> 156 bytes tests/data/acpi/aarch64/virt/PPTT.topology | Bin 336 -> 336 bytes 11 files changed, 484 insertions(+), 9 deletions(-)