From patchwork Fri Jan 24 17:29:03 2025
X-Patchwork-Submitter: Jonathan Cameron
X-Patchwork-Id: 13949791
From: Jonathan Cameron
Cc: Alex Bennée, Alexandre Iooss, Mahmoud Mandour, Pierrick Bouvier, Niyas Sait
Subject: [RFC PATCH QEMU 1/3] hw/cxl: Initial CXL Hotness Monitoring Unit Emulation
Date: Fri, 24 Jan 2025 17:29:03 +0000
Message-ID: <20250124172905.84099-2-Jonathan.Cameron@huawei.com>
In-Reply-To: <20250124172905.84099-1-Jonathan.Cameron@huawei.com>
References: <20250124172905.84099-1-Jonathan.Cameron@huawei.com>

Intended to support enabling in kernel. For now this is dumb and the data
made up. That will change in the near future. Instantiates 3 instances
within one CHMU with separate interrupts.

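For orientation, the register layout implied by the defines in cxl_chmu.h
can be worked out as in the sketch below. The constants are copied from
this patch; the shortened macro names and the helper chmu_instance_base()
are illustrative only and are not part of QEMU.

```c
#include <stdint.h>

/* Values copied from cxl_chmu.h in this patch (names shortened here). */
#define INSTANCES_PER_BLOCK 3
#define HOTLIST_ENTRIES     1024
#define MAX_DRAM_CAPACITY   0x10000000000ULL /* 1 TiB */

/*
 * Instance address space: the range config bitmap starts at 0x70 and the
 * patch sizes it as capacity / (256 MiB * 8) bytes; the hotlist follows.
 */
#define HL_START      (0x70 + MAX_DRAM_CAPACITY / (0x10000000ULL * 8))
#define INSTANCE_SIZE (HL_START + HOTLIST_ENTRIES * 8)
/* The common capability registers occupy the first 0x10 bytes. */
#define CHMU_SIZE     (0x10 + INSTANCE_SIZE * INSTANCES_PER_BLOCK)

/* The block has to fit in the 64 KiB region reserved for it. */
_Static_assert(CHMU_SIZE <= (1 << 16), "CHMU block exceeds 64 KiB");

/* Hypothetical helper: offset of an instance within the CHMU block. */
static uint64_t chmu_instance_base(unsigned int instance)
{
    return 0x10 + (uint64_t)instance * INSTANCE_SIZE;
}
```

With these values each instance occupies 0x2270 bytes, the instances
start at 0x10, 0x2280 and 0x44f0, and the whole block is 0x6760 bytes.
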
Signed-off-by: Jonathan Cameron --- include/hw/cxl/cxl.h | 1 + include/hw/cxl/cxl_chmu.h | 154 ++++++++++++ include/hw/cxl/cxl_device.h | 13 +- include/hw/cxl/cxl_pci.h | 7 +- hw/cxl/cxl-chmu.c | 459 ++++++++++++++++++++++++++++++++++++ hw/mem/cxl_type3.c | 25 +- hw/cxl/meson.build | 1 + 7 files changed, 655 insertions(+), 5 deletions(-) diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h index 857fa61898..bef856485f 100644 --- a/include/hw/cxl/cxl.h +++ b/include/hw/cxl/cxl.h @@ -16,6 +16,7 @@ #include "hw/pci/pci_host.h" #include "cxl_pci.h" #include "cxl_component.h" +#include "cxl_chmu.h" #include "cxl_cpmu.h" #include "cxl_device.h" diff --git a/include/hw/cxl/cxl_chmu.h b/include/hw/cxl/cxl_chmu.h new file mode 100644 index 0000000000..2de04ea605 --- /dev/null +++ b/include/hw/cxl/cxl_chmu.h @@ -0,0 +1,154 @@ +/* + * QEMU CXL Hotness Monitoring Unit + * + * Copyright (c) 2024 Huawei + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. + */ + +#include "hw/register.h" + +#ifndef _CXL_CHMU_H_ +#define _CXL_CHMU_H_ + +/* Emulated parameters - arbitrary choices */ +#define CXL_CHMU_INSTANCES_PER_BLOCK 3 +#define CXL_HOTLIST_ENTRIES 1024 + /* 1TB - should be enough for anyone, right? */ +#define CXL_MAX_DRAM_CAPACITY 0x10000000000ULL + +/* In instance address space */ +#define CXL_CHMU_HL_START (0x70 + (CXL_MAX_DRAM_CAPACITY / (0x10000000ULL * 8))) +#define CXL_CHMU_INSTANCE_SIZE (CXL_CHMU_HL_START + CXL_HOTLIST_ENTRIES * 8) +#define CXL_CHMU_SIZE \ + (0x10 + CXL_CHMU_INSTANCE_SIZE * CXL_CHMU_INSTANCES_PER_BLOCK) + +/* + * Many of these registers are documented as being a multiple of 64 bits long. + * Reading them can only be done in 64 bit chunks though so specify them here + * as multiple registers. 
+ */ +REG64(CXL_CHMU_COMMON_CAP0, 0x0) + FIELD(CXL_CHMU_COMMON_CAP0, VERSION, 0, 4) + FIELD(CXL_CHMU_COMMON_CAP0, NUM_INSTANCES, 8, 8) +REG64(CXL_CHMU_COMMON_CAP1, 0x8) + FIELD(CXL_CHMU_COMMON_CAP1, INSTANCE_LENGTH, 0, 16) + +/* Per instance registers for instance 0 in CHMU main address space */ +REG64(CXL_CHMU0_CAP0, 0x10) + FIELD(CXL_CHMU0_CAP0, MSI_N, 0, 4) + FIELD(CXL_CHMU0_CAP0, OVERFLOW_INT, 4, 1) + FIELD(CXL_CHMU0_CAP0, LEVEL_INT, 5, 1) + FIELD(CXL_CHMU0_CAP0, EPOCH_TYPE, 6, 2) +#define CXL_CHMU0_CAP0_EPOCH_TYPE_GLOBAL 0 +#define CXL_CHMU0_CAP0_EPOCH_TYPE_PERCNT 1 + /* Break up the Tracked M2S Request field into flags */ + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_R, 8, 1) + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_W, 9, 1) + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_RW, 10, 1) + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_R, 11, 1) + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_W, 12, 1) + FIELD(CXL_CHMU0_CAP0, TRACKED_M2S_REQ_ALL_RW, 13, 1) + + FIELD(CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_SCALE, 16, 4) +#define CXL_CHMU_EPOCH_LENGTH_SCALE_100USEC 1 +#define CXL_CHMU_EPOCH_LENGTH_SCALE_1MSEC 2 +#define CXL_CHMU_EPOCH_LENGTH_SCALE_10MSEC 3 +#define CXL_CHMU_EPOCH_LENGTH_SCALE_100MSEC 4 +#define CXL_CHMU_EPOCH_LENGTH_SCALE_1SEC 5 + FIELD(CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_VAL, 20, 12) + FIELD(CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_SCALE, 32, 4) + FIELD(CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_VAL, 36, 12) + FIELD(CXL_CHMU0_CAP0, HOTLIST_SIZE, 48, 16) +REG64(CXL_CHMU0_CAP1, 0x18) + FIELD(CXL_CHMU0_CAP1, UNIT_SIZES, 0, 32) + FIELD(CXL_CHMU0_CAP1, DOWN_SAMPLING_FACTORS, 32, 16) + /* Split up Flags */ + FIELD(CXL_CHMU0_CAP1, FLAGS_EPOCH_BASED, 48, 1) + FIELD(CXL_CHMU0_CAP1, FLAGS_ALWAYS_ON, 49, 1) + FIELD(CXL_CHMU0_CAP1, FLAGS_RANDOMIZED_DOWN_SAMPLING, 50, 1) + FIELD(CXL_CHMU0_CAP1, FLAGS_OVERLAPPING_ADDRESS_RANGES, 51, 1) + FIELD(CXL_CHMU0_CAP1, FLAGS_INSERT_AFTER_CLEAR, 52, 1) +REG64(CXL_CHMU0_CAP2, 0x20) + FIELD(CXL_CHMU0_CAP2, BITMAP_REG_OFFSET, 0, 64) +REG64(CXL_CHMU0_CAP3, 
0x28) + FIELD(CXL_CHMU0_CAP3, HOTLIST_REG_OFFSET, 0, 64) + +REG64(CXL_CHMU0_CONF0, 0x50) + FIELD(CXL_CHMU0_CONF0, M2S_REQ_TO_TRACK, 0, 8) + FIELD(CXL_CHMU0_CONF0, FLAGS_RANDOMIZE_DOWNSAMPLING, 8, 1) + FIELD(CXL_CHMU0_CONF0, FLAGS_INT_ON_OVERFLOW, 9, 1) + FIELD(CXL_CHMU0_CONF0, FLAGS_INT_ON_FILL_THRESH, 10, 1) + FIELD(CXL_CHMU0_CONF0, CONTROL_ENABLE, 16, 1) + FIELD(CXL_CHMU0_CONF0, CONTROL_RESET, 17, 1) + FIELD(CXL_CHMU0_CONF0, HOTNESS_THRESHOLD, 32, 32) +REG64(CXL_CHMU0_CONF1, 0x58) + FIELD(CXL_CHMU0_CONF1, UNIT_SIZE, 0, 32) + FIELD(CXL_CHMU0_CONF1, DOWN_SAMPLING_FACTOR, 32, 8) + FIELD(CXL_CHMU0_CONF1, REPORTING_MODE, 40, 8) + FIELD(CXL_CHMU0_CONF1, EPOCH_LENGTH_SCALE, 48, 4) + FIELD(CXL_CHMU0_CONF1, EPOCH_LENGTH_VAL, 52, 12) +REG64(CXL_CHMU0_CONF2, 0x60) + FIELD(CXL_CHMU0_CONF2, NOTIFICATION_THRESHOLD, 0, 16) + +REG64(CXL_CHMU0_STATUS, 0x70) + /* Break up status field into separate flags */ + FIELD(CXL_CHMU0_STATUS, STATUS_ENABLED, 0, 1) + FIELD(CXL_CHMU0_STATUS, OPERATION_IN_PROG, 16, 16) + FIELD(CXL_CHMU0_STATUS, COUNTER_WIDTH, 32, 8) + /* Break up the oddly named overflow interrupt status */ + FIELD(CXL_CHMU0_STATUS, OVERFLOW_INT, 40, 1) + FIELD(CXL_CHMU0_STATUS, LEVEL_INT, 41, 1) + +REG16(CXL_CHMU0_HEAD, 0x78) +REG16(CXL_CHMU0_TAIL, 0x7A) + +/* Provide first few of these so we can calculate the size */ +REG64(CXL_CHMU0_RANGE_CONFIG_BITMAP0, 0x80) +REG64(CXL_CHMU0_RANGE_CONFIG_BITMAP1, 0x88) + +REG64(CXL_CHMU0_HOTLIST0, CXL_CHMU_HL_START + 0x10) +REG64(CXL_CHMU0_HOTLIST1, CXL_CHMU_HL_START + 0x18) + +REG64(CXL_CHMU1_CAP0, 0x10 + CXL_CHMU_INSTANCE_SIZE) + +typedef struct CHMUState CHMUState; + +typedef struct CHMUInstance { + Object *private; + uint32_t hotness_thresh; + uint32_t unit_size; + uint8_t ds_factor; + uint16_t head, tail, fillthresh, op_in_prog; + uint8_t what; + + bool int_on_overflow; + bool int_on_fill_thresh; + bool overflow_set; + bool fill_thresh_set; + uint8_t msi_n; + + bool enabled; + uint64_t hotlist[CXL_HOTLIST_ENTRIES]; + QEMUTimer *timer; + 
uint32_t epoch_ms; + /* Hack for now */ + CHMUState *parent; +} CHMUInstance; + +typedef struct CHMUState { + CHMUInstance inst[CXL_CHMU_INSTANCES_PER_BLOCK]; + int socket; + /* Hack updated on first HDM decoder only */ + uint64_t base; + uint64_t size; + uint16_t port; +} CHMUState; +typedef struct cxl_device_state CXLDeviceState; +int cxl_chmu_register_block_init(Object *obj, + CXLDeviceState *cxl_dstate, + int id, uint8_t msi_n, + Error **errp); + +#endif /* _CXL_CHMU_H_ */ diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h index 04c93cd753..f855cd69d9 100644 --- a/include/hw/cxl/cxl_device.h +++ b/include/hw/cxl/cxl_device.h @@ -15,6 +15,7 @@ #include "hw/register.h" #include "hw/cxl/cxl_events.h" +#include "hw/cxl/cxl_chmu.h" #include "hw/cxl/cxl_cpmu.h" /* * The following is how a CXL device's Memory Device registers are laid out. @@ -109,12 +110,20 @@ (x) * (1 << 16), \ 1 << 16) +#define CXL_NUM_CHMU_INSTANCES 1 +#define CXL_CHMU_OFFSET(x) \ + QEMU_ALIGN_UP(CXL_MEMORY_DEVICE_REGISTERS_OFFSET + \ + CXL_MEMORY_DEVICE_REGISTERS_LENGTH + \ + (1 << 16) * CXL_NUM_CPMU_INSTANCES, \ + 1 << 16) + #define CXL_MMIO_SIZE \ QEMU_ALIGN_UP(CXL_DEVICE_CAP_REG_SIZE + \ CXL_DEVICE_STATUS_REGISTERS_LENGTH + \ CXL_MAILBOX_REGISTERS_LENGTH + \ CXL_MEMORY_DEVICE_REGISTERS_LENGTH + \ - CXL_NUM_CPMU_INSTANCES * (1 << 16), \ + CXL_NUM_CPMU_INSTANCES * (1 << 16) + \ + CXL_NUM_CHMU_INSTANCES * (1 << 16), \ (1 << 16)) /* CXL r3.1 Table 8-34: Command Return Codes */ @@ -231,6 +240,7 @@ typedef struct CXLCCI { typedef struct cxl_device_state { MemoryRegion device_registers; MemoryRegion cpmu_registers[CXL_NUM_CPMU_INSTANCES]; + MemoryRegion chmu_registers[1]; /* CXL r3.1 Section 8.2.8.3: Device Status Registers */ struct { MemoryRegion device; @@ -280,6 +290,7 @@ typedef struct cxl_device_state { const struct cxl_cmd (*cxl_cmd_set)[256]; CPMUState cpmu[CXL_NUM_CPMU_INSTANCES]; + CHMUState chmu[1]; CXLEventLog event_logs[CXL_EVENT_TYPE_MAX]; } CXLDeviceState; diff --git 
a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h index c54ed54a25..88a5e3958e 100644 --- a/include/hw/cxl/cxl_pci.h +++ b/include/hw/cxl/cxl_pci.h @@ -32,7 +32,7 @@ #define PCIE_CXL3_FLEXBUS_PORT_DVSEC_LENGTH 0x20 #define PCIE_CXL3_FLEXBUS_PORT_DVSEC_REVID 2 -#define REG_LOC_DVSEC_LENGTH 0x2c +#define REG_LOC_DVSEC_LENGTH 0x34 #define REG_LOC_DVSEC_REVID 0 enum { @@ -172,9 +172,9 @@ typedef struct CXLDVSECRegisterLocator { struct { uint32_t lo; uint32_t hi; - } reg_base[4]; + } reg_base[5]; } QEMU_PACKED CXLDVSECRegisterLocator; -QEMU_BUILD_BUG_ON(sizeof(CXLDVSECRegisterLocator) != 0x2C); +QEMU_BUILD_BUG_ON(sizeof(CXLDVSECRegisterLocator) != 0x34); /* BAR Equivalence Indicator */ #define BEI_BAR_10H 0 @@ -190,5 +190,6 @@ QEMU_BUILD_BUG_ON(sizeof(CXLDVSECRegisterLocator) != 0x2C); #define RBI_BAR_VIRT_ACL (2 << 8) #define RBI_CXL_DEVICE_REG (3 << 8) #define RBI_CXL_CPMU_REG (4 << 8) +#define RBI_CXL_CHMU_REG (5 << 8) #endif diff --git a/hw/cxl/cxl-chmu.c b/hw/cxl/cxl-chmu.c new file mode 100644 index 0000000000..5922d78ffc --- /dev/null +++ b/hw/cxl/cxl-chmu.c @@ -0,0 +1,459 @@ +/* + * CXL Hotness Monitoring Unit + * + * Copyright(C) 2024 Huawei + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qemu/guest-random.h" +#include "hw/cxl/cxl.h" +#include "hw/cxl/cxl_chmu.h" + +#include "hw/pci/msi.h" +#include "hw/pci/msix.h" + +#define CHMU_HOTLIST_LENGTH 1024 + +enum chmu_consumer_request { + QUERY_TAIL, + QUERY_HEAD, + SET_HEAD, + SET_HOTLIST_SIZE, + QUERY_HOTLIST_ENTRY, + SIGNAL_EPOCH_END, + SET_ENABLED, + SET_NUMBER_GRANUALS, + SET_HPA_BASE, + SET_HPA_SIZE, +}; + +static int chmu_send(CHMUState *chmu, uint64_t instance, + enum chmu_consumer_request command, + uint64_t param, uint64_t *response) +{ + uint64_t request[3] = { instance, command, param }; + uint64_t temp; + uint64_t *reply = response ?: &temp; + int rc; + + send(chmu->socket, request, sizeof(request), 0); + rc = recv(chmu->socket, reply, sizeof(*reply), 0); + if (rc < 0 || (size_t)rc < sizeof(*reply)) { + return -1; + } + return 0; +} + +static uint64_t chmu_read(void *opaque, hwaddr offset, unsigned size) +{ + CHMUState *chmu = opaque; + CHMUInstance *chmui; + uint64_t val = 0; + hwaddr chmu_stride = A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0; + int instance = 0; + int rc; + + if (offset >= A_CXL_CHMU0_CAP0) { + instance = (offset - A_CXL_CHMU0_CAP0) / chmu_stride; + /* + * Offset allows register defs for CHMU instance 0 to be used + * for all instances. Includes common cap. 
+ */ + offset -= chmu_stride * instance; + } + + if (instance >= CXL_CHMU_INSTANCES_PER_BLOCK) { + return 0; + } + + chmui = &chmu->inst[instance]; + switch (offset) { + case A_CXL_CHMU_COMMON_CAP0: + val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP0, VERSION, 1); + val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP0, NUM_INSTANCES, + CXL_CHMU_INSTANCES_PER_BLOCK); + break; + case A_CXL_CHMU_COMMON_CAP1: + val = FIELD_DP64(val, CXL_CHMU_COMMON_CAP1, INSTANCE_LENGTH, + A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0); + break; + case A_CXL_CHMU0_CAP0: + val = FIELD_DP64(val, CXL_CHMU0_CAP0, MSI_N, chmui->msi_n); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, OVERFLOW_INT, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, LEVEL_INT, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, EPOCH_TYPE, + CXL_CHMU0_CAP0_EPOCH_TYPE_GLOBAL); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_R, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_W, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, TRACKED_M2S_REQ_NONTEE_RW, 1); + /* No emulation of TEE modes yet so don't pretend to support them */ + val = FIELD_DP64(val, CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_SCALE, + CXL_CHMU_EPOCH_LENGTH_SCALE_1SEC); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, MAX_EPOCH_LENGTH_VAL, 100); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_SCALE, + CXL_CHMU_EPOCH_LENGTH_SCALE_100MSEC); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, MIN_EPOCH_LENGTH_VAL, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP0, HOTLIST_SIZE, + CXL_HOTLIST_ENTRIES); + break; + case A_CXL_CHMU0_CAP1: + /* 4KiB and 8KiB only */ + val = FIELD_DP64(val, CXL_CHMU0_CAP1, UNIT_SIZES, BIT(4) | BIT(5)); + /* Only support downsamp by 32 */ + val = FIELD_DP64(val, CXL_CHMU0_CAP1, DOWN_SAMPLING_FACTORS, BIT(5)); + val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_EPOCH_BASED, 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_ALWAYS_ON, 0); + val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_RANDOMIZED_DOWN_SAMPLING, + 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP1, 
FLAGS_OVERLAPPING_ADDRESS_RANGES, + 1); + val = FIELD_DP64(val, CXL_CHMU0_CAP1, FLAGS_INSERT_AFTER_CLEAR, 0); + break; + case A_CXL_CHMU0_CAP2: + val = FIELD_DP64(val, CXL_CHMU0_CAP2, BITMAP_REG_OFFSET, + A_CXL_CHMU0_RANGE_CONFIG_BITMAP0 - A_CXL_CHMU0_CAP0); + break; + case A_CXL_CHMU0_CAP3: + val = FIELD_DP64(val, CXL_CHMU0_CAP3, HOTLIST_REG_OFFSET, + A_CXL_CHMU0_HOTLIST0 - A_CXL_CHMU0_CAP0); + break; + case A_CXL_CHMU0_STATUS: + val = FIELD_DP64(val, CXL_CHMU0_STATUS, STATUS_ENABLED, + chmui->enabled ? 1 : 0); + val = FIELD_DP64(val, CXL_CHMU0_STATUS, OPERATION_IN_PROG, + chmui->op_in_prog); + val = FIELD_DP64(val, CXL_CHMU0_STATUS, COUNTER_WIDTH, 16); + val = FIELD_DP64(val, CXL_CHMU0_STATUS, OVERFLOW_INT, + chmui->overflow_set ? 1 : 0); + val = FIELD_DP64(val, CXL_CHMU0_STATUS, LEVEL_INT, + chmui->fill_thresh_set ? 1 : 0); + break; + case A_CXL_CHMU0_TAIL: + if (chmu->socket) { + rc = chmu_send(chmu, instance, QUERY_TAIL, 0, &val); + if (rc < 0) { + printf("Failed to read tail\n"); + return 0; + } + } else { + val = chmui->tail; + } + break; + case A_CXL_CHMU0_HEAD: + if (chmu->socket) { + rc = chmu_send(chmu, instance, QUERY_HEAD, 0, &val); + if (rc < 0) { + printf("Failed to read head\n"); + return 0; + } + } else { + val = chmui->head; + } + break; + case A_CXL_CHMU0_HOTLIST0 ... (A_CXL_CHMU0_HOTLIST0 + + 8 * (CHMU_HOTLIST_LENGTH - 1)): + if (chmu->socket) { + rc = chmu_send(chmu, instance, QUERY_HOTLIST_ENTRY, + (offset - A_CXL_CHMU0_HOTLIST0) / 8, &val); + if (rc < 0) { + printf("Failed to read a hotlist entry\n"); + return 0; + } + } else { + val = chmui->hotlist[(offset - A_CXL_CHMU0_HOTLIST0) / 8]; + } + break; + } + return val; +} + +static void chmu_write(void *opaque, hwaddr offset, uint64_t value, + unsigned size) +{ + CHMUState *chmu = opaque; + CHMUInstance *chmui; + hwaddr chmu_stride = A_CXL_CHMU1_CAP0 - A_CXL_CHMU0_CAP0; + int instance = 0; + int i, rc; + + if (offset >= A_CXL_CHMU0_CAP0) { + instance = (offset - A_CXL_CHMU0_CAP0) / chmu_stride; + /* 
offset as if in chmu0 so includes the common caps */ + offset -= chmu_stride * instance; + } + if (instance >= CXL_CHMU_INSTANCES_PER_BLOCK) { + return; + } + + chmui = &chmu->inst[instance]; + + switch (offset) { + case A_CXL_CHMU0_STATUS: + /* The interrupt fields are RW1C */ + if (FIELD_EX64(value, CXL_CHMU0_STATUS, OVERFLOW_INT)) { + chmui->overflow_set = false; + } + if (FIELD_EX64(value, CXL_CHMU0_STATUS, LEVEL_INT)) { + chmui->fill_thresh_set = false; + } + break; + case A_CXL_CHMU0_RANGE_CONFIG_BITMAP0...(A_CXL_CHMU0_HOTLIST0 - 8): + /* TODO - wire this up */ + printf("Bitmap write %" HWADDR_PRIx " %" PRIx64 "\n", + offset - A_CXL_CHMU0_RANGE_CONFIG_BITMAP0, value); + break; + case A_CXL_CHMU0_CONF0: + if (FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_ENABLE)) { + chmui->enabled = true; + timer_mod(chmui->timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + chmui->epoch_ms); + } else { + timer_del(chmui->timer); + chmui->enabled = false; + } + if (chmu->socket) { + bool enabled = FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_ENABLE); + + if (enabled) { + rc = chmu_send(chmu, instance, SET_HPA_BASE, chmu->base, NULL); + if (rc < 0) { + printf("Failed to set base\n"); + } + rc = chmu_send(chmu, instance, SET_HPA_SIZE, chmu->size, NULL); + if (rc < 0) { + printf("Failed to set size\n"); + } + } + rc = chmu_send(chmu, instance, SET_ENABLED, enabled ? 
1 : 0, NULL); + if (rc < 0) { + printf("Failed to set enabled\n"); + } + } + + if (FIELD_EX64(value, CXL_CHMU0_CONF0, CONTROL_RESET)) { + /* TODO reset counters once implemented */ + chmui->head = 0; + chmui->tail = 0; + for (i = 0; i < CXL_HOTLIST_ENTRIES; i++) { + chmui->hotlist[i] = 0; + } + } + chmui->what = + FIELD_EX64(value, CXL_CHMU0_CONF0, M2S_REQ_TO_TRACK); + chmui->int_on_overflow = + FIELD_EX64(value, CXL_CHMU0_CONF0, FLAGS_INT_ON_OVERFLOW); + chmui->int_on_fill_thresh = + FIELD_EX64(value, CXL_CHMU0_CONF0, FLAGS_INT_ON_FILL_THRESH); + chmui->hotness_thresh = + FIELD_EX64(value, CXL_CHMU0_CONF0, HOTNESS_THRESHOLD); + break; + case A_CXL_CHMU0_CONF1: { + uint8_t scale; + uint32_t mult; + + chmui->unit_size = FIELD_EX64(value, CXL_CHMU0_CONF1, UNIT_SIZE); + chmui->ds_factor = + FIELD_EX64(value, CXL_CHMU0_CONF1, DOWN_SAMPLING_FACTOR); + + /* TODO: Sanity check value in supported range */ + scale = FIELD_EX64(value, CXL_CHMU0_CONF1, EPOCH_LENGTH_SCALE); + mult = FIELD_EX64(value, CXL_CHMU0_CONF1, EPOCH_LENGTH_VAL); + switch (scale) { + /* TODO: Implement maths, not lookup */ + case 1: /* 100usec */ + chmui->epoch_ms = mult / 10; + break; + case 2: + chmui->epoch_ms = mult; + break; + case 3: + chmui->epoch_ms = mult * 10; + break; + case 4: + chmui->epoch_ms = mult * 100; + break; + case 5: + chmui->epoch_ms = mult * 1000; + break; + default: + /* Unknown value so ignore */ + break; + } + break; + } + case A_CXL_CHMU0_CONF2: + chmui->fillthresh = FIELD_EX64(value, CXL_CHMU0_CONF2, + NOTIFICATION_THRESHOLD); + break; + case A_CXL_CHMU0_HEAD: + chmui->head = value; + if (chmu->socket) { + rc = chmu_send(chmu, instance, SET_HEAD, value, NULL); + if (rc < 0) { + printf("Failed to set head\n"); + } + } + break; + case A_CXL_CHMU0_TAIL: /* Not sure why this is writeable! 
*/ + chmui->tail = value; + break; + } +} + +static const MemoryRegionOps chmu_ops = { + .read = chmu_read, + .write = chmu_write, + .endianness = DEVICE_LITTLE_ENDIAN, + .valid = { + .min_access_size = 1, + .max_access_size = 8, + .unaligned = false, + }, + .impl = { + .min_access_size = 4, + .max_access_size = 8, + }, +}; + +static void chmu_timer_update(void *opaque) +{ + CHMUInstance *chmui = opaque; + PCIDevice *pdev = PCI_DEVICE(chmui->private); + int i; +#define entries_to_add 167 + bool interrupt_needed = false; + bool remote = chmui->parent->socket; + + timer_del(chmui->timer); + + /* This tick is the epoch. How to handle? */ + if (remote) { + int rc; + uint64_t reply; + /* hack instance always 0! */ + rc = chmu_send(chmui->parent, 0, SIGNAL_EPOCH_END, 0, &reply); + if (rc < 0) { + printf("Epoch signalling failed\n"); + } + + rc = chmu_send(chmui->parent, 0, QUERY_TAIL, 0, &reply); + if (rc < 0) { + printf("failed to read the tail\n"); + } + chmui->tail = reply; + printf("after epoch tail is %x\n", chmui->tail); + } else { /* Fake some data if we don't have a real source */ + uint8_t rand[entries_to_add]; + + qemu_guest_getrandom_nofail(rand, sizeof(rand)); + for (i = 0; i < entries_to_add; i++) { + if ((chmui->tail + 1) % CXL_HOTLIST_ENTRIES == chmui->head) { + /* Overflow occurred, drop out */ + break; + } + chmui->hotlist[chmui->tail % CXL_HOTLIST_ENTRIES] = + (chmui->tail << 16) | (chmui->hotness_thresh + rand[i]); + chmui->tail++; + chmui->tail %= CXL_HOTLIST_ENTRIES; + } + } + + /* All interrupt code is kept in here whatever the data source */ + if (chmui->int_on_fill_thresh && !chmui->fill_thresh_set) { + if (((chmui->tail > chmui->head) && + (chmui->tail - chmui->head > chmui->fillthresh)) || + ((chmui->tail < chmui->head) && + (CXL_HOTLIST_ENTRIES - chmui->head + chmui->tail > + chmui->fillthresh))) { + chmui->fill_thresh_set = true; + interrupt_needed = true; + } + } + if (chmui->int_on_overflow && !chmui->overflow_set) { + if ((chmui->tail + 1) % 
CXL_HOTLIST_ENTRIES == chmui->head) { + chmui->overflow_set = true; + interrupt_needed = true; + } + } + + if (interrupt_needed) { + if (msix_enabled(pdev)) { + msix_notify(pdev, chmui->msi_n); + } else if (msi_enabled(pdev)) { + msi_notify(pdev, chmui->msi_n); + } + } + + timer_mod(chmui->timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + chmui->epoch_ms); +} + +int cxl_chmu_register_block_init(Object *obj, + CXLDeviceState *cxl_dstate, + int id, uint8_t msi_n, + Error **errp) +{ + CHMUState *chmu = &cxl_dstate->chmu[id]; + MemoryRegion *registers = &cxl_dstate->chmu_registers[id]; + g_autofree gchar *name = g_strdup_printf("chmu%d-registers", id); + struct sockaddr_in server_addr; + int i; + + memory_region_init_io(registers, obj, &chmu_ops, chmu, name, + pow2ceil(CXL_CHMU_SIZE)); + memory_region_add_subregion(&cxl_dstate->device_registers, + CXL_CHMU_OFFSET(id), registers); + + for (i = 0; i < CXL_CHMU_INSTANCES_PER_BLOCK; i++) { + CHMUInstance *chmui = &chmu->inst[i]; + + chmui->parent = chmu;/* hack */ + chmui->private = obj; + chmui->msi_n = msi_n + i; + chmui->timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, chmu_timer_update, + chmui); + } + + if (chmu->port) { + uint64_t helloval = 41; + chmu->socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (chmu->socket < 0) { + error_setg(errp, "Failed to create a socket"); + return -1; + } + + memset((char *)&server_addr, 0, sizeof(server_addr)); + server_addr.sin_family = AF_INET; + server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + server_addr.sin_port = htons(chmu->port); + if (connect(chmu->socket, (struct sockaddr *)&server_addr, + sizeof(server_addr)) < 0) { + close(chmu->socket); + error_setg(errp, "Socket connect failed"); + return -1; + } + + send(chmu->socket, &helloval, sizeof(helloval), 0); + for (i = 0; i < CXL_CHMU_INSTANCES_PER_BLOCK; i++) { + int rc; + rc = chmu_send(chmu, i, SET_HOTLIST_SIZE, + CHMU_HOTLIST_LENGTH, NULL); + if (rc) { + error_setg(errp, "Failed to set hotlist size"); + return rc; + } 
+ + rc = chmu_send(chmu, i, SET_NUMBER_GRANUALS, + cxl_dstate->static_mem_size / 4096, NULL); + if (rc) { + error_setg(errp, "Failed to set number of granules"); + return rc; + } + } + } + return 0; +} diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c index c1004ddae8..78426758af 100644 --- a/hw/mem/cxl_type3.c +++ b/hw/mem/cxl_type3.c @@ -38,7 +38,10 @@ enum CXL_T3_MSIX_VECTOR { CXL_T3_MSIX_CPMU0, CXL_T3_MSIX_CPMU1, CXL_T3_MSIX_PCIE_DOE_COMPLIANCE, - CXL_T3_MSIX_VECTOR_NR + CXL_T3_MSIX_CHMU0_BASE, + /* One interrupt per CHMU instance in the block */ + CXL_T3_MSIX_VECTOR_NR = + CXL_T3_MSIX_CHMU0_BASE + CXL_CHMU_INSTANCES_PER_BLOCK, }; #define DWORD_BYTE 4 @@ -499,6 +502,8 @@ static void build_dvsecs(CXLType3Dev *ct3d) RBI_CXL_CPMU_REG | CXL_DEVICE_REG_BAR_IDX; regloc_dvsec->reg_base[2 + i].hi = 0; } + regloc_dvsec->reg_base[4].lo = CXL_CHMU_OFFSET(0) | RBI_CXL_CHMU_REG | + CXL_DEVICE_REG_BAR_IDX; cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC, REG_LOC_DVSEC_REVID, (uint8_t *)regloc_dvsec); @@ -535,6 +540,17 @@ static void hdm_decoder_commit(CXLType3Dev *ct3d, int which) ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 1); stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl); + + if (which == 0) { + uint32_t low, high; + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO); + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI); + ct3d->cxl_dstate.chmu[0].base = ((uint64_t)high << 32) | (low & 0xf0000000); + + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO); + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI); + ct3d->cxl_dstate.chmu[0].size = ((uint64_t)high << 32) | (low & 0xf0000000); + } } static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which) @@ -1008,6 +1024,12 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp) CXL_T3_MSIX_CPMU0); cxl_cpmu_register_block_init(OBJECT(pci_dev), &ct3d->cxl_dstate, 1, CXL_T3_MSIX_CPMU1); + rc = 
cxl_chmu_register_block_init(OBJECT(pci_dev), &ct3d->cxl_dstate, + 0, CXL_T3_MSIX_CHMU0_BASE, errp); + if (rc) { + goto err_free_special_ops; + } + pci_register_bar(pci_dev, CXL_DEVICE_REG_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, @@ -1317,6 +1339,7 @@ static const Property ct3_props[] = { speed, PCIE_LINK_SPEED_32), DEFINE_PROP_PCIE_LINK_WIDTH("x-width", CXLType3Dev, width, PCIE_LINK_WIDTH_16), + DEFINE_PROP_UINT16("chmu-port", CXLType3Dev, cxl_dstate.chmu[0].port, 0), }; static uint64_t get_lsa_size(CXLType3Dev *ct3d) diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build index 4db7cad267..c97e64b586 100644 --- a/hw/cxl/meson.build +++ b/hw/cxl/meson.build @@ -6,6 +6,7 @@ system_ss.add(when: 'CONFIG_CXL', 'cxl-host.c', 'cxl-cdat.c', 'cxl-events.c', + 'cxl-chmu.c', 'cxl-cpmu.c', 'switch-mailbox-cci.c', ),

From patchwork Fri Jan 24 17:29:04 2025
X-Patchwork-Submitter: Jonathan Cameron
X-Patchwork-Id: 13949792
From: Jonathan Cameron
Cc: Alex Bennée, Alexandre Iooss, Mahmoud Mandour, Pierrick Bouvier, Niyas Sait
Subject: [RFC PATCH QEMU 2/3] plugins: Add cache miss reporting over a socket.
Date: Fri, 24 Jan 2025 17:29:04 +0000
Message-ID: <20250124172905.84099-3-Jonathan.Cameron@huawei.com>
In-Reply-To: <20250124172905.84099-1-Jonathan.Cameron@huawei.com>
References: <20250124172905.84099-1-Jonathan.Cameron@huawei.com>

This allows an external program to act as a hotness tracker.

Signed-off-by: Jonathan Cameron --- contrib/plugins/cache.c | 75 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 68 insertions(+), 7 deletions(-) diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c index 7baff86860..5af1e6559c 100644 --- a/contrib/plugins/cache.c +++ b/contrib/plugins/cache.c @@ -7,10 +7,17 @@ #include #include +#include #include +#include +#include #include +static int client_socket = -1; +static uint64_t missfilterbase; +static uint64_t missfiltersize; + #define STRTOLL(x) g_ascii_strtoll(x, NULL, 10) QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION; @@ -104,6 +111,7 @@ static Cache **l2_ucaches; static GMutex *l1_dcache_locks; static GMutex *l1_icache_locks; static GMutex *l2_ucache_locks; +static GMutex *socket_lock; static uint64_t l1_dmem_accesses; static uint64_t l1_imem_accesses; @@ -385,6 +393,21 @@ static bool access_cache(Cache *cache, uint64_t addr) return false; } +static void miss(uint64_t paddr) +{ + if (client_socket < 0) { + return; + } + + if (paddr < missfilterbase || paddr >= missfilterbase + missfiltersize) { + return; + } + + g_mutex_lock(socket_lock); + send(client_socket, &paddr, sizeof(paddr), 0); + g_mutex_unlock(socket_lock); +} + static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info, uint64_t vaddr, void *userdata) { @@ -395,9 +418,6 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info, bool hit_in_l1; hwaddr = qemu_plugin_get_hwaddr(info, vaddr); - if (hwaddr && qemu_plugin_hwaddr_is_io(hwaddr)) { - return; - } effective_addr = hwaddr ? 
qemu_plugin_hwaddr_phys_addr(hwaddr) : vaddr; cache_idx = vcpu_index % cores; @@ -412,7 +432,11 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info, l1_dcaches[cache_idx]->accesses++; g_mutex_unlock(&l1_dcache_locks[cache_idx]); - if (hit_in_l1 || !use_l2) { + if (hit_in_l1) { + return; + } + if (!use_l2) { + miss(effective_addr); /* No need to access L2 */ return; } @@ -422,6 +446,7 @@ static void vcpu_mem_access(unsigned int vcpu_index, qemu_plugin_meminfo_t info, insn = userdata; __atomic_fetch_add(&insn->l2_misses, 1, __ATOMIC_SEQ_CST); l2_ucaches[cache_idx]->misses++; + miss(effective_addr); } l2_ucaches[cache_idx]->accesses++; g_mutex_unlock(&l2_ucache_locks[cache_idx]); @@ -447,8 +472,12 @@ static void vcpu_insn_exec(unsigned int vcpu_index, void *userdata) l1_icaches[cache_idx]->accesses++; g_mutex_unlock(&l1_icache_locks[cache_idx]); - if (hit_in_l1 || !use_l2) { - /* No need to access L2 */ + if (hit_in_l1) { + return; + } + + if (!use_l2) { + miss(insn_addr); return; } @@ -739,14 +768,16 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info, int argc, char **argv) { - int i; + int i, port; int l1_iassoc, l1_iblksize, l1_icachesize; int l1_dassoc, l1_dblksize, l1_dcachesize; int l2_assoc, l2_blksize, l2_cachesize; + struct sockaddr_in server_addr; limit = 32; sys = info->system_emulation; + port = -1; l1_dassoc = 8; l1_dblksize = 64; l1_dcachesize = l1_dblksize * l1_dassoc * 32; @@ -808,11 +839,39 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info, fprintf(stderr, "invalid eviction policy: %s\n", opt); return -1; } + } else if (g_strcmp0(tokens[0], "port") == 0) { + port = STRTOLL(tokens[1]); + } else if (g_strcmp0(tokens[0], "missfilterbase") == 0) { + missfilterbase = STRTOLL(tokens[1]); + } else if (g_strcmp0(tokens[0], "missfiltersize") == 0) { + missfiltersize = STRTOLL(tokens[1]); } else { fprintf(stderr, "option parsing failed: %s\n", opt); return -1; } } + if 
(port >= 0) { + uint64_t paddr = 42; /* hello, I'm a provider */ + client_socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (client_socket < 0) { + printf("failed to create a socket\n"); + return -1; + } + printf("Cache miss reported on %lx size %lx\n", + missfilterbase, missfiltersize); + memset((char *)&server_addr, 0, sizeof(server_addr)); + server_addr.sin_family = AF_INET; + server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + server_addr.sin_port = htons(port); + + if (connect(client_socket, (struct sockaddr *)&server_addr, + sizeof(server_addr)) < 0) { + close(client_socket); + return -1; + } + /* Let it know we are a data provider */ + send(client_socket, &paddr, sizeof(paddr), 0); + } policy_init(); @@ -840,6 +899,8 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info, return -1; } + socket_lock = g_new0(GMutex, 1); + l1_dcache_locks = g_new0(GMutex, cores); l1_icache_locks = g_new0(GMutex, cores); l2_ucache_locks = use_l2 ? g_new0(GMutex, cores) : NULL; From patchwork Fri Jan 24 17:29:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 13949793 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A655DC02181 for ; Fri, 24 Jan 2025 17:31:10 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tbNWG-0002XO-41; Fri, 24 Jan 2025 12:31:03 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tbNW1-0002QI-T0 for qemu-devel@nongnu.org; Fri, 24 Jan 2025 12:30:49 -0500 
Received: from frasgout.his.huawei.com ([185.176.79.56]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tbNVz-0000Wt-2h for qemu-devel@nongnu.org; Fri, 24 Jan 2025 12:30:45 -0500 Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4YflCX4Jmjz6L58F; Sat, 25 Jan 2025 01:28:40 +0800 (CST) Received: from frapeml500008.china.huawei.com (unknown [7.182.85.71]) by mail.maildlp.com (Postfix) with ESMTPS id 404931402DB; Sat, 25 Jan 2025 01:30:40 +0800 (CST) Received: from SecurePC-101-06.china.huawei.com (10.122.19.247) by frapeml500008.china.huawei.com (7.182.85.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Fri, 24 Jan 2025 18:30:39 +0100 To: , , CC: =?utf-8?q?Alex_Benn=C3=A9e?= , Alexandre Iooss , Mahmoud Mandour , Pierrick Bouvier , , Niyas Sait Subject: [RFC PATCH QEMU 3/3] contrib: Add example hotness monitoring unit server Date: Fri, 24 Jan 2025 17:29:05 +0000 Message-ID: <20250124172905.84099-4-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250124172905.84099-1-Jonathan.Cameron@huawei.com> References: <20250124172905.84099-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.122.19.247] X-ClientProxiedBy: lhrpeml100004.china.huawei.com (7.191.162.219) To frapeml500008.china.huawei.com (7.182.85.71) Received-SPF: pass client-ip=185.176.79.56; envelope-from=jonathan.cameron@huawei.com; helo=frasgout.his.huawei.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list 
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Jonathan Cameron X-Patchwork-Original-From: Jonathan Cameron via From: Jonathan Cameron Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This is used in conjunction with the cache plugin (with port parameter supplied) and the CXL Type 3 device with a hotness monitoring unit (chmu-port parameter supplied). It implements a very basic oracle with a counter per 4KiB page and a simple loop to find large counts. The hotlist length is controlled by the QEMU device implementation. This is only responsible for the data handling; events etc. are a problem for the CXL HMU emulation. Note that when running this, things are fairly slow. Signed-off-by: Jonathan Cameron --- contrib/hmu/hmu.c | 312 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+) diff --git a/contrib/hmu/hmu.c b/contrib/hmu/hmu.c new file mode 100644 index 0000000000..aa47efd98b --- /dev/null +++ b/contrib/hmu/hmu.c @@ -0,0 +1,312 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#define ID_PROVIDER 42 +#define ID_CONSUMER 41 + +/* Move to shared header */ +enum consumer_request { + QUERY_TAIL, + QUERY_HEAD, + SET_HEAD, + SET_HOTLIST_SIZE, + QUERY_HOTLIST_ENTRY, + SIGNAL_EPOCH_END, + SET_ENABLED, + SET_NUMBER_GRANUALS, + SET_HPA_BASE, + SET_HPA_SIZE, +}; + +struct tracking_instance { + uint64_t base, size; + uint16_t head, tail; + uint16_t hotlist_length; + uint64_t *hotlist; + int32_t *counters; + size_t num_counters; + bool enabled; +}; + +#define MAX_INSTANCES 16 +static int num_tracking_instances; +static struct tracking_instance *instances[MAX_INSTANCES] = {}; +/* + * Instances never removed so this only protects the index against + * parallel creations. 
+ */ +pthread_mutex_t instances_lock; +static int register_tracker(struct tracking_instance *inst) +{ + pthread_mutex_lock(&instances_lock); + if (num_tracking_instances >= MAX_INSTANCES) { + pthread_mutex_unlock(&instances_lock); + return -1; + } + instances[num_tracking_instances++] = inst; + printf("registered %d\n", num_tracking_instances); + pthread_mutex_unlock(&instances_lock); + return 0; +} + +static void notify_tracker(struct tracking_instance *inst, uint64_t paddr) +{ + uint64_t offset; + + if (paddr < inst->base || paddr >= inst->base + inst->size) { + return; + } + /* Fixme: multiple regions */ + offset = (paddr - inst->base) / 4096; + + /* TODO - check masking */ + + if (!inst->counters) { + printf("No counter storage\n"); + return; + } + if (offset >= inst->num_counters) { + printf("out of range? %lx %lx\n", offset, inst->num_counters); + return; + } + inst->counters[offset]++; +} + +/* Cache plugin hopefully squirting us some data */ +static void *provider_innerloop(void * _socket) +{ + int socket = *(int *)_socket; + uint64_t paddr; + int rc; + + printf("Provider connected\n"); + while (1) { + rc = read(socket, &paddr, sizeof(paddr)); + if (rc <= 0) { + return NULL; + } + /* Lock not taken as instances only goes up which should be safe */ + for (int i = 0; i < num_tracking_instances; i++) + if (instances[i]->enabled) { + notify_tracker(instances[i], paddr); + } + } +} + + +/* CHMU instance in QEMU */ +static void *consumer_innerloop(void *_socket) +{ + int socket = *(int *)_socket; + /* for now all chmu have 3 instances */ + struct tracking_instance insts[3] = {}; + /* Instance, command, parameter */ + uint64_t paddr[3]; + int rc; + + for (int i = 0; i < 3; i++) { + rc = register_tracker(&insts[i]); + if (rc) { + printf("Failed to register tracker\n"); + return NULL; + /* todo cleanup to not have partial trackers registered */ + } + } + printf("Consumer connected\n"); + + while (1) { + uint64_t reply, param; + enum consumer_request request; + + 
struct tracking_instance *inst; + + rc = read(socket, paddr, sizeof(paddr)); + if (rc < (int)sizeof(paddr)) { + printf("short message %x\n", rc); + return NULL; + } + if (paddr[0] >= 3) { + printf("garbage\n"); + exit(-1); + } + inst = &insts[paddr[0]]; + request = paddr[1]; + param = paddr[2]; + + switch (request) { + case QUERY_TAIL: + reply = inst->tail; + break; + case QUERY_HEAD: + reply = inst->head; + break; + case SET_HEAD: + reply = param; + inst->head = param; + break; + case SET_HOTLIST_SIZE: { + uint64_t *newlist; + reply = param; + inst->hotlist_length = param; + newlist = realloc(inst->hotlist, sizeof(*inst->hotlist) * param); + if (!newlist) { + printf("failed to allocate hotlist\n"); + break; + } + inst->hotlist = newlist; + break; + } + case QUERY_HOTLIST_ENTRY: + if (param >= inst->hotlist_length) { + printf("out of range hotlist read?\n"); + break; + } + reply = inst->hotlist[param]; + break; + case SIGNAL_EPOCH_END: { + int space; + int added = 0; + printf("into epoch end\n"); + reply = param; + + if (inst->tail > inst->head) { + space = inst->tail - inst->head; + } else { + space = inst->hotlist_length - inst->tail + + inst->head; + } + if (!inst->counters) { + printf("How did we reach end of an epoch without counters?\n"); + break; + } + for (int i = 0; i < inst->num_counters; i++) { + if (!(inst->counters[i] > 0)) { + continue; + } + inst->hotlist[inst->tail] = + (uint64_t)inst->counters[i] | ((uint64_t)i << 32); + printf("added hotlist element %lx at %u\n", + inst->hotlist[inst->tail], inst->tail); + inst->tail = (inst->tail + 1) % inst->hotlist_length; + added++; + if (added == space) { + break; + } + } + memset(inst->counters, 0, + inst->num_counters * sizeof(*inst->counters)); + + printf("End of epoch %u %u\n", inst->head, inst->tail); + /* Overflow handling based on fullness detection in qemu */ + break; + } + case SET_ENABLED: + reply = param; + inst->enabled = !!param; + printf("enabled? 
%d\n", inst->enabled); + break; + case SET_NUMBER_GRANUALS: { /* FIXME Should derive from granule size */ + int32_t *newcounters; + + reply = param; + newcounters = realloc(inst->counters, + sizeof(*inst->counters) * + param); + if (!newcounters) { + printf("Failed to allocate counter storage\n"); + } + printf("allocated space for %lu counters\n", param); + inst->counters = newcounters; + inst->num_counters = param; + break; + } + case SET_HPA_BASE: + reply = param; + inst->base = param; + break; + case SET_HPA_SIZE: /* Size */ + reply = param; + inst->size = param; + break; + default: + printf("No idea yet\n"); + break; + } + write(socket, &reply, sizeof(reply)); + } +} + +int main(int argc, char **argv) +{ + int server_fd, new_socket; + struct sockaddr_in address; + int opt = 1; + int addrlen = sizeof(address); + uint64_t paddr; + unsigned short port; + + if (argc < 2) { + printf("Please provide port to listen on\n"); + return -1; + } + port = atoi(argv[1]); + + if ((server_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) { + return -1; + } + + if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT, + &opt, sizeof(opt))) { + return -1; + } + address.sin_family = AF_INET; + address.sin_addr.s_addr = INADDR_ANY; + address.sin_port = htons(port); + + if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) { + return -1; + } + + printf("Listening on port %u\n", port); + if (listen(server_fd, 3) < 0) { + return -1; + } + + while (1) { + int rc; + pthread_t thread; + if ((new_socket = accept(server_fd, (struct sockaddr *)&address, + (socklen_t *)&addrlen)) < 0) { + exit(-1); + } + + rc = read(new_socket, &paddr, sizeof(paddr)); + if (rc == 0) { + return 0; + } + + if (paddr == ID_PROVIDER) { + if (pthread_create(&thread, NULL, provider_innerloop, + &new_socket)) { + printf("thread create fail\n"); + }; + } else if (paddr == ID_CONSUMER) { + if (pthread_create(&thread, NULL, consumer_innerloop, + &new_socket)) { + printf("thread create 
fail\n"); + }; + } else { + printf("No idea what this was - initial value not provider or consumer\n"); + close(new_socket); + return 0; + } + } + + return 0; +}
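
Illustrative note, not part of the patch: the server above packs each hotlist entry as the counter value in the low 32 bits and the 4KiB page index in the high 32 bits, and it reads fixed-size 8 or 24 byte messages from a TCP stream with a single read() call. The sketch below shows one way a consumer could unpack an entry back into a host physical address, plus a read loop that tolerates short reads. The helper names (hotlist_pack, hotlist_index, hotlist_count, hotlist_hpa, read_full) and the GRANULE_SIZE macro are hypothetical and do not appear in the series; the 4096 byte granule is assumed from the divide in notify_tracker().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Assumption: 4KiB granule, matching the "/ 4096" in notify_tracker(). */
#define GRANULE_SIZE 4096ULL

/* Hypothetical helper mirroring the packing done at SIGNAL_EPOCH_END. */
static uint64_t hotlist_pack(uint32_t index, uint32_t count)
{
    return (uint64_t)count | ((uint64_t)index << 32);
}

/* Page index lives in the high 32 bits of an entry. */
static uint32_t hotlist_index(uint64_t entry)
{
    return (uint32_t)(entry >> 32);
}

/* Access count lives in the low 32 bits of an entry. */
static uint32_t hotlist_count(uint64_t entry)
{
    return (uint32_t)(entry & 0xffffffffULL);
}

/* Host physical address of the page, given the instance base (SET_HPA_BASE). */
static uint64_t hotlist_hpa(uint64_t base, uint64_t entry)
{
    return base + (uint64_t)hotlist_index(entry) * GRANULE_SIZE;
}

/*
 * Read exactly len bytes unless EOF or an error occurs first.
 * Returns the number of bytes read (less than len only on EOF),
 * or -1 on error. Retries when interrupted by a signal.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t rc = read(fd, (char *)buf + done, len - done);

        if (rc == 0) {
            break; /* peer closed the connection */
        }
        if (rc < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        done += (size_t)rc;
    }
    return (ssize_t)done;
}
```

With helpers along these lines, a reader of QUERY_HOTLIST_ENTRY replies could recover the page address with hotlist_hpa(base, entry), and the provider and consumer loops could call read_full() in place of a bare read() so that a message split across TCP segments is not mistaken for a short message.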