From patchwork Wed Jun 14 19:16:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13280393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 971B3EB64DB for ; Wed, 14 Jun 2023 19:19:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234582AbjFNTTk (ORCPT ); Wed, 14 Jun 2023 15:19:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233317AbjFNTTj (ORCPT ); Wed, 14 Jun 2023 15:19:39 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D44B2137 for ; Wed, 14 Jun 2023 12:19:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686770377; x=1718306377; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to; bh=cKeS+U4sYUhplZfWFntMyShCYSXobYHb2rZR1SYY6F8=; b=hL7UdEKmyKN3DCEb7kgPrPPEGoQY95AIWbfk8b3RRaDIUIz6fDfKeVIM nV/NQ69eY+bJv5ZkJX8I4Dtz7awLIoq7/w+175x7Amr2QgJINO92f+q2O 09J/3FikCGNJ3v+GJNmklJl+X7q/ApKX/kdUqGXnACLQUBE3ZhAeXNE1d IR7PIlC8dRmxX+efW3hXmEQxvu4SwUshGsNYuVKYBApQth15+//rTkC2W ZAq/HGCnwV0LFvUoGaFG3dcWVOJMO/7h4R0O/bAFoww5FE8FoHw1ZFMgN LoBCTudLmkx7W7/Tadi8PIWGPgymnt8v8aBs5PicdjDkLF3Dy/YV20IzG A==; X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="338347273" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="338347273" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="886384233" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="886384233" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.212.116.198]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:33 -0700 From: ira.weiny@intel.com Date: Wed, 14 Jun 2023 12:16:28 -0700 Subject: [PATCH 1/5] cxl/mem : Read Dynamic capacity configuration from the device MIME-Version: 1.0 Message-Id: <20230604-dcd-type2-upstream-v1-1-71b6341bae54@intel.com> References: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> In-Reply-To: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> To: Navneet Singh , Fan Ni , Jonathan Cameron , Ira Weiny , Dan Williams , linux-cxl@vger.kernel.org X-Mailer: b4 0.13-dev-9a8cd X-Developer-Signature: v=1; a=ed25519-sha256; t=1686770367; l=15127; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=PcsN4I9rqNsPCQZzgSCQbJjGpJotFPnQ2dQGxBeKS8g=; b=EpVaah1GhS7j8Xt4k1WLlUhAw/3OAqTChjeOYGReiIjeIGz0Kdl8n8tp8Aq6K0do3Z3FG4/hz A1i8qjyOx/ZBvl39g/leeyHfrNZeZ2sYhGVy1TZpb2yIIH9H32MJEaj X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Precedence: bulk List-ID: X-Mailing-List: linux-cxl@vger.kernel.org From: Navneet Singh Read the Dynamic capacity configuration and store dynamic capacity region information in the device state which driver will use to map into the HDM ranges. Implement Get Dynamic Capacity Configuration (opcode 4800h) mailbox command as specified in CXL 3.0 spec section 8.2.9.8.9.1. 
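For illustration only (not part of this patch): the mailbox exchange amounts to sending a two-byte input payload that selects a starting region index and a region count, and receiving back the available region count followed by an array of per-region configuration records. A minimal sketch of that exchange, using the structures this patch adds to cxlmem.h, is below; the helper name is hypothetical, and the real handling (including the alignment and DPA-ordering checks) lives in cxl_dev_dynamic_capacity_identify() in this patch.

/*
 * Illustrative sketch only: issue Get Dynamic Capacity Configuration
 * (opcode 0x4800) and walk the returned region records.  Structure and
 * field names follow the definitions added to cxlmem.h by this patch;
 * the function name is hypothetical.
 */
static int example_read_dc_config(struct cxl_memdev_state *mds)
{
	struct cxl_mbox_get_dc_config get_dc = {
		.region_count = CXL_MAX_DC_REGION,	/* ask for up to 8 regions */
		.start_region_index = 0,
	};
	struct cxl_mbox_dynamic_capacity *dc;
	struct cxl_mbox_cmd mbox_cmd;
	int rc;

	/* Output payload size is bounded by the device's mailbox payload size */
	dc = kvmalloc(mds->payload_size, GFP_KERNEL);
	if (!dc)
		return -ENOMEM;

	mbox_cmd = (struct cxl_mbox_cmd) {
		.opcode = CXL_MBOX_OP_GET_DC_CONFIG,
		.payload_in = &get_dc,
		.size_in = sizeof(get_dc),
		.size_out = mds->payload_size,
		.payload_out = dc,
		.min_out = 1,
	};

	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
	if (rc < 0)
		goto out;

	/* Each record reports the region base, decode length, length and block size */
	for (int i = 0; i < dc->avail_region_count; i++)
		dev_dbg(mds->cxlds.dev, "DC region %d base %#llx len %#llx\n", i,
			le64_to_cpu(dc->region[i].region_base),
			le64_to_cpu(dc->region[i].region_length));
out:
	kvfree(dc);
	return rc;
}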
Signed-off-by: Navneet Singh --- [iweiny: ensure all mds->dc_region's are named] --- drivers/cxl/core/mbox.c | 190 ++++++++++++++++++++++++++++++++++++++++++++++-- drivers/cxl/cxlmem.h | 70 +++++++++++++++++- drivers/cxl/pci.c | 4 + 3 files changed, 256 insertions(+), 8 deletions(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 3ca0bf12c55f..c5b696737c87 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -111,6 +111,37 @@ static u8 security_command_sets[] = { 0x46, /* Security Passthrough */ }; +static bool cxl_is_dcd_command(u16 opcode) +{ +#define CXL_MBOX_OP_DCD_CMDS 0x48 + + if ((opcode >> 8) == CXL_MBOX_OP_DCD_CMDS) + return true; + + return false; +} + +static void cxl_set_dcd_cmd_enabled(struct cxl_memdev_state *mds, + u16 opcode) +{ + switch (opcode) { + case CXL_MBOX_OP_GET_DC_CONFIG: + set_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds); + break; + case CXL_MBOX_OP_GET_DC_EXTENT_LIST: + set_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds); + break; + case CXL_MBOX_OP_ADD_DC_RESPONSE: + set_bit(CXL_DCD_ENABLED_ADD_RESPONSE, mds->dcd_cmds); + break; + case CXL_MBOX_OP_RELEASE_DC: + set_bit(CXL_DCD_ENABLED_RELEASE, mds->dcd_cmds); + break; + default: + break; + } +} + static bool cxl_is_security_command(u16 opcode) { int i; @@ -666,6 +697,7 @@ static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid, static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel) { struct cxl_cel_entry *cel_entry; + struct cxl_mem_command *cmd; const int cel_entries = size / sizeof(*cel_entry); struct device *dev = mds->cxlds.dev; int i; @@ -674,11 +706,12 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel) for (i = 0; i < cel_entries; i++) { u16 opcode = le16_to_cpu(cel_entry[i].opcode); - struct cxl_mem_command *cmd = cxl_mem_find_command(opcode); + cmd = cxl_mem_find_command(opcode); - if (!cmd && !cxl_is_poison_command(opcode)) { - dev_dbg(dev, - "Opcode 0x%04x unsupported by driver\n", opcode); + if (!cmd && !cxl_is_poison_command(opcode) && + !cxl_is_dcd_command(opcode)) { + dev_dbg(dev, "Opcode 0x%04x unsupported by driver\n", + opcode); continue; } @@ -688,6 +721,9 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel) if (cxl_is_poison_command(opcode)) cxl_set_poison_cmd_enabled(&mds->poison, opcode); + if (cxl_is_dcd_command(opcode)) + cxl_set_dcd_cmd_enabled(mds, opcode); + dev_dbg(dev, "Opcode 0x%04x enabled\n", opcode); } } @@ -1059,7 +1095,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds) if (rc < 0) return rc; - mds->total_bytes = + mds->total_static_capacity = le64_to_cpu(id.total_capacity) * CXL_CAPACITY_MULTIPLIER; mds->volatile_only_bytes = le64_to_cpu(id.volatile_capacity) * CXL_CAPACITY_MULTIPLIER; @@ -1077,10 +1113,137 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds) mds->poison.max_errors = min_t(u32, val, CXL_POISON_LIST_MAX); } + mds->dc_event_log_size = le16_to_cpu(id.dc_event_log_size); + return 0; } EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL); +/** + * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity + * information from the device. + * @mds: The memory device state + * Return: 0 if identify was executed successfully. + * + * This will dispatch the get_dynamic_capacity command to the device + * and on success populate structures to be exported to sysfs. 
+ */ +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds) +{ + struct cxl_dev_state *cxlds = &mds->cxlds; + struct device *dev = cxlds->dev; + struct cxl_mbox_dynamic_capacity *dc; + struct cxl_mbox_get_dc_config get_dc; + struct cxl_mbox_cmd mbox_cmd; + u64 next_dc_region_start; + int rc, i; + + for (i = 0; i < CXL_MAX_DC_REGION; i++) + sprintf(mds->dc_region[i].name, "dc%d", i); + + /* Check GET_DC_CONFIG is supported by device */ + if (!test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds)) { + dev_dbg(dev, "unsupported cmd: get_dynamic_capacity_config\n"); + return 0; + } + + dc = kvmalloc(mds->payload_size, GFP_KERNEL); + if (!dc) + return -ENOMEM; + + get_dc = (struct cxl_mbox_get_dc_config) { + .region_count = CXL_MAX_DC_REGION, + .start_region_index = 0, + }; + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_CONFIG, + .payload_in = &get_dc, + .size_in = sizeof(get_dc), + .size_out = mds->payload_size, + .payload_out = dc, + .min_out = 1, + }; + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + goto dc_error; + + mds->nr_dc_region = dc->avail_region_count; + + if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) { + dev_err(dev, "Invalid num of dynamic capacity regions %d\n", + mds->nr_dc_region); + rc = -EINVAL; + goto dc_error; + } + + for (i = 0; i < mds->nr_dc_region; i++) { + struct cxl_dc_region_info *dcr = &mds->dc_region[i]; + + dcr->base = le64_to_cpu(dc->region[i].region_base); + dcr->decode_len = + le64_to_cpu(dc->region[i].region_decode_length); + dcr->decode_len *= CXL_CAPACITY_MULTIPLIER; + dcr->len = le64_to_cpu(dc->region[i].region_length); + dcr->blk_size = le64_to_cpu(dc->region[i].region_block_size); + + /* Check regions are in increasing DPA order */ + if ((i + 1) < mds->nr_dc_region) { + next_dc_region_start = + le64_to_cpu(dc->region[i + 1].region_base); + if ((dcr->base > next_dc_region_start) || + ((dcr->base + dcr->decode_len) > next_dc_region_start)) { + dev_err(dev, + "DPA ordering violation for DC region %d and %d\n", + i, i + 1); + rc = -EINVAL; + goto dc_error; + } + } + + /* Check the region is 256 MB aligned */ + if (!IS_ALIGNED(dcr->base, SZ_256M)) { + dev_err(dev, "DC region %d not aligned to 256MB\n", i); + rc = -EINVAL; + goto dc_error; + } + + /* Check Region base and length are aligned to block size */ + if (!IS_ALIGNED(dcr->base, dcr->blk_size) || + !IS_ALIGNED(dcr->len, dcr->blk_size)) { + dev_err(dev, "DC region %d not aligned to %#llx\n", i, + dcr->blk_size); + rc = -EINVAL; + goto dc_error; + } + + dcr->dsmad_handle = + le32_to_cpu(dc->region[i].region_dsmad_handle); + dcr->flags = dc->region[i].flags; + sprintf(dcr->name, "dc%d", i); + + dev_dbg(dev, + "DC region %s DPA: %#llx LEN: %#llx BLKSZ: %#llx\n", + dcr->name, dcr->base, dcr->decode_len, dcr->blk_size); + } + + /* + * Calculate entire DPA range of all configured regions which will be mapped by + * one or more HDM decoders + */ + mds->total_dynamic_capacity = + mds->dc_region[mds->nr_dc_region - 1].base + + mds->dc_region[mds->nr_dc_region - 1].decode_len - + mds->dc_region[0].base; + dev_dbg(dev, "Total dynamic capacity: %#llx\n", + mds->total_dynamic_capacity); + +dc_error: + kvfree(dc); + return rc; +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL); + static int add_dpa_res(struct device *dev, struct resource *parent, struct resource *res, resource_size_t start, resource_size_t size, const char *type) @@ -1112,6 +1275,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) struct cxl_dev_state 
*cxlds = &mds->cxlds; struct device *dev = cxlds->dev; int rc; + size_t untenanted_mem = + mds->dc_region[0].base - mds->total_static_capacity; + + mds->total_capacity = mds->total_static_capacity + + untenanted_mem + mds->total_dynamic_capacity; if (!cxlds->media_ready) { cxlds->dpa_res = DEFINE_RES_MEM(0, 0); @@ -1121,13 +1289,23 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) } cxlds->dpa_res = - (struct resource)DEFINE_RES_MEM(0, mds->total_bytes); + (struct resource)DEFINE_RES_MEM(0, mds->total_capacity); + + for (int i = 0; i < CXL_MAX_DC_REGION; i++) { + struct cxl_dc_region_info *dcr = &mds->dc_region[i]; + + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i], + dcr->base, dcr->decode_len, dcr->name); + if (rc) + return rc; + } if (mds->partition_align_bytes == 0) { rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0, mds->volatile_only_bytes, "ram"); if (rc) return rc; + return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res, mds->volatile_only_bytes, mds->persistent_only_bytes, "pmem"); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 89e560ea14c0..9c0b2fa72bdd 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -239,6 +239,15 @@ struct cxl_event_state { struct mutex log_lock; }; +/* Device enabled DCD commands */ +enum dcd_cmd_enabled_bits { + CXL_DCD_ENABLED_GET_CONFIG, + CXL_DCD_ENABLED_GET_EXTENT_LIST, + CXL_DCD_ENABLED_ADD_RESPONSE, + CXL_DCD_ENABLED_RELEASE, + CXL_DCD_ENABLED_MAX +}; + /* Device enabled poison commands */ enum poison_cmd_enabled_bits { CXL_POISON_ENABLED_LIST, @@ -284,6 +293,9 @@ enum cxl_devtype { CXL_DEVTYPE_CLASSMEM, }; +#define CXL_MAX_DC_REGION 8 +#define CXL_DC_REGION_SRTLEN 8 + /** * struct cxl_dev_state - The driver device state * @@ -300,6 +312,8 @@ enum cxl_devtype { * @dpa_res: Overall DPA resource tree for the device * @pmem_res: Active Persistent memory capacity configuration * @ram_res: Active Volatile memory capacity configuration + * @dc_res: Active Dynamic Capacity memory configuration for each possible + * region * @component_reg_phys: register base of component registers * @info: Cached DVSEC information about the device. * @serial: PCIe Device Serial Number @@ -315,6 +329,7 @@ struct cxl_dev_state { struct resource dpa_res; struct resource pmem_res; struct resource ram_res; + struct resource dc_res[CXL_MAX_DC_REGION]; resource_size_t component_reg_phys; u64 serial; enum cxl_devtype type; @@ -334,9 +349,12 @@ struct cxl_dev_state { * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device) * @mbox_mutex: Mutex to synchronize mailbox access. * @firmware_version: Firmware version for the memory device. + * @dcd_cmds: List of DCD commands implemented by memory device * @enabled_cmds: Hardware commands found enabled in CEL. 
* @exclusive_cmds: Commands that are kernel-internal only - * @total_bytes: sum of all possible capacities + * @total_capacity: Sum of static and dynamic capacities + * @total_static_capacity: Sum of RAM and PMEM capacities + * @total_dynamic_capacity: Complete DPA range occupied by DC regions * @volatile_only_bytes: hard volatile capacity * @persistent_only_bytes: hard persistent capacity * @partition_align_bytes: alignment size for partition-able capacity @@ -344,6 +362,10 @@ struct cxl_dev_state { * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset + * @nr_dc_region: number of DC regions implemented in the memory device + * @dc_region: array containing info about the DC regions + * @dc_event_log_size: The number of events the device can store in the + * Dynamic Capacity Event Log before it overflows * @event: event log driver state * @poison: poison driver state info * @mbox_send: @dev specific transport for transmitting mailbox commands @@ -357,9 +379,13 @@ struct cxl_memdev_state { size_t lsa_size; struct mutex mbox_mutex; /* Protects device mailbox and firmware */ char firmware_version[0x10]; + DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX); DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX); DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX); - u64 total_bytes; + + u64 total_capacity; + u64 total_static_capacity; + u64 total_dynamic_capacity; u64 volatile_only_bytes; u64 persistent_only_bytes; u64 partition_align_bytes; @@ -367,6 +393,20 @@ struct cxl_memdev_state { u64 active_persistent_bytes; u64 next_volatile_bytes; u64 next_persistent_bytes; + + u8 nr_dc_region; + + struct cxl_dc_region_info { + u8 name[CXL_DC_REGION_SRTLEN]; + u64 base; + u64 decode_len; + u64 len; + u64 blk_size; + u32 dsmad_handle; + u8 flags; + } dc_region[CXL_MAX_DC_REGION]; + + size_t dc_event_log_size; struct cxl_event_state event; struct cxl_poison_state poison; int (*mbox_send)(struct cxl_memdev_state *mds, @@ -415,6 +455,10 @@ enum cxl_opcode { CXL_MBOX_OP_UNLOCK = 0x4503, CXL_MBOX_OP_FREEZE_SECURITY = 0x4504, CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE = 0x4505, + CXL_MBOX_OP_GET_DC_CONFIG = 0x4800, + CXL_MBOX_OP_GET_DC_EXTENT_LIST = 0x4801, + CXL_MBOX_OP_ADD_DC_RESPONSE = 0x4802, + CXL_MBOX_OP_RELEASE_DC = 0x4803, CXL_MBOX_OP_MAX = 0x10000 }; @@ -462,6 +506,7 @@ struct cxl_mbox_identify { __le16 inject_poison_limit; u8 poison_caps; u8 qos_telemetry_caps; + __le16 dc_event_log_size; } __packed; /* @@ -617,7 +662,27 @@ struct cxl_mbox_set_partition_info { u8 flags; } __packed; +struct cxl_mbox_get_dc_config { + u8 region_count; + u8 start_region_index; +} __packed; + +/* See CXL 3.0 Table 125 get dynamic capacity config Output Payload */ +struct cxl_mbox_dynamic_capacity { + u8 avail_region_count; + u8 rsvd[7]; + struct cxl_dc_region_config { + __le64 region_base; + __le64 region_decode_length; + __le64 region_length; + __le64 region_block_size; + __le32 region_dsmad_handle; + u8 flags; + u8 rsvd[3]; + } __packed region[]; +} __packed; #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0) +#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0) /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */ struct cxl_mbox_set_timestamp_in { @@ -742,6 +807,7 @@ enum { int cxl_internal_send_cmd(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd); int cxl_dev_state_identify(struct cxl_memdev_state *mds); +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds); int 
cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); int cxl_mem_create_range_info(struct cxl_memdev_state *mds); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 4e2845b7331a..ac1a41bc083d 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -742,6 +742,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; + rc = cxl_dev_dynamic_capacity_identify(mds); + if (rc) + return rc; + rc = cxl_mem_create_range_info(mds); if (rc) return rc; From patchwork Wed Jun 14 19:16:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13280394 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B8C3EB64D9 for ; Wed, 14 Jun 2023 19:19:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229703AbjFNTTp (ORCPT ); Wed, 14 Jun 2023 15:19:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46794 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233317AbjFNTTn (ORCPT ); Wed, 14 Jun 2023 15:19:43 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FD92213B for ; Wed, 14 Jun 2023 12:19:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686770381; x=1718306381; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to; bh=uJm36zQioyW+aHDtY6CPyx0sFRoyu4XDHca2ZHlmCUk=; b=POVGeG91eZ0pTg+QncFUKmbln+yCnYVaVK4tUTuoljc1VU9iHwoWONgc p60H4EEciqaJjoTr3LFPs0dlLiYBA0GJl1Txf6PoMlhBA98yggU+CqyKh 2oUO/A9/88yTD6sVApYi/h3KZNswbOm7trFAA+N/vW49EZJWRhqq9Z7rL xss7+t9cI/HhP19LMw2Xuq2gQOFOhhUMFgS18GuRnCnQuq8pk3eF6wvp8 rcIv/tceCB9gvz4dyAa7n3VIpFLcJiGkVxaqD9chOf6edcnBQXtw6wd5L UASDp11RvUqEF8Eh+aTonJM0oTghEgxkb8vyn5ROIDo22iYK/7rSA7zh/ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="338347302" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="338347302" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="886384247" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="886384247" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.212.116.198]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:37 -0700 From: ira.weiny@intel.com Date: Wed, 14 Jun 2023 12:16:29 -0700 Subject: [PATCH 2/5] cxl/region: Add dynamic capacity cxl region support. 
MIME-Version: 1.0 Message-Id: <20230604-dcd-type2-upstream-v1-2-71b6341bae54@intel.com> References: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> In-Reply-To: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> To: Navneet Singh , Fan Ni , Jonathan Cameron , Ira Weiny , Dan Williams , linux-cxl@vger.kernel.org X-Mailer: b4 0.13-dev-9a8cd X-Developer-Signature: v=1; a=ed25519-sha256; t=1686770367; l=22743; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=K/uMHEZD0fcmKBF1SJYEUZrIDrGBbjYB1JzM+tJZHp0=; b=mzBzdjFQFkZKMuvq5s+tQkx4Q7B1T4Xn5vq9Jm1ROkL21O0uQ0KnN3JicTrxD+cDdeye2H2p1 wrk11Ict5DfCmqQWGAm0F+GuadC/AwwsG6Ylmv4DjWuT3OGE/VUIqej X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Precedence: bulk List-ID: X-Mailing-List: linux-cxl@vger.kernel.org From: Navneet Singh CXL devices optionally support dynamic capacity. CXL Regions must be created to access this capacity. Add sysfs entries to create dynamic capacity cxl regions. Provide a new Dynamic Capacity decoder mode which targets dynamic capacity on devices which are added to that region. Below are the steps to create and delete dynamic capacity region0 (example). region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region) echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity echo 1 > /sys/bus/cxl/devices/$region/interleave_ways echo "dc0" >/sys/bus/cxl/devices/decoder1.0/mode echo 0x400000000 >/sys/bus/cxl/devices/decoder1.0/dpa_size echo 0x400000000 > /sys/bus/cxl/devices/$region/size echo "decoder1.0" > /sys/bus/cxl/devices/$region/target0 echo 1 > /sys/bus/cxl/devices/$region/commit echo $region > /sys/bus/cxl/drivers/cxl_region/bind echo $region> /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Navneet Singh --- [iweiny: fixups] [iweiny: remove unused CXL_DC_REGION_MODE macro] [iweiny: Make dc_mode_to_region_index static] [iweiny: simplify /create_dc_region] [iweiny: introduce decoder_mode_is_dc] [djbw: fixups, no sign-off: preview only] --- drivers/cxl/Kconfig | 11 +++ drivers/cxl/core/core.h | 7 ++ drivers/cxl/core/hdm.c | 234 ++++++++++++++++++++++++++++++++++++++++++---- drivers/cxl/core/port.c | 18 ++++ drivers/cxl/core/region.c | 135 ++++++++++++++++++++++++-- drivers/cxl/cxl.h | 28 ++++++ drivers/dax/cxl.c | 4 + 7 files changed, 409 insertions(+), 28 deletions(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index ff4e78117b31..df034889d053 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -121,6 +121,17 @@ config CXL_REGION If unsure say 'y' +config CXL_DCD + bool "CXL: DCD Support" + default CXL_BUS + depends on CXL_REGION + help + Enable the CXL core to provision CXL DCD regions. + CXL devices optionally support dynamic capacity and DCD region + maps the dynamic capacity regions DPA's into Host HPA ranges. 
+ + If unsure say 'y' + config CXL_REGION_INVALIDATION_TEST bool "CXL: Region Cache Management Bypass (TEST)" depends on CXL_REGION diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 27f0968449de..725700ab5973 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -9,6 +9,13 @@ extern const struct device_type cxl_nvdimm_type; extern struct attribute_group cxl_base_attribute_group; +#ifdef CONFIG_CXL_DCD +extern struct device_attribute dev_attr_create_dc_region; +#define SET_CXL_DC_REGION_ATTR(x) (&dev_attr_##x.attr), +#else +#define SET_CXL_DC_REGION_ATTR(x) +#endif + #ifdef CONFIG_CXL_REGION extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 514d30131d92..29649b47d177 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -233,14 +233,23 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) struct cxl_dev_state *cxlds = cxlmd->cxlds; struct resource *res = cxled->dpa_res; resource_size_t skip_start; + resource_size_t skipped = cxled->skip; lockdep_assert_held_write(&cxl_dpa_rwsem); /* save @skip_start, before @res is released */ - skip_start = res->start - cxled->skip; + skip_start = res->start - skipped; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); - if (cxled->skip) - __release_region(&cxlds->dpa_res, skip_start, cxled->skip); + if (cxled->skip != 0) { + while (skipped != 0) { + res = xa_load(&cxled->skip_res, skip_start); + __release_region(&cxlds->dpa_res, skip_start, + resource_size(res)); + xa_erase(&cxled->skip_res, skip_start); + skip_start += resource_size(res); + skipped -= resource_size(res); + } + } cxled->skip = 0; cxled->dpa_res = NULL; put_device(&cxled->cxld.dev); @@ -267,6 +276,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) __cxl_dpa_release(cxled); } +static int dc_mode_to_region_index(enum cxl_decoder_mode mode) +{ + int index = 0; + + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) { + if (mode == i) + return index; + index++; + } + + return -EINVAL; +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) @@ -275,7 +297,11 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &port->dev; + struct device *ed_dev = &cxled->cxld.dev; + struct resource *dpa_res = &cxlds->dpa_res; + resource_size_t skip_len = 0; struct resource *res; + int rc, index; lockdep_assert_held_write(&cxl_dpa_rwsem); @@ -304,28 +330,119 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, } if (skipped) { - res = __request_region(&cxlds->dpa_res, base - skipped, skipped, - dev_name(&cxled->cxld.dev), 0); - if (!res) { - dev_dbg(dev, - "decoder%d.%d: failed to reserve skipped space\n", - port->id, cxled->cxld.id); - return -EBUSY; + resource_size_t skip_base = base - skipped; + + if (decoder_mode_is_dc(cxled->mode)) { + if (resource_size(&cxlds->ram_res) && + skip_base <= cxlds->ram_res.end) { + skip_len = cxlds->ram_res.end - skip_base + 1; + res = __request_region(dpa_res, skip_base, + skip_len, dev_name(ed_dev), 0); + if (!res) + goto error; + + rc = xa_insert(&cxled->skip_res, skip_base, res, + GFP_KERNEL); + skip_base += skip_len; + } + + if (resource_size(&cxlds->ram_res) && + skip_base <= cxlds->pmem_res.end) { + skip_len = 
cxlds->pmem_res.end - skip_base + 1; + res = __request_region(dpa_res, skip_base, + skip_len, dev_name(ed_dev), 0); + if (!res) + goto error; + + rc = xa_insert(&cxled->skip_res, skip_base, res, + GFP_KERNEL); + skip_base += skip_len; + } + + index = dc_mode_to_region_index(cxled->mode); + for (int i = 0; i <= index; i++) { + struct resource *dcr = &cxlds->dc_res[i]; + + if (skip_base < dcr->start) { + skip_len = dcr->start - skip_base; + res = __request_region(dpa_res, + skip_base, skip_len, + dev_name(ed_dev), 0); + if (!res) + goto error; + + rc = xa_insert(&cxled->skip_res, skip_base, + res, GFP_KERNEL); + skip_base += skip_len; + } + + if (skip_base == base) { + dev_dbg(dev, "skip done!\n"); + break; + } + + if (resource_size(dcr) && + skip_base <= dcr->end) { + if (skip_base > base) + dev_err(dev, "Skip error\n"); + + skip_len = dcr->end - skip_base + 1; + res = __request_region(dpa_res, skip_base, + skip_len, + dev_name(ed_dev), 0); + if (!res) + goto error; + + rc = xa_insert(&cxled->skip_res, skip_base, + res, GFP_KERNEL); + skip_base += skip_len; + } + } + } else { + res = __request_region(dpa_res, base - skipped, skipped, + dev_name(ed_dev), 0); + if (!res) + goto error; + + rc = xa_insert(&cxled->skip_res, skip_base, res, + GFP_KERNEL); } } - res = __request_region(&cxlds->dpa_res, base, len, - dev_name(&cxled->cxld.dev), 0); + + res = __request_region(dpa_res, base, len, dev_name(ed_dev), 0); if (!res) { dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", - port->id, cxled->cxld.id); - if (skipped) - __release_region(&cxlds->dpa_res, base - skipped, - skipped); + port->id, cxled->cxld.id); + if (skipped) { + resource_size_t skip_base = base - skipped; + + while (skipped != 0) { + if (skip_base > base) + dev_err(dev, "Skip error\n"); + + res = xa_load(&cxled->skip_res, skip_base); + __release_region(dpa_res, skip_base, + resource_size(res)); + xa_erase(&cxled->skip_res, skip_base); + skip_base += resource_size(res); + skipped -= resource_size(res); + } + } return -EBUSY; } cxled->dpa_res = res; cxled->skip = skipped; + for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) { + int index = dc_mode_to_region_index(mode); + + if (resource_contains(&cxlds->dc_res[index], res)) { + cxled->mode = mode; + dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id, + cxled->cxld.id, cxled->dpa_res, cxled->mode); + goto success; + } + } if (resource_contains(&cxlds->pmem_res, res)) cxled->mode = CXL_DECODER_PMEM; else if (resource_contains(&cxlds->ram_res, res)) @@ -336,9 +453,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->mode = CXL_DECODER_MIXED; } +success: port->hdm_end++; get_device(&cxled->cxld.dev); return 0; + +error: + dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space\n", + port->id, cxled->cxld.id); + return -EBUSY; + } int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, @@ -429,6 +553,14 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, switch (mode) { case CXL_DECODER_RAM: case CXL_DECODER_PMEM: + case CXL_DECODER_DC0: + case CXL_DECODER_DC1: + case CXL_DECODER_DC2: + case CXL_DECODER_DC3: + case CXL_DECODER_DC4: + case CXL_DECODER_DC5: + case CXL_DECODER_DC6: + case CXL_DECODER_DC7: break; default: dev_dbg(dev, "unsupported mode: %d\n", mode); @@ -456,6 +588,16 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, goto out; } + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) { + int index = dc_mode_to_region_index(i); + + if (mode == i && !resource_size(&cxlds->dc_res[index])) { + 
dev_dbg(dev, "no available dynamic capacity\n"); + rc = -ENXIO; + goto out; + } + } + cxled->mode = mode; rc = 0; out: @@ -469,10 +611,12 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, resource_size_t *skip_out) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - resource_size_t free_ram_start, free_pmem_start; + resource_size_t free_ram_start, free_pmem_start, free_dc_start; struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; resource_size_t start, avail, skip; struct resource *p, *last; + int index; lockdep_assert_held(&cxl_dpa_rwsem); @@ -490,6 +634,20 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, else free_pmem_start = cxlds->pmem_res.start; + /* + * One HDM Decoder per DC region to map memory with different + * DSMAS entry. + */ + index = dc_mode_to_region_index(cxled->mode); + if (index >= 0) { + if (cxlds->dc_res[index].child) { + dev_err(dev, "Cannot allocated DPA from DC Region: %d\n", + index); + return -EINVAL; + } + free_dc_start = cxlds->dc_res[index].start; + } + if (cxled->mode == CXL_DECODER_RAM) { start = free_ram_start; avail = cxlds->ram_res.end - start + 1; @@ -511,6 +669,29 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, else skip_end = start - 1; skip = skip_end - skip_start + 1; + } else if (decoder_mode_is_dc(cxled->mode)) { + resource_size_t skip_start, skip_end; + + start = free_dc_start; + avail = cxlds->dc_res[index].end - start + 1; + if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child) + skip_start = free_ram_start; + else + skip_start = free_pmem_start; + /* + * If some dc region is already mapped, then that allocation + * already handled the RAM and PMEM skip.Check for DC region + * skip. + */ + for (int i = index - 1; i >= 0 ; i--) { + if (cxlds->dc_res[i].child) { + skip_start = cxlds->dc_res[i].child->end + 1; + break; + } + } + + skip_end = start - 1; + skip = skip_end - skip_start + 1; } else { dev_dbg(cxled_dev(cxled), "mode not set\n"); avail = 0; @@ -548,10 +729,25 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) avail = cxl_dpa_freespace(cxled, &start, &skip); + dev_dbg(dev, "DPA Allocation start: %llx len: %llx Skip: %llx\n", + start, size, skip); if (size > avail) { + static const char * const names[] = { + [CXL_DECODER_NONE] = "none", + [CXL_DECODER_RAM] = "ram", + [CXL_DECODER_PMEM] = "pmem", + [CXL_DECODER_MIXED] = "mixed", + [CXL_DECODER_DC0] = "dc0", + [CXL_DECODER_DC1] = "dc1", + [CXL_DECODER_DC2] = "dc2", + [CXL_DECODER_DC3] = "dc3", + [CXL_DECODER_DC4] = "dc4", + [CXL_DECODER_DC5] = "dc5", + [CXL_DECODER_DC6] = "dc6", + [CXL_DECODER_DC7] = "dc7", + }; dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, - cxled->mode == CXL_DECODER_RAM ? 
"ram" : "pmem", - &avail); + names[cxled->mode], &avail); rc = -ENOSPC; goto out; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 5e21b53362e6..a1a98aba24ed 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -195,6 +195,22 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr, mode = CXL_DECODER_PMEM; else if (sysfs_streq(buf, "ram")) mode = CXL_DECODER_RAM; + else if (sysfs_streq(buf, "dc0")) + mode = CXL_DECODER_DC0; + else if (sysfs_streq(buf, "dc1")) + mode = CXL_DECODER_DC1; + else if (sysfs_streq(buf, "dc2")) + mode = CXL_DECODER_DC2; + else if (sysfs_streq(buf, "dc3")) + mode = CXL_DECODER_DC3; + else if (sysfs_streq(buf, "dc4")) + mode = CXL_DECODER_DC4; + else if (sysfs_streq(buf, "dc5")) + mode = CXL_DECODER_DC5; + else if (sysfs_streq(buf, "dc6")) + mode = CXL_DECODER_DC6; + else if (sysfs_streq(buf, "dc7")) + mode = CXL_DECODER_DC7; else return -EINVAL; @@ -296,6 +312,7 @@ static struct attribute *cxl_decoder_root_attrs[] = { &dev_attr_target_list.attr, SET_CXL_REGION_ATTR(create_pmem_region) SET_CXL_REGION_ATTR(create_ram_region) + SET_CXL_DC_REGION_ATTR(create_dc_region) SET_CXL_REGION_ATTR(delete_region) NULL, }; @@ -1691,6 +1708,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) return ERR_PTR(-ENOMEM); cxled->pos = -1; + xa_init(&cxled->skip_res); cxld = &cxled->cxld; rc = cxl_decoder_init(port, cxld); if (rc) { diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 543c4499379e..144232c8305e 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1733,7 +1733,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, lockdep_assert_held_write(&cxl_region_rwsem); lockdep_assert_held_read(&cxl_dpa_rwsem); - if (cxled->mode != cxlr->mode) { + if (decoder_mode_is_dc(cxlr->mode) && !decoder_mode_is_dc(cxled->mode)) { dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); return -EINVAL; @@ -2211,6 +2211,14 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, switch (mode) { case CXL_DECODER_RAM: case CXL_DECODER_PMEM: + case CXL_DECODER_DC0: + case CXL_DECODER_DC1: + case CXL_DECODER_DC2: + case CXL_DECODER_DC3: + case CXL_DECODER_DC4: + case CXL_DECODER_DC5: + case CXL_DECODER_DC6: + case CXL_DECODER_DC7: break; default: dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); @@ -2321,6 +2329,43 @@ static ssize_t create_ram_region_store(struct device *dev, } DEVICE_ATTR_RW(create_ram_region); +static ssize_t store_dcN_region(struct cxl_root_decoder *cxlrd, + const char *buf, enum cxl_decoder_mode mode, + size_t len) +{ + struct cxl_region *cxlr; + int rc, id; + + rc = sscanf(buf, "region%d\n", &id); + if (rc != 1) + return -EINVAL; + + cxlr = __create_region(cxlrd, id, mode, CXL_DECODER_HOSTMEM); + if (IS_ERR(cxlr)) + return PTR_ERR(cxlr); + + return len; +} + +static ssize_t create_dc_region_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return __create_region_show(to_cxl_root_decoder(dev), buf); +} + +static ssize_t create_dc_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + /* + * All DC regions use decoder mode DC0 as the region does not need the + * index information + */ + return store_dcN_region(to_cxl_root_decoder(dev), buf, + CXL_DECODER_DC0, len); +} +DEVICE_ATTR_RW(create_dc_region); + static ssize_t region_show(struct device *dev, struct device_attribute *attr, 
char *buf) { @@ -2799,6 +2844,61 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr) return rc; } +static void cxl_dc_region_release(void *data) +{ + struct cxl_region *cxlr = data; + struct cxl_dc_region *cxlr_dc = cxlr->cxlr_dc; + + xa_destroy(&cxlr_dc->dax_dev_list); + kfree(cxlr_dc); +} + +static int devm_cxl_add_dc_region(struct cxl_region *cxlr) +{ + struct cxl_dc_region *cxlr_dc; + struct cxl_dax_region *cxlr_dax; + struct device *dev; + int rc = 0; + + cxlr_dax = cxl_dax_region_alloc(cxlr); + if (IS_ERR(cxlr_dax)) + return PTR_ERR(cxlr_dax); + + cxlr_dc = kzalloc(sizeof(*cxlr_dc), GFP_KERNEL); + if (!cxlr_dc) { + rc = -ENOMEM; + goto err; + } + + dev = &cxlr_dax->dev; + rc = dev_set_name(dev, "dax_region%d", cxlr->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent), + dev_name(dev)); + + rc = devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister, + cxlr_dax); + if (rc) + goto err; + + cxlr_dc->cxlr_dax = cxlr_dax; + xa_init(&cxlr_dc->dax_dev_list); + cxlr->cxlr_dc = cxlr_dc; + rc = devm_add_action_or_reset(&cxlr->dev, cxl_dc_region_release, cxlr); + if (!rc) + return 0; +err: + put_device(dev); + kfree(cxlr_dc); + return rc; +} + static int match_decoder_by_range(struct device *dev, void *data) { struct range *r1, *r2 = data; @@ -3140,6 +3240,19 @@ static int is_system_ram(struct resource *res, void *arg) return 1; } +/* + * The region can not be manged by CXL if any portion of + * it is already online as 'System RAM' + */ +static bool region_is_system_ram(struct cxl_region *cxlr, + struct cxl_region_params *p) +{ + return (walk_iomem_res_desc(IORES_DESC_NONE, + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, + p->res->start, p->res->end, cxlr, + is_system_ram) > 0); +} + static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); @@ -3174,14 +3287,7 @@ static int cxl_region_probe(struct device *dev) case CXL_DECODER_PMEM: return devm_cxl_add_pmem_region(cxlr); case CXL_DECODER_RAM: - /* - * The region can not be manged by CXL if any portion of - * it is already online as 'System RAM' - */ - if (walk_iomem_res_desc(IORES_DESC_NONE, - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, - p->res->start, p->res->end, cxlr, - is_system_ram) > 0) + if (region_is_system_ram(cxlr, p)) return 0; /* @@ -3193,6 +3299,17 @@ static int cxl_region_probe(struct device *dev) /* HDM-H routes to device-dax */ return devm_cxl_add_dax_region(cxlr); + case CXL_DECODER_DC0: + case CXL_DECODER_DC1: + case CXL_DECODER_DC2: + case CXL_DECODER_DC3: + case CXL_DECODER_DC4: + case CXL_DECODER_DC5: + case CXL_DECODER_DC6: + case CXL_DECODER_DC7: + if (region_is_system_ram(cxlr, p)) + return 0; + return devm_cxl_add_dc_region(cxlr); default: dev_dbg(&cxlr->dev, "unsupported region mode: %d\n", cxlr->mode); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 8400af85d99f..7ac1237938b7 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -335,6 +335,14 @@ enum cxl_decoder_mode { CXL_DECODER_NONE, CXL_DECODER_RAM, CXL_DECODER_PMEM, + CXL_DECODER_DC0, + CXL_DECODER_DC1, + CXL_DECODER_DC2, + CXL_DECODER_DC3, + CXL_DECODER_DC4, + CXL_DECODER_DC5, + CXL_DECODER_DC6, + CXL_DECODER_DC7, CXL_DECODER_MIXED, CXL_DECODER_DEAD, }; @@ -345,6 +353,14 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) [CXL_DECODER_NONE] = "none", [CXL_DECODER_RAM] = "ram", [CXL_DECODER_PMEM] = "pmem", + [CXL_DECODER_DC0] = "dc0", + [CXL_DECODER_DC1] = "dc1", + [CXL_DECODER_DC2] = "dc2", 
+ [CXL_DECODER_DC3] = "dc3", + [CXL_DECODER_DC4] = "dc4", + [CXL_DECODER_DC5] = "dc5", + [CXL_DECODER_DC6] = "dc6", + [CXL_DECODER_DC7] = "dc7", [CXL_DECODER_MIXED] = "mixed", }; @@ -353,6 +369,11 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) return "mixed"; } +static inline bool decoder_mode_is_dc(enum cxl_decoder_mode mode) +{ + return (mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7); +} + /* * Track whether this decoder is reserved for region autodiscovery, or * free for userspace provisioning. @@ -375,6 +396,7 @@ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; + struct xarray skip_res; enum cxl_decoder_mode mode; enum cxl_decoder_state state; int pos; @@ -475,6 +497,11 @@ struct cxl_region_params { */ #define CXL_REGION_F_AUTO 1 +struct cxl_dc_region { + struct xarray dax_dev_list; + struct cxl_dax_region *cxlr_dax; +}; + /** * struct cxl_region - CXL region * @dev: This region's device @@ -493,6 +520,7 @@ struct cxl_region { enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; + struct cxl_dc_region *cxlr_dc; unsigned long flags; struct cxl_region_params params; }; diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index ccdf8de85bd5..eb5eb81bfbd7 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -23,11 +23,15 @@ static int cxl_dax_region_probe(struct device *dev) if (!dax_region) return -ENOMEM; + if (decoder_mode_is_dc(cxlr->mode)) + return 0; + data = (struct dev_dax_data) { .dax_region = dax_region, .id = -1, .size = range_len(&cxlr_dax->hpa_range), }; + dev_dax = devm_create_dev_dax(&data); if (IS_ERR(dev_dax)) return PTR_ERR(dev_dax); From patchwork Wed Jun 14 19:16:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13280395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5E31EB64D8 for ; Wed, 14 Jun 2023 19:19:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233083AbjFNTTq (ORCPT ); Wed, 14 Jun 2023 15:19:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46804 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229576AbjFNTTp (ORCPT ); Wed, 14 Jun 2023 15:19:45 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 954EB2135 for ; Wed, 14 Jun 2023 12:19:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686770384; x=1718306384; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to; bh=wc3sJukGyeQA36C7Ys0D/O75E9VKGDdHLKsuYCJXNV0=; b=DB+alultGLMxge1Xi++nfZwlGZgLUiXV0yYcwS0P4jvSLBd+Ggwi6X5Q KB92dwccXd7rs/YD1xqqUezj7yNy9p8ln8IA4YA5M2Fs0MTJJ1Mz5SSwx k7ZmqugjMEUTUEoHwTODG5d6sOp8aSfRt0yv8n+m9T6l1boZGChCGA4XC lHEjkFlXsinTCTPecg2EnST0rgKIJoXhzhyCE1k7I3xsmKqLMLygZ2Dlr e1ZWlJ9DnS2VlwXqOA0+N4aC9B+ItYPBz5xd9FcDZDJ+aUD3LKrKLfUHo Gnz2SPv5ICebgxarNAUo+Pv15dzU8YC9a2Zv6Xvk1mloT+5bdIUA6Qim4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="338347329" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="338347329" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com 
with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="886384262" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="886384262" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.212.116.198]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:42 -0700 From: ira.weiny@intel.com Date: Wed, 14 Jun 2023 12:16:30 -0700 Subject: [PATCH 3/5] cxl/mem : Expose dynamic capacity configuration to userspace MIME-Version: 1.0 Message-Id: <20230604-dcd-type2-upstream-v1-3-71b6341bae54@intel.com> References: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> In-Reply-To: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> To: Navneet Singh , Fan Ni , Jonathan Cameron , Ira Weiny , Dan Williams , linux-cxl@vger.kernel.org X-Mailer: b4 0.13-dev-9a8cd X-Developer-Signature: v=1; a=ed25519-sha256; t=1686770367; l=3766; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=fljXcPz+ZQgKoffq99siBW3SK44DWVen0PdBI49pnCM=; b=+M2qKT6J7yGGimy4rXyHTr1x1qQJ2lTdEkAGcMTfPebtq7VOu3ZHcl3lQxT+01YYQm4ksyUXw O1ZsrSSHtwlAx9gNlabVqPHnOfrVMc72hYYFwWIp/8cMxlEz0J+ntvx X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Precedence: bulk List-ID: X-Mailing-List: linux-cxl@vger.kernel.org From: Navneet Singh Exposing driver cached dynamic capacity configuration through sysfs attributes.User will create one or more dynamic capacity cxl regions based on this information and map the dynamic capacity of the device into HDM ranges using one or more HDM decoders. Signed-off-by: Navneet Singh --- [iweiny: fixups] [djbw: fixups, no sign-off: preview only] --- drivers/cxl/core/memdev.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 5d1ba7a72567..beeb5fa3a0aa 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -99,6 +99,20 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr, static struct device_attribute dev_attr_pmem_size = __ATTR(size, 0444, pmem_size_show, NULL); +static ssize_t dc_regions_count_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + int len = 0; + + len = sysfs_emit(buf, "0x%x\n", mds->nr_dc_region); + return len; +} + +struct device_attribute dev_attr_dc_regions_count = + __ATTR(dc_regions_count, 0444, dc_regions_count_show, NULL); + static ssize_t serial_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -362,6 +376,57 @@ static struct attribute *cxl_memdev_ram_attributes[] = { NULL, }; +static ssize_t show_size_regionN(struct cxl_memdev *cxlmd, char *buf, int pos) +{ + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + return sysfs_emit(buf, "0x%llx\n", mds->dc_region[pos].decode_len); +} + +#define SIZE_ATTR_RO(n) \ +static ssize_t dc##n##_size_show( \ + struct device *dev, struct device_attribute *attr, char *buf) \ +{ \ + return show_size_regionN(to_cxl_memdev(dev), buf, (n)); \ +} \ +static DEVICE_ATTR_RO(dc##n##_size) +SIZE_ATTR_RO(0); +SIZE_ATTR_RO(1); +SIZE_ATTR_RO(2); +SIZE_ATTR_RO(3); +SIZE_ATTR_RO(4); +SIZE_ATTR_RO(5); +SIZE_ATTR_RO(6); +SIZE_ATTR_RO(7); + +static struct attribute *cxl_memdev_dc_attributes[] = { + &dev_attr_dc0_size.attr, + 
&dev_attr_dc1_size.attr, + &dev_attr_dc2_size.attr, + &dev_attr_dc3_size.attr, + &dev_attr_dc4_size.attr, + &dev_attr_dc5_size.attr, + &dev_attr_dc6_size.attr, + &dev_attr_dc7_size.attr, + &dev_attr_dc_regions_count.attr, + NULL, +}; + +static umode_t cxl_dc_visible(struct kobject *kobj, struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + if (a == &dev_attr_dc_regions_count.attr) + return a->mode; + + if (n < mds->nr_dc_region) + return a->mode; + + return 0; +} + static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a, int n) { @@ -385,10 +450,17 @@ static struct attribute_group cxl_memdev_pmem_attribute_group = { .attrs = cxl_memdev_pmem_attributes, }; +static struct attribute_group cxl_memdev_dc_attribute_group = { + .name = "dc", + .attrs = cxl_memdev_dc_attributes, + .is_visible = cxl_dc_visible, +}; + static const struct attribute_group *cxl_memdev_attribute_groups[] = { &cxl_memdev_attribute_group, &cxl_memdev_ram_attribute_group, &cxl_memdev_pmem_attribute_group, + &cxl_memdev_dc_attribute_group, NULL, }; From patchwork Wed Jun 14 19:16:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13280397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAAD7EB64DB for ; Wed, 14 Jun 2023 19:20:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233317AbjFNTUB (ORCPT ); Wed, 14 Jun 2023 15:20:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235944AbjFNTTw (ORCPT ); Wed, 14 Jun 2023 15:19:52 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7497C2135 for ; Wed, 14 Jun 2023 12:19:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686770389; x=1718306389; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to; bh=jQg/IKNsIX3ExQXTsdU1UOMrpqCfaC2ONKPLm68j8tI=; b=LScQhJwDypnK43ISRE28FdmJzOaCF1lS0k0rtCAQ+4SiEpH2z90OQ9T+ r35tYFufYdSumOMLB/oKw9f7cqBQoKRPYxMCaE/v+zpivZGw+ijw5JaMP K4L8WSBp/CXOhbNNi3D48lKuKKnPyKp8scVSI6yTQhUdSDS5OyEHA7AYk MpWrtrB5mU4ISosO5FvnxvjcE4HYvgWAvAjuqqDWc7q9y+AfTwtfItHic i3XCfjZZlM2rAUtV5GREafLvd+8fOvEz+oil1NYguUfP8b+zgtwZYq+Ue edfcb3aqGTFmy7NZ9sX7y9cnqJYijwBOjLjczZWSaZj3aB9YLCsSeFP0u g==; X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="338347360" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="338347360" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="886384288" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="886384288" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.212.116.198]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:46 -0700 From: ira.weiny@intel.com Date: Wed, 14 Jun 2023 12:16:31 -0700 Subject: [PATCH 4/5] cxl/mem: Add support to handle DCD 
add and release capacity events. MIME-Version: 1.0 Message-Id: <20230604-dcd-type2-upstream-v1-4-71b6341bae54@intel.com> References: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> In-Reply-To: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> To: Navneet Singh , Fan Ni , Jonathan Cameron , Ira Weiny , Dan Williams , linux-cxl@vger.kernel.org X-Mailer: b4 0.13-dev-9a8cd X-Developer-Signature: v=1; a=ed25519-sha256; t=1686770367; l=28894; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=lnyIy7/2CBuaRNyRNHpGdKUJC3jhjcbKwn6Z+csYfrs=; b=VT6kYxcug1TOI9pztroMKPntR4gRbs+47LwMa5KZWythdmdjxzww3uCOU3KLfE4+pINgu1yQT /TU/FKWGmXLCC5e+fso43tW2wM47CjB8vbZZEqd/5xNLLO24kORPL6C X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Precedence: bulk List-ID: X-Mailing-List: linux-cxl@vger.kernel.org From: Navneet Singh A dynamic capacity device utilizes events to signal the host about the changes to the allocation of DC blocks. The device communicates the state of these blocks of dynamic capacity through an extent list that describes the starting DPA and length of all blocks the host can access. Based on the dynamic capacity add or release event type, dynamic memory represented by the extents are either added or removed as devdax device. Process the dynamic capacity add and release events. Signed-off-by: Navneet Singh --- [iweiny: Remove invalid comment] --- drivers/cxl/core/mbox.c | 345 +++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/core/region.c | 214 +++++++++++++++++++++++++++- drivers/cxl/core/trace.h | 3 +- drivers/cxl/cxl.h | 4 +- drivers/cxl/cxlmem.h | 76 ++++++++++ drivers/cxl/pci.c | 10 +- drivers/dax/bus.c | 11 +- drivers/dax/bus.h | 5 +- 8 files changed, 652 insertions(+), 16 deletions(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index c5b696737c87..db9295216de5 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -767,6 +767,14 @@ static const uuid_t log_uuid[] = { [VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID, }; +/* See CXL 3.0 8.2.9.2.1.5 */ +enum dc_event { + ADD_CAPACITY, + RELEASE_CAPACITY, + FORCED_CAPACITY_RELEASE, + REGION_CONFIGURATION_UPDATED, +}; + /** * cxl_enumerate_cmds() - Enumerate commands for a device. 
* @mds: The driver data for the operation @@ -852,6 +860,14 @@ static const uuid_t mem_mod_event_uuid = UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74); +/* + * Dynamic Capacity Event Record + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45 + */ +static const uuid_t dc_event_uuid = + UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c, + 0x2f, 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a); + static void cxl_event_trace_record(const struct cxl_memdev *cxlmd, enum cxl_event_log_type type, struct cxl_event_record_raw *record) @@ -945,6 +961,188 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds, return rc; } +static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds, + struct cxl_mbox_dc_response *res, + int extent_cnt, int opcode) +{ + struct cxl_mbox_cmd mbox_cmd; + int rc, size; + + size = struct_size(res, extent_list, extent_cnt); + res->extent_list_size = cpu_to_le32(extent_cnt); + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = opcode, + .size_in = size, + .payload_in = res, + }; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + + return rc; + +} + +static int cxl_prepare_ext_list(struct cxl_mbox_dc_response **res, + int *n, struct range *extent) +{ + struct cxl_mbox_dc_response *dc_res; + unsigned int size; + + if (!extent) + size = struct_size(dc_res, extent_list, 0); + else + size = struct_size(dc_res, extent_list, *n + 1); + + dc_res = krealloc(*res, size, GFP_KERNEL); + if (!dc_res) + return -ENOMEM; + + if (extent) { + dc_res->extent_list[*n].dpa_start = cpu_to_le64(extent->start); + memset(dc_res->extent_list[*n].reserved, 0, 8); + dc_res->extent_list[*n].length = + cpu_to_le64(range_len(extent)); + (*n)++; + } + + *res = dc_res; + return 0; +} +/** + * cxl_handle_dcd_event_records() - Read DCD event records. + * @mds: The memory device state + * + * Returns 0 if enumerate completed successfully. + * + * CXL devices can generate DCD events to add or remove extents in the list. 
+ */ +static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds, + struct cxl_event_record_raw *rec) +{ + struct cxl_mbox_dc_response *dc_res = NULL; + struct device *dev = mds->cxlds.dev; + uuid_t *id = &rec->hdr.id; + struct dcd_event_dyn_cap *record = + (struct dcd_event_dyn_cap *)rec; + int extent_cnt = 0, rc = 0; + struct cxl_dc_extent_data *extent; + struct range alloc_range, rel_range; + resource_size_t dpa, size; + + if (!uuid_equal(id, &dc_event_uuid)) + return -EINVAL; + + switch (record->data.event_type) { + case ADD_CAPACITY: + extent = devm_kzalloc(dev, sizeof(*extent), GFP_ATOMIC); + if (!extent) + return -ENOMEM; + + extent->dpa_start = le64_to_cpu(record->data.extent.start_dpa); + extent->length = le64_to_cpu(record->data.extent.length); + memcpy(extent->tag, record->data.extent.tag, + sizeof(record->data.extent.tag)); + extent->shared_extent_seq = + le16_to_cpu(record->data.extent.shared_extn_seq); + dev_dbg(dev, "Add DC extent DPA:0x%llx LEN:%llx\n", + extent->dpa_start, extent->length); + alloc_range = (struct range) { + .start = extent->dpa_start, + .end = extent->dpa_start + extent->length - 1, + }; + + rc = cxl_add_dc_extent(mds, &alloc_range); + if (rc < 0) { + dev_dbg(dev, "unconsumed DC extent DPA:0x%llx LEN:%llx\n", + extent->dpa_start, extent->length); + rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, NULL); + if (rc < 0) { + dev_err(dev, "Couldn't create extent list %d\n", + rc); + devm_kfree(dev, extent); + return rc; + } + + rc = cxl_send_dc_cap_response(mds, dc_res, + extent_cnt, CXL_MBOX_OP_ADD_DC_RESPONSE); + if (rc < 0) { + devm_kfree(dev, extent); + goto out; + } + + kfree(dc_res); + devm_kfree(dev, extent); + + return 0; + } + + rc = xa_insert(&mds->dc_extent_list, extent->dpa_start, extent, + GFP_KERNEL); + if (rc < 0) + goto out; + + mds->num_dc_extents++; + rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, &alloc_range); + if (rc < 0) { + dev_err(dev, "Couldn't create extent list %d\n", rc); + return rc; + } + + rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt, + CXL_MBOX_OP_ADD_DC_RESPONSE); + if (rc < 0) + goto out; + + break; + + case RELEASE_CAPACITY: + dpa = le64_to_cpu(record->data.extent.start_dpa); + size = le64_to_cpu(record->data.extent.length); + dev_dbg(dev, "Release DC extents DPA:0x%llx LEN:%llx\n", + dpa, size); + extent = xa_load(&mds->dc_extent_list, dpa); + if (!extent) { + dev_err(dev, "No extent found with DPA:0x%llx\n", dpa); + return -EINVAL; + } + + rel_range = (struct range) { + .start = dpa, + .end = dpa + size - 1, + }; + + rc = cxl_release_dc_extent(mds, &rel_range); + if (rc < 0) { + dev_dbg(dev, "withhold DC extent DPA:0x%llx LEN:%llx\n", + dpa, size); + return 0; + } + + xa_erase(&mds->dc_extent_list, dpa); + devm_kfree(dev, extent); + mds->num_dc_extents--; + rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, &rel_range); + if (rc < 0) { + dev_err(dev, "Couldn't create extent list %d\n", rc); + return rc; + } + + rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt, + CXL_MBOX_OP_RELEASE_DC); + if (rc < 0) + goto out; + + break; + + default: + return -EINVAL; + } +out: + kfree(dc_res); + return rc; +} + static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, enum cxl_event_log_type type) { @@ -982,9 +1180,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, if (!nr_rec) break; - for (i = 0; i < nr_rec; i++) + for (i = 0; i < nr_rec; i++) { cxl_event_trace_record(cxlmd, type, &payload->records[i]); + if (type == CXL_EVENT_TYPE_DCD) { + rc = cxl_handle_dcd_event_records(mds, + 
&payload->records[i]); + if (rc) + dev_err_ratelimited(dev, + "dcd event failed: %d\n", rc); + } + } if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW) trace_cxl_overflow(cxlmd, type, payload); @@ -1024,6 +1230,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_WARN); if (status & CXLDEV_EVENT_STATUS_INFO) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_INFO); + if (status & CXLDEV_EVENT_STATUS_DCD) + cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD); } EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL); @@ -1244,6 +1452,140 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL); +int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds, + unsigned int *extent_gen_num) +{ + struct device *dev = mds->cxlds.dev; + struct cxl_mbox_dc_extents *dc_extents; + struct cxl_mbox_get_dc_extent get_dc_extent; + unsigned int total_extent_cnt; + struct cxl_mbox_cmd mbox_cmd; + int rc; + + /* Check GET_DC_EXTENT_LIST is supported by device */ + if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) { + dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n"); + return 0; + } + + dc_extents = kvmalloc(mds->payload_size, GFP_KERNEL); + if (!dc_extents) + return -ENOMEM; + + get_dc_extent = (struct cxl_mbox_get_dc_extent) { + .extent_cnt = 0, + .start_extent_index = 0, + }; + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST, + .payload_in = &get_dc_extent, + .size_in = sizeof(get_dc_extent), + .size_out = mds->payload_size, + .payload_out = dc_extents, + .min_out = 1, + }; + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + goto out; + + total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt); + *extent_gen_num = le32_to_cpu(dc_extents->extent_list_num); + dev_dbg(dev, "Total extent count :%d Extent list Generation Num: %d\n", + total_extent_cnt, *extent_gen_num); +out: + + kvfree(dc_extents); + if (rc < 0) + return rc; + + return total_extent_cnt; + +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_get_dc_extent_cnt, CXL); + +int cxl_dev_get_dc_extents(struct cxl_memdev_state *mds, + unsigned int index, unsigned int cnt) +{ + /* See CXL 3.0 Table 125 dynamic capacity config Output Payload */ + struct device *dev = mds->cxlds.dev; + struct cxl_mbox_dc_extents *dc_extents; + struct cxl_mbox_get_dc_extent get_dc_extent; + unsigned int extent_gen_num, available_extents, total_extent_cnt; + int rc; + struct cxl_dc_extent_data *extent; + struct cxl_mbox_cmd mbox_cmd; + struct range alloc_range; + + /* Check GET_DC_EXTENT_LIST is supported by device */ + if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) { + dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n"); + return 0; + } + + dc_extents = kvmalloc(mds->payload_size, GFP_KERNEL); + if (!dc_extents) + return -ENOMEM; + get_dc_extent = (struct cxl_mbox_get_dc_extent) { + .extent_cnt = cnt, + .start_extent_index = index, + }; + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST, + .payload_in = &get_dc_extent, + .size_in = sizeof(get_dc_extent), + .size_out = mds->payload_size, + .payload_out = dc_extents, + .min_out = 1, + }; + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + goto out; + + available_extents = le32_to_cpu(dc_extents->ret_extent_cnt); + total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt); + extent_gen_num = le32_to_cpu(dc_extents->extent_list_num); + dev_dbg(dev, "No Total extent count :%d 
Extent list Generation Num:%d\n", + total_extent_cnt, extent_gen_num); + + + for (int i = 0; i < available_extents ; i++) { + extent = devm_kzalloc(dev, sizeof(*extent), GFP_KERNEL); + if (!extent) { + rc = -ENOMEM; + goto out; + } + extent->dpa_start = le64_to_cpu(dc_extents->extent[i].start_dpa); + extent->length = le64_to_cpu(dc_extents->extent[i].length); + memcpy(extent->tag, dc_extents->extent[i].tag, + sizeof(dc_extents->extent[i].tag)); + extent->shared_extent_seq = + le16_to_cpu(dc_extents->extent[i].shared_extn_seq); + dev_dbg(dev, "dynamic capacity extent[%d] DPA:0x%llx LEN:%llx\n", + i, extent->dpa_start, extent->length); + + alloc_range = (struct range){ + .start = extent->dpa_start, + .end = extent->dpa_start + extent->length - 1, + }; + + rc = cxl_add_dc_extent(mds, &alloc_range); + if (rc < 0) + goto out; + rc = xa_insert(&mds->dc_extent_list, extent->dpa_start, extent, + GFP_KERNEL); + } + +out: + kvfree(dc_extents); + if (rc < 0) + return rc; + + return available_extents; +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_get_dc_extents, CXL); + static int add_dpa_res(struct device *dev, struct resource *parent, struct resource *res, resource_size_t start, resource_size_t size, const char *type) @@ -1452,6 +1794,7 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mutex_init(&mds->event.log_lock); mds->cxlds.dev = dev; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; + xa_init(&mds->dc_extent_list); return mds; } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 144232c8305e..ba45c1c3b0a9 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only /* Copyright(c) 2022 Intel Corporation. All rights reserved. */ #include +#include #include #include #include @@ -11,6 +12,8 @@ #include #include #include "core.h" +#include "../../dax/bus.h" +#include "../../dax/dax-private.h" /** * DOC: cxl core region @@ -166,6 +169,38 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) return 0; } +static int cxl_region_manage_dc(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + unsigned int extent_gen_num; + int i, rc; + + /* Designed for Non Interleaving flow with the assumption one + * cxl_region will map the complete device DC region's DPA range + */ + for (i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + rc = cxl_dev_get_dc_extent_cnt(mds, &extent_gen_num); + if (rc < 0) + goto err; + else if (rc > 1) { + rc = cxl_dev_get_dc_extents(mds, rc, 0); + if (rc < 0) + goto err; + mds->num_dc_extents = rc; + mds->dc_extents_index = rc - 1; + } + mds->dc_list_gen_num = extent_gen_num; + dev_dbg(mds->cxlds.dev, "No of preallocated extents :%d\n", rc); + } + return 0; +err: + return rc; +} + static int commit_decoder(struct cxl_decoder *cxld) { struct cxl_switch_decoder *cxlsd = NULL; @@ -2865,11 +2900,14 @@ static int devm_cxl_add_dc_region(struct cxl_region *cxlr) return PTR_ERR(cxlr_dax); cxlr_dc = kzalloc(sizeof(*cxlr_dc), GFP_KERNEL); - if (!cxlr_dc) { - rc = -ENOMEM; - goto err; - } + if (!cxlr_dc) + return -ENOMEM; + rc = request_module("dax_cxl"); + if (rc) { + dev_err(dev, "failed to load dax-ctl module\n"); + goto load_err; + } dev = &cxlr_dax->dev; rc = dev_set_name(dev, "dax_region%d", cxlr->id); if (rc) @@ -2891,10 +2929,24 @@ static int devm_cxl_add_dc_region(struct cxl_region *cxlr) 
xa_init(&cxlr_dc->dax_dev_list); cxlr->cxlr_dc = cxlr_dc; rc = devm_add_action_or_reset(&cxlr->dev, cxl_dc_region_release, cxlr); - if (!rc) - return 0; + if (rc) + goto err; + + if (!dev->driver) { + dev_err(dev, "%s Driver not attached\n", dev_name(dev)); + rc = -ENXIO; + goto err; + } + + rc = cxl_region_manage_dc(cxlr); + if (rc) + goto err; + + return 0; + err: put_device(dev); +load_err: kfree(cxlr_dc); return rc; } @@ -3076,6 +3128,156 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd, } EXPORT_SYMBOL_NS_GPL(cxl_create_region, CXL); +static int match_ep_decoder_by_range(struct device *dev, void *data) +{ + struct cxl_endpoint_decoder *cxled; + struct range *dpa_range = data; + + if (!is_endpoint_decoder(dev)) + return 0; + + cxled = to_cxl_endpoint_decoder(dev); + if (!cxled->cxld.region) + return 0; + + if (cxled->dpa_res->start <= dpa_range->start && + cxled->dpa_res->end >= dpa_range->end) + return 1; + + return 0; +} + +int cxl_release_dc_extent(struct cxl_memdev_state *mds, + struct range *rel_range) +{ + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; + struct cxl_endpoint_decoder *cxled; + struct cxl_dc_region *cxlr_dc; + struct dax_region *dax_region; + resource_size_t dpa_offset; + struct cxl_region *cxlr; + struct range hpa_range; + struct dev_dax *dev_dax; + resource_size_t hpa; + struct device *dev; + int ranges, rc = 0; + + /* + * Find the cxl endpoind decoder with which has the extent dpa range and + * get the cxl_region, dax_region refrences. + */ + dev = device_find_child(&cxlmd->endpoint->dev, rel_range, + match_ep_decoder_by_range); + if (!dev) { + dev_err(mds->cxlds.dev, "%pr not mapped\n", rel_range); + return PTR_ERR(dev); + } + + cxled = to_cxl_endpoint_decoder(dev); + hpa_range = cxled->cxld.hpa_range; + cxlr = cxled->cxld.region; + cxlr_dc = cxlr->cxlr_dc; + + /* DPA to HPA translation */ + if (cxled->cxld.interleave_ways == 1) { + dpa_offset = rel_range->start - cxled->dpa_res->start; + hpa = hpa_range.start + dpa_offset; + } else { + dev_err(mds->cxlds.dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + + dev_dax = xa_load(&cxlr_dc->dax_dev_list, hpa); + if (!dev_dax) + return -EINVAL; + + dax_region = dev_dax->region; + ranges = dev_dax->nr_range; + + while (ranges) { + int i = ranges - 1; + struct dax_mapping *mapping = dev_dax->ranges[i].mapping; + + devm_release_action(dax_region->dev, unregister_dax_mapping, + &mapping->dev); + ranges--; + } + + dev_dbg(mds->cxlds.dev, "removing devdax device:%s\n", + dev_name(&dev_dax->dev)); + devm_release_action(dax_region->dev, unregister_dev_dax, + &dev_dax->dev); + xa_erase(&cxlr_dc->dax_dev_list, hpa); + + return rc; +} + +int cxl_add_dc_extent(struct cxl_memdev_state *mds, struct range *alloc_range) +{ + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; + struct cxl_endpoint_decoder *cxled; + struct cxl_dax_region *cxlr_dax; + struct cxl_dc_region *cxlr_dc; + struct dax_region *dax_region; + resource_size_t dpa_offset; + struct dev_dax_data data; + struct dev_dax *dev_dax; + struct cxl_region *cxlr; + struct range hpa_range; + resource_size_t hpa; + struct device *dev; + int rc; + + /* + * Find the cxl endpoind decoder with which has the extent dpa range and + * get the cxl_region, dax_region refrences. 
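+ * Only the non-interleaved case is handled: with a single interleave way the
+ * extent's offset into the decoder's DPA resource maps directly to an offset
+ * into the decoder's HPA range, where the dev_dax range is then created.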
+ */ + dev = device_find_child(&cxlmd->endpoint->dev, alloc_range, + match_ep_decoder_by_range); + if (!dev) { + dev_err(mds->cxlds.dev, "%pr not mapped\n", alloc_range); + return PTR_ERR(dev); + } + + cxled = to_cxl_endpoint_decoder(dev); + hpa_range = cxled->cxld.hpa_range; + cxlr = cxled->cxld.region; + cxlr_dc = cxlr->cxlr_dc; + cxlr_dax = cxlr_dc->cxlr_dax; + dax_region = dev_get_drvdata(&cxlr_dax->dev); + + /* DPA to HPA translation */ + if (cxled->cxld.interleave_ways == 1) { + dpa_offset = alloc_range->start - cxled->dpa_res->start; + hpa = hpa_range.start + dpa_offset; + } else { + dev_err(mds->cxlds.dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + + data = (struct dev_dax_data) { + .dax_region = dax_region, + .id = -1, + .size = 0, + }; + + dev_dax = devm_create_dev_dax(&data); + if (IS_ERR(dev_dax)) + return PTR_ERR(dev_dax); + + if (IS_ALIGNED(range_len(alloc_range), max_t(unsigned long, + dev_dax->align, memremap_compat_align()))) { + rc = alloc_dev_dax_range(dev_dax, hpa, + range_len(alloc_range)); + if (rc) + return rc; + } + + rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL); + + return rc; +} + /* Establish an empty region covering the given HPA range */ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index a0b5819bc70b..e11651255780 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -122,7 +122,8 @@ TRACE_EVENT(cxl_aer_correctable_error, { CXL_EVENT_TYPE_INFO, "Informational" }, \ { CXL_EVENT_TYPE_WARN, "Warning" }, \ { CXL_EVENT_TYPE_FAIL, "Failure" }, \ - { CXL_EVENT_TYPE_FATAL, "Fatal" }) + { CXL_EVENT_TYPE_FATAL, "Fatal" }, \ + { CXL_EVENT_TYPE_DCD, "DCD" }) TRACE_EVENT(cxl_overflow, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 7ac1237938b7..60c436b7ebb1 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -163,11 +163,13 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw) #define CXLDEV_EVENT_STATUS_WARN BIT(1) #define CXLDEV_EVENT_STATUS_FAIL BIT(2) #define CXLDEV_EVENT_STATUS_FATAL BIT(3) +#define CXLDEV_EVENT_STATUS_DCD BIT(4) #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO | \ CXLDEV_EVENT_STATUS_WARN | \ CXLDEV_EVENT_STATUS_FAIL | \ - CXLDEV_EVENT_STATUS_FATAL) + CXLDEV_EVENT_STATUS_FATAL| \ + CXLDEV_EVENT_STATUS_DCD) /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */ #define CXLDEV_EVENT_INT_MODE_MASK GENMASK(1, 0) diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 9c0b2fa72bdd..0440b5c04ef6 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -5,6 +5,7 @@ #include #include #include +#include #include "cxl.h" /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ @@ -226,6 +227,7 @@ struct cxl_event_interrupt_policy { u8 warn_settings; u8 failure_settings; u8 fatal_settings; + u8 dyncap_settings; } __packed; /** @@ -296,6 +298,13 @@ enum cxl_devtype { #define CXL_MAX_DC_REGION 8 #define CXL_DC_REGION_SRTLEN 8 +struct cxl_dc_extent_data { + u64 dpa_start; + u64 length; + u8 tag[16]; + u16 shared_extent_seq; +}; + /** * struct cxl_dev_state - The driver device state * @@ -406,6 +415,11 @@ struct cxl_memdev_state { u8 flags; } dc_region[CXL_MAX_DC_REGION]; + u32 dc_list_gen_num; + u32 dc_extents_index; + struct xarray dc_extent_list; + u32 num_dc_extents; + size_t dc_event_log_size; struct cxl_event_state event; struct cxl_poison_state poison; @@ -470,6 +484,17 @@ enum cxl_opcode { UUID_INIT(0xe1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 
0xd6, 0x07, 0x19, \ 0x40, 0x3d, 0x86) + +struct cxl_mbox_dc_response { + __le32 extent_list_size; + u8 reserved[4]; + struct updated_extent_list { + __le64 dpa_start; + __le64 length; + u8 reserved[8]; + } __packed extent_list[]; +} __packed; + struct cxl_mbox_get_supported_logs { __le16 entries; u8 rsvd[6]; @@ -555,6 +580,7 @@ enum cxl_event_log_type { CXL_EVENT_TYPE_WARN, CXL_EVENT_TYPE_FAIL, CXL_EVENT_TYPE_FATAL, + CXL_EVENT_TYPE_DCD, CXL_EVENT_TYPE_MAX }; @@ -639,6 +665,35 @@ struct cxl_event_mem_module { u8 reserved[0x3d]; } __packed; +/* + * Dynamic Capacity Event Record + * CXL rev 3.0 section 8.2.9.2.1.5; Table 8-47 + */ + +#define CXL_EVENT_DC_TAG_SIZE 0x10 +struct cxl_dc_extent { + __le64 start_dpa; + __le64 length; + u8 tag[CXL_EVENT_DC_TAG_SIZE]; + __le16 shared_extn_seq; + u8 reserved[6]; +} __packed; + +struct dcd_record_data { + u8 event_type; + u8 reserved; + __le16 host_id; + u8 region_index; + u8 reserved1[3]; + struct cxl_dc_extent extent; + u8 reserved2[32]; +} __packed; + +struct dcd_event_dyn_cap { + struct cxl_event_record_hdr hdr; + struct dcd_record_data data; +} __packed; + struct cxl_mbox_get_partition_info { __le64 active_volatile_cap; __le64 active_persistent_cap; @@ -684,6 +739,19 @@ struct cxl_mbox_dynamic_capacity { #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0) #define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0) +struct cxl_mbox_get_dc_extent { + __le32 extent_cnt; + __le32 start_extent_index; +} __packed; + +struct cxl_mbox_dc_extents { + __le32 ret_extent_cnt; + __le32 total_extent_cnt; + __le32 extent_list_num; + u8 rsvd[4]; + struct cxl_dc_extent extent[]; +} __packed; + /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */ struct cxl_mbox_set_timestamp_in { __le64 timestamp; @@ -826,6 +894,14 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd); int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa); int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa); +/* FIXME why not have these be static in mbox.c? 
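+ * cxl_add_dc_extent()/cxl_release_dc_extent() are defined in region.c and
+ * called from mbox.c, while cxl_dev_get_dc_extent_cnt()/cxl_dev_get_dc_extents()
+ * are defined in mbox.c and called from region.c, so all four need external
+ * linkage for now.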
*/ +int cxl_add_dc_extent(struct cxl_memdev_state *mds, struct range *alloc_range); +int cxl_release_dc_extent(struct cxl_memdev_state *mds, struct range *rel_range); +int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds, + unsigned int *extent_gen_num); +int cxl_dev_get_dc_extents(struct cxl_memdev_state *mds, unsigned int cnt, + unsigned int index); + #ifdef CONFIG_CXL_SUSPEND void cxl_mem_active_inc(void); void cxl_mem_active_dec(void); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index ac1a41bc083d..558ffbcb9b34 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -522,8 +522,8 @@ static int cxl_event_req_irq(struct cxl_dev_state *cxlds, u8 setting) return irq; return devm_request_threaded_irq(dev, irq, NULL, cxl_event_thread, - IRQF_SHARED | IRQF_ONESHOT, NULL, - dev_id); + IRQF_SHARED | IRQF_ONESHOT, NULL, + dev_id); } static int cxl_event_get_int_policy(struct cxl_memdev_state *mds, @@ -555,6 +555,7 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds, .warn_settings = CXL_INT_MSI_MSIX, .failure_settings = CXL_INT_MSI_MSIX, .fatal_settings = CXL_INT_MSI_MSIX, + .dyncap_settings = CXL_INT_MSI_MSIX, }; mbox_cmd = (struct cxl_mbox_cmd) { @@ -608,6 +609,11 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds) return rc; } + rc = cxl_event_req_irq(cxlds, policy.dyncap_settings); + if (rc) { + dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n"); + return rc; + } return 0; } diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 227800053309..b2b27033f589 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -434,7 +434,7 @@ static void free_dev_dax_ranges(struct dev_dax *dev_dax) trim_dev_dax_range(dev_dax); } -static void unregister_dev_dax(void *dev) +void unregister_dev_dax(void *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); @@ -445,6 +445,7 @@ static void unregister_dev_dax(void *dev) free_dev_dax_ranges(dev_dax); put_device(dev); } +EXPORT_SYMBOL_GPL(unregister_dev_dax); /* a return value >= 0 indicates this invocation invalidated the id */ static int __free_dev_dax_id(struct dev_dax *dev_dax) @@ -641,7 +642,7 @@ static void dax_mapping_release(struct device *dev) kfree(mapping); } -static void unregister_dax_mapping(void *data) +void unregister_dax_mapping(void *data) { struct device *dev = data; struct dax_mapping *mapping = to_dax_mapping(dev); @@ -658,7 +659,7 @@ static void unregister_dax_mapping(void *data) device_del(dev); put_device(dev); } - +EXPORT_SYMBOL_GPL(unregister_dax_mapping); static struct dev_dax_range *get_dax_range(struct device *dev) { struct dax_mapping *mapping = to_dax_mapping(dev); @@ -793,7 +794,7 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) return 0; } -static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, +int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, resource_size_t size) { struct dax_region *dax_region = dev_dax->region; @@ -853,6 +854,8 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, return rc; } +EXPORT_SYMBOL_GPL(alloc_dev_dax_range); + static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size) { diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 8cd79ab34292..aa8418c7aead 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -47,8 +47,11 @@ int __dax_driver_register(struct dax_device_driver *dax_drv, __dax_driver_register(driver, THIS_MODULE, KBUILD_MODNAME) void dax_driver_unregister(struct dax_device_driver *dax_drv); void kill_dev_dax(struct dev_dax 
*dev_dax); +void unregister_dev_dax(void *dev); +void unregister_dax_mapping(void *data); bool static_dev_dax(struct dev_dax *dev_dax); - +int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, + resource_size_t size); /* * While run_dax() is potentially a generic operation that could be * defined in include/linux/dax.h we don't want to grow any users From patchwork Wed Jun 14 19:16:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13280396 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10185EB64D8 for ; Wed, 14 Jun 2023 19:20:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229576AbjFNTUA (ORCPT ); Wed, 14 Jun 2023 15:20:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236002AbjFNTTy (ORCPT ); Wed, 14 Jun 2023 15:19:54 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E8142682 for ; Wed, 14 Jun 2023 12:19:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686770393; x=1718306393; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to; bh=ID0J+3Kp49Mg9ETt5IT049MYB4ZipCCxdK/sLh9D2L4=; b=CG+KJzi22mS01DFpzfBS4EnhSJo250f7ydkRrIIaRSZGbnZIh7FdtiS2 uSA7x12AJFiWt3KoHT4sQIvWcHzl0EWPDp4Klw75bA17PFkfktn3Blnvd 2ac28y6gVKJHq2J/jBXFADypmPTDrVQLYY4dIXtbNdoSo6qFRLRGQt5ta mW91hZDWB6mOmBEN05H6Si7KQZbzDUIypfpUMGSakZFZqfpzycEdIMT57 hlYOXdBtxzHg6wa80hGbxpHz9RtzNhtAo9E59QB1WlmAXxv/g4MnXipk1 lJNswwzTf34PSx6fAsfzTPYFJqtO5jd0J2Y7PsMFOu3f6PykW0g6sju0b Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="338347383" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="338347383" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10741"; a="886384309" X-IronPort-AV: E=Sophos;i="6.00,243,1681196400"; d="scan'208";a="886384309" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.212.116.198]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jun 2023 12:19:51 -0700 From: ira.weiny@intel.com Date: Wed, 14 Jun 2023 12:16:32 -0700 Subject: [PATCH 5/5] cxl/mem: Trace Dynamic capacity Event Record MIME-Version: 1.0 Message-Id: <20230604-dcd-type2-upstream-v1-5-71b6341bae54@intel.com> References: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> In-Reply-To: <20230604-dcd-type2-upstream-v1-0-71b6341bae54@intel.com> To: Navneet Singh , Fan Ni , Jonathan Cameron , Ira Weiny , Dan Williams , linux-cxl@vger.kernel.org X-Mailer: b4 0.13-dev-9a8cd X-Developer-Signature: v=1; a=ed25519-sha256; t=1686770367; l=3458; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=o0Z/5qX3MI+qx48VNONUHgNMI4ceicmOt0MHT01Dktc=; b=KYQ5/LY/WR+yPSOWA4ld4H+appCY+HtJAG0jy3X+Ef/PWNXcHHsKEvhX7GEGFiGqCNNiI8Zjj Tu0TkfdSbJTB/jW6X7SwAX7SsZ48WL/6d9GKmgJ71mg60rwyI4Gn3iB X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Precedence: bulk List-ID: X-Mailing-List: 
linux-cxl@vger.kernel.org From: Navneet Singh CXL rev 3.0 section 8.2.9.2.1.5 defines the Dynamic Capacity Event Record Determine if the event read is a Dynamic capacity event record and if so trace the record for the debug purpose. Add DC trace points to the trace log. Signed-off-by: Navneet Singh --- [iweiny: fixups] [djbw: no sign-off: preview only] --- drivers/cxl/core/mbox.c | 5 ++++ drivers/cxl/core/trace.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index db9295216de5..802dacd09772 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -888,6 +888,11 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd, (struct cxl_event_mem_module *)record; trace_cxl_memory_module(cxlmd, type, rec); + } else if (uuid_equal(id, &dc_event_uuid)) { + struct dcd_event_dyn_cap *rec = + (struct dcd_event_dyn_cap *)record; + + trace_cxl_dynamic_capacity(cxlmd, type, rec); } else { /* For unknown record types print just the header */ trace_cxl_generic_event(cxlmd, type, record); diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index e11651255780..468c2c8b4347 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -704,6 +704,71 @@ TRACE_EVENT(cxl_poison, ) ); +/* + * DYNAMIC CAPACITY Event Record - DER + * + * CXL rev 3.0 section 8.2.9.2.1.5 Table 8-47 + */ + +#define CXL_DC_ADD_CAPACITY 0x00 +#define CXL_DC_REL_CAPACITY 0x01 +#define CXL_DC_FORCED_REL_CAPACITY 0x02 +#define CXL_DC_REG_CONF_UPDATED 0x03 +#define show_dc_evt_type(type) __print_symbolic(type, \ + { CXL_DC_ADD_CAPACITY, "Add capacity"}, \ + { CXL_DC_REL_CAPACITY, "Release capacity"}, \ + { CXL_DC_FORCED_REL_CAPACITY, "Forced capacity release"}, \ + { CXL_DC_REG_CONF_UPDATED, "Region Configuration Updated" } \ +) + +TRACE_EVENT(cxl_dynamic_capacity, + + TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log, + struct dcd_event_dyn_cap *rec), + + TP_ARGS(cxlmd, log, rec), + + TP_STRUCT__entry( + CXL_EVT_TP_entry + + /* Dynamic capacity Event */ + __field(u8, event_type) + __field(u16, hostid) + __field(u8, region_id) + __field(u64, dpa_start) + __field(u64, length) + __array(u8, tag, CXL_EVENT_DC_TAG_SIZE) + __field(u16, sh_extent_seq) + ), + + TP_fast_assign( + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr); + + /* Dynamic_capacity Event */ + __entry->event_type = rec->data.event_type; + + /* DCD event record data */ + __entry->hostid = le16_to_cpu(rec->data.host_id); + __entry->region_id = rec->data.region_index; + __entry->dpa_start = le64_to_cpu(rec->data.extent.start_dpa); + __entry->length = le64_to_cpu(rec->data.extent.length); + memcpy(__entry->tag, &rec->data.extent.tag, CXL_EVENT_DC_TAG_SIZE); + __entry->sh_extent_seq = le16_to_cpu(rec->data.extent.shared_extn_seq); + ), + + CXL_EVT_TP_printk("event_type='%s' host_id='%d' region_id='%d' " \ + "starting_dpa=%llx length=%llx tag=%s " \ + "shared_extent_sequence=%d", + show_dc_evt_type(__entry->event_type), + __entry->hostid, + __entry->region_id, + __entry->dpa_start, + __entry->length, + __print_hex(__entry->tag, CXL_EVENT_DC_TAG_SIZE), + __entry->sh_extent_seq + ) +); + #endif /* _CXL_EVENTS_H */ #define TRACE_INCLUDE_FILE trace
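For illustration, here is a minimal sketch of how a unit test or emulated device might populate an "Add capacity" Dynamic Capacity event record that the handler and trace point above would decode. The helper name and the extent values are hypothetical and not part of this series; the sketch assumes the record structures, ADD_CAPACITY, and dc_event_uuid from this series are in scope (i.e. it would live alongside mbox.c).

/*
 * Hypothetical example, not part of the patch: fill a DCD event record
 * (CXL 3.0 Table 8-47) describing one 2 MB extent added at DPA 0.
 */
static void fill_sample_dcd_add_record(struct dcd_event_dyn_cap *rec)
{
	memset(rec, 0, sizeof(*rec));

	/* Header UUID must match dc_event_uuid for the DCD handler to run */
	uuid_copy(&rec->hdr.id, &dc_event_uuid);

	rec->data.event_type = ADD_CAPACITY;	/* 0x00 per Table 8-47 */
	rec->data.host_id = cpu_to_le16(0);
	rec->data.region_index = 0;

	/* Arbitrary extent: 2 MB at DPA 0x0, untagged, not shared */
	rec->data.extent.start_dpa = cpu_to_le64(0x0);
	rec->data.extent.length = cpu_to_le64(SZ_2M);
	memset(rec->data.extent.tag, 0, CXL_EVENT_DC_TAG_SIZE);
	rec->data.extent.shared_extn_seq = cpu_to_le16(0);
}

With the cxl_dynamic_capacity trace event enabled via tracefs, such a record would surface as an event_type='Add capacity' entry with starting_dpa=0 and length=200000.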