From patchwork Fri Aug 16 14:08:06 2024
X-Patchwork-Id: 13766377
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:06 -0500
Subject: [PATCH RESEND v2 01/18] cxl/hdm: Debug, use decoder name function
Message-Id: <20240816-dcd-type2-upstream-v2-1-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org

The decoder enum now has a name conversion function defined. Use that
instead of open coding the conversion.

Suggested-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Changes for v2:
[iweiny: new patch, split out]
---
 drivers/cxl/core/hdm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index b01a77b67511..a254f79dd4e8 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -550,8 +550,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
     if (size > avail) {
         dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size,
-            cxled->mode == CXL_DECODER_RAM ? "ram" : "pmem",
-            &avail);
+            cxl_decoder_mode_name(cxled->mode), &avail);
         rc = -ENOSPC;
         goto out;
     }
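
The name helper this patch switches to follows the kernel's designated-initializer
lookup-table idiom (the real cxl_decoder_mode_name() lives in drivers/cxl/cxl.h and
appears in full in patch 04 of this series). Below is a minimal standalone user-space
sketch of that pattern; main(), printf(), and the truncated enum are illustrative
scaffolding, not driver code:

  #include <stdio.h>

  enum cxl_decoder_mode {
          CXL_DECODER_NONE,
          CXL_DECODER_RAM,
          CXL_DECODER_PMEM,
          CXL_DECODER_MIXED,
  };

  /* Designated initializers keep the table in sync with the enum values */
  static const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
  {
          static const char * const names[] = {
                  [CXL_DECODER_NONE] = "none",
                  [CXL_DECODER_RAM]  = "ram",
                  [CXL_DECODER_PMEM] = "pmem",
          };

          if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_PMEM)
                  return names[mode];
          return "mixed"; /* fallback for mixed/unknown values */
  }

  int main(void)
  {
          printf("%s\n", cxl_decoder_mode_name(CXL_DECODER_RAM)); /* prints "ram" */
          return 0;
  }

One table plus a range check replaces the open-coded ternary in the debug message and
automatically stays correct as modes are added, which is exactly what later patches in
this series rely on.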
From patchwork Fri Aug 16 14:08:07 2024
X-Patchwork-Id: 13766378
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:07 -0500
Subject: [PATCH RESEND v2 02/18] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
Message-Id: <20240816-dcd-type2-upstream-v2-2-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org

Per the CXL 3.0 specification, software must check the Command Effects
Log (CEL) to know if a device supports Dynamic Capacity (DC). If the
device does support DC, the specifics of the DC Regions (0-7) are read
through the mailbox.

Flag DC Device (DCD) commands in a device if they are supported.
Subsequent patches will key off these bits to configure a DCD.

Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Changes for v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c | 38 +++++++++++++++++++++++++++++++++++---
 drivers/cxl/cxlmem.h    | 15 +++++++++++++++
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f052d5f174ee..554ec97a7c39 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -111,6 +111,34 @@ static u8 security_command_sets[] = {
     0x46, /* Security Passthrough */
 };
 
+static bool cxl_is_dcd_command(u16 opcode)
+{
+#define CXL_MBOX_OP_DCD_CMDS 0x48
+
+    return (opcode >> 8) == CXL_MBOX_OP_DCD_CMDS;
+}
+
+static void cxl_set_dcd_cmd_enabled(struct cxl_memdev_state *mds,
+                    u16 opcode)
+{
+    switch (opcode) {
+    case CXL_MBOX_OP_GET_DC_CONFIG:
+        set_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
+        break;
+    case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+        set_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds);
+        break;
+    case CXL_MBOX_OP_ADD_DC_RESPONSE:
+        set_bit(CXL_DCD_ENABLED_ADD_RESPONSE, mds->dcd_cmds);
+        break;
+    case CXL_MBOX_OP_RELEASE_DC:
+        set_bit(CXL_DCD_ENABLED_RELEASE, mds->dcd_cmds);
+        break;
+    default:
+        break;
+    }
+}
+
 static bool cxl_is_security_command(u16 opcode)
 {
     int i;
@@ -677,9 +705,10 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
         u16 opcode = le16_to_cpu(cel_entry[i].opcode);
         struct cxl_mem_command *cmd = cxl_mem_find_command(opcode);
 
-        if (!cmd && !cxl_is_poison_command(opcode)) {
-            dev_dbg(dev,
-                "Opcode 0x%04x unsupported by driver\n", opcode);
+        if (!cmd && !cxl_is_poison_command(opcode) &&
+            !cxl_is_dcd_command(opcode)) {
+            dev_dbg(dev, "Opcode 0x%04x unsupported by driver\n",
+                opcode);
             continue;
         }
 
@@ -689,6 +718,9 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
         if (cxl_is_poison_command(opcode))
             cxl_set_poison_cmd_enabled(&mds->poison, opcode);
 
+        if (cxl_is_dcd_command(opcode))
+            cxl_set_dcd_cmd_enabled(mds, opcode);
+
         dev_dbg(dev, "Opcode 0x%04x enabled\n", opcode);
     }
 }
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index adfba72445fc..5f2e65204bf9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -247,6 +247,15 @@ struct cxl_event_state {
     struct mutex log_lock;
 };
 
+/* Device enabled DCD commands */
+enum dcd_cmd_enabled_bits {
+    CXL_DCD_ENABLED_GET_CONFIG,
+    CXL_DCD_ENABLED_GET_EXTENT_LIST,
+    CXL_DCD_ENABLED_ADD_RESPONSE,
+    CXL_DCD_ENABLED_RELEASE,
+    CXL_DCD_ENABLED_MAX
+};
+
 /* Device enabled poison commands */
 enum poison_cmd_enabled_bits {
     CXL_POISON_ENABLED_LIST,
@@ -436,6 +445,7 @@ struct cxl_dev_state {
  *    (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
  * @mbox_mutex: Mutex to synchronize mailbox access.
  * @firmware_version: Firmware version for the memory device.
+ * @dcd_cmds: List of DCD commands implemented by memory device
  * @enabled_cmds: Hardware commands found enabled in CEL.
 * @exclusive_cmds: Commands that are kernel-internal only
 * @total_bytes: sum of all possible capacities
@@ -460,6 +470,7 @@ struct cxl_memdev_state {
     size_t lsa_size;
     struct mutex mbox_mutex; /* Protects device mailbox and firmware */
     char firmware_version[0x10];
+    DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
     DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
     DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
     u64 total_bytes;
@@ -525,6 +536,10 @@ enum cxl_opcode {
     CXL_MBOX_OP_UNLOCK        = 0x4503,
     CXL_MBOX_OP_FREEZE_SECURITY    = 0x4504,
     CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE    = 0x4505,
+    CXL_MBOX_OP_GET_DC_CONFIG    = 0x4800,
+    CXL_MBOX_OP_GET_DC_EXTENT_LIST    = 0x4801,
+    CXL_MBOX_OP_ADD_DC_RESPONSE    = 0x4802,
+    CXL_MBOX_OP_RELEASE_DC        = 0x4803,
     CXL_MBOX_OP_MAX            = 0x10000
 };
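
The opcode test above works because the specification groups all DCD mailbox
commands under opcode set 48XXh, so the high byte alone identifies the family.
A standalone sketch of that check and of how later code can gate on the
resulting bitmap follows; it substitutes plain C bit operations for the
kernel's set_bit()/test_bit() (which operate on unsigned long bitmaps), and
main()/printf() are illustrative only:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define CXL_MBOX_OP_DCD_CMDS      0x48   /* high byte of every DCD opcode */
  #define CXL_MBOX_OP_GET_DC_CONFIG 0x4800

  enum dcd_cmd_enabled_bits {
          CXL_DCD_ENABLED_GET_CONFIG,
          CXL_DCD_ENABLED_GET_EXTENT_LIST,
          CXL_DCD_ENABLED_ADD_RESPONSE,
          CXL_DCD_ENABLED_RELEASE,
          CXL_DCD_ENABLED_MAX
  };

  static bool cxl_is_dcd_command(uint16_t opcode)
  {
          return (opcode >> 8) == CXL_MBOX_OP_DCD_CMDS;
  }

  int main(void)
  {
          unsigned long dcd_cmds = 0;

          /* CEL walk: a Get DC Config entry marks the device DCD-capable */
          if (cxl_is_dcd_command(CXL_MBOX_OP_GET_DC_CONFIG))
                  dcd_cmds |= 1UL << CXL_DCD_ENABLED_GET_CONFIG;

          /* Subsequent code keys off the bit before issuing the command */
          if (dcd_cmds & (1UL << CXL_DCD_ENABLED_GET_CONFIG))
                  printf("device supports Get DC Config\n");
          return 0;
  }

Keeping one bit per command rather than one "supports DCD" flag lets the
driver degrade gracefully when a device implements only a subset of the set.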
From patchwork Fri Aug 16 14:08:08 2024
X-Patchwork-Id: 13766379
From: ira.weiny@intel.com
Date: Fri, 16 Aug 2024 09:08:08 -0500
Subject: [PATCH RESEND v2 03/18] cxl/mem: Read Dynamic capacity configuration from the device
Message-Id: <20240816-dcd-type2-upstream-v2-3-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org

From: Navneet Singh

Devices can optionally support Dynamic Capacity (DC). These devices are
known as Dynamic Capacity Devices (DCD).

Implement the DC (opcode 48XXh) mailbox commands as specified in CXL
3.0 section 8.2.9.8.9. Read the DC configuration and store the DC
region information in the device state.

Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Changes for v2:
[iweiny: Rebased to latest master type2 work]
[jonathan: s/dc/dc_resp/]
[iweiny: Clean up commit message]
[iweiny: Clean kernel docs]
[djiang: Fix up cxl_is_dcd_command]
[djiang: extra blank line]
[alison: s/total_capacity/cap/ etc...]
[alison: keep partition flag with partition structures]
[alison: reformat untenanted_mem declaration]
[alison: move 'cmd' definition back]
[alison: fix comment line length]
[alison: reverse x-tree]
[jonathan: fix and adjust CXL_DC_REGION_STRLEN]
[Jonathan/iweiny: Factor out storing each DC region read from the device]
[Jonathan: place all dcr initializers together]
[Jonathan/iweiny: flip around the region DPA order check]
[jonathan: Account for short read of mailbox command]
[iweiny: use snprintf for region name]
[iweiny: use '' for missing region names]
[iweiny: factor out struct cxl_dc_region_info]
[iweiny: Split out reading CEL]
---
 drivers/cxl/core/mbox.c   | 179 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/core/region.c |  75 +++++++++++++------
 drivers/cxl/cxl.h         |  27 ++++++-
 drivers/cxl/cxlmem.h      |  55 +++++++++++++-
 drivers/cxl/pci.c         |   4 ++
 5 files changed, 314 insertions(+), 26 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 554ec97a7c39..d769814f80e2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1096,7 +1096,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds)
     if (rc < 0)
         return rc;
 
-    mds->total_bytes =
+    mds->static_cap =
         le64_to_cpu(id.total_capacity) * CXL_CAPACITY_MULTIPLIER;
     mds->volatile_only_bytes =
         le64_to_cpu(id.volatile_capacity) * CXL_CAPACITY_MULTIPLIER;
@@ -1114,6 +1114,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds)
         mds->poison.max_errors = min_t(u32, val, CXL_POISON_LIST_MAX);
     }
 
+    mds->dc_event_log_size = le16_to_cpu(id.dc_event_log_size);
+
     return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
@@ -1178,6 +1180,165 @@ int cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_sanitize, CXL);
 
+static int cxl_dc_save_region_info(struct cxl_memdev_state *mds, int index,
+                   struct cxl_dc_region_config *region_config)
+{
+    struct cxl_dc_region_info *dcr = &mds->dc_region[index];
+    struct device *dev = mds->cxlds.dev;
+
+    dcr->base = le64_to_cpu(region_config->region_base);
+    dcr->decode_len = le64_to_cpu(region_config->region_decode_length);
+    dcr->decode_len *= CXL_CAPACITY_MULTIPLIER;
+    dcr->len = le64_to_cpu(region_config->region_length);
+    dcr->blk_size = le64_to_cpu(region_config->region_block_size);
+    dcr->dsmad_handle = le32_to_cpu(region_config->region_dsmad_handle);
+    dcr->flags = region_config->flags;
+    snprintf(dcr->name, CXL_DC_REGION_STRLEN, "dc%d", index);
+
+    /* Check regions are in increasing DPA order */
+    if (index > 0) {
+        struct cxl_dc_region_info *prev_dcr = &mds->dc_region[index - 1];
+
+        if ((prev_dcr->base + prev_dcr->decode_len) > dcr->base) {
+            dev_err(dev,
+                "DPA ordering violation for DC region %d and %d\n",
+                index - 1, index);
+            return -EINVAL;
+        }
+    }
+
+    /* Check the region is 256 MB aligned */
+    if (!IS_ALIGNED(dcr->base, SZ_256M)) {
+        dev_err(dev, "DC region %d not aligned to 256MB: %#llx\n",
+            index, dcr->base);
+        return -EINVAL;
+    }
+
+    /* Check Region base and length are aligned to block size */
+    if (!IS_ALIGNED(dcr->base, dcr->blk_size) ||
+        !IS_ALIGNED(dcr->len, dcr->blk_size)) {
+        dev_err(dev, "DC region %d not aligned to %#llx\n", index,
+            dcr->blk_size);
+        return -EINVAL;
+    }
+
+    dev_dbg(dev,
+        "DC region %s DPA: %#llx LEN: %#llx BLKSZ: %#llx\n",
+        dcr->name, dcr->base, dcr->decode_len, dcr->blk_size);
+
+    return 0;
+}
+
+/* Returns the number of regions in dc_resp or -ERRNO */
+static int cxl_get_dc_id(struct cxl_memdev_state *mds, u8 start_region,
+             struct cxl_mbox_dynamic_capacity *dc_resp,
+             size_t dc_resp_size)
+{
+    struct cxl_mbox_get_dc_config get_dc = (struct cxl_mbox_get_dc_config) {
+        .region_count = CXL_MAX_DC_REGION,
+        .start_region_index = start_region,
+    };
+    struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) {
+        .opcode = CXL_MBOX_OP_GET_DC_CONFIG,
+        .payload_in = &get_dc,
+        .size_in = sizeof(get_dc),
+        .size_out = dc_resp_size,
+        .payload_out = dc_resp,
+        .min_out = 1,
+    };
+    struct device *dev = mds->cxlds.dev;
+    int rc;
+
+    rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+    if (rc < 0)
+        return rc;
+
+    rc = dc_resp->avail_region_count - start_region;
+
+    /*
+     * The number of regions in the payload may have been truncated due to
+     * payload_size limits; if so adjust the count in this query.
+     */
+    if (mbox_cmd.size_out < sizeof(*dc_resp))
+        rc = CXL_REGIONS_RETURNED(mbox_cmd.size_out);
+
+    dev_dbg(dev, "Read %d/%d DC regions\n", rc, dc_resp->avail_region_count);
+
+    return rc;
+}
+
+/**
+ * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity
+ *                                       information from the device.
+ * @mds: The memory device state
+ *
+ * This will dispatch the get_dynamic_capacity command to the device
+ * and on success populate structures to be exported to sysfs.
+ *
+ * Return: 0 if identify was executed successfully, -ERRNO on error.
+ */
+int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds)
+{
+    struct cxl_mbox_dynamic_capacity *dc_resp;
+    struct device *dev = mds->cxlds.dev;
+    size_t dc_resp_size = mds->payload_size;
+    u8 start_region;
+    int i, rc = 0;
+
+    for (i = 0; i < CXL_MAX_DC_REGION; i++)
+        snprintf(mds->dc_region[i].name, CXL_DC_REGION_STRLEN, "");
+
+    /* Check GET_DC_CONFIG is supported by device */
+    if (!test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds)) {
+        dev_dbg(dev, "unsupported cmd: get_dynamic_capacity_config\n");
+        return 0;
+    }
+
+    dc_resp = kvmalloc(dc_resp_size, GFP_KERNEL);
+    if (!dc_resp)
+        return -ENOMEM;
+
+    start_region = 0;
+    do {
+        int j;
+
+        rc = cxl_get_dc_id(mds, start_region, dc_resp, dc_resp_size);
+        if (rc < 0)
+            goto free_resp;
+
+        mds->nr_dc_region += rc;
+
+        if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) {
+            dev_err(dev, "Invalid num of dynamic capacity regions %d\n",
+                mds->nr_dc_region);
+            rc = -EINVAL;
+            goto free_resp;
+        }
+
+        for (i = start_region, j = 0; i < mds->nr_dc_region; i++, j++) {
+            rc = cxl_dc_save_region_info(mds, i, &dc_resp->region[j]);
+            if (rc)
+                goto free_resp;
+        }
+
+        start_region = mds->nr_dc_region;
+
+    } while (mds->nr_dc_region < dc_resp->avail_region_count);
+
+    mds->dynamic_cap =
+        mds->dc_region[mds->nr_dc_region - 1].base +
+        mds->dc_region[mds->nr_dc_region - 1].decode_len -
+        mds->dc_region[0].base;
+    dev_dbg(dev, "Total dynamic capacity: %#llx\n", mds->dynamic_cap);
+
+free_resp:
+    kfree(dc_resp);
+    if (rc)
+        dev_err(dev, "Failed to get DC info: %d\n", rc);
+    return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL);
+
 static int add_dpa_res(struct device *dev, struct resource *parent,
                struct resource *res, resource_size_t start,
                resource_size_t size, const char *type)
@@ -1208,8 +1369,12 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
 {
     struct cxl_dev_state *cxlds = &mds->cxlds;
     struct device *dev = cxlds->dev;
+    size_t untenanted_mem;
     int rc;
 
+    untenanted_mem = mds->dc_region[0].base - mds->static_cap;
+    mds->total_bytes = mds->static_cap + untenanted_mem + mds->dynamic_cap;
+
     if (!cxlds->media_ready) {
         cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
         cxlds->ram_res = DEFINE_RES_MEM(0, 0);
@@ -1217,8 +1382,16 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
         return 0;
     }
 
-    cxlds->dpa_res =
-        (struct resource)DEFINE_RES_MEM(0, mds->total_bytes);
+    cxlds->dpa_res = (struct resource)DEFINE_RES_MEM(0, mds->total_bytes);
+
+    for (int i = 0; i < mds->nr_dc_region; i++) {
+        struct cxl_dc_region_info *dcr = &mds->dc_region[i];
+
+        rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i],
+                 dcr->base, dcr->decode_len, dcr->name);
+        if (rc)
+            return rc;
+    }
 
     if (mds->partition_align_bytes == 0) {
         rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 252bc8e1f103..75041903b72c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -46,7 +46,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
     rc = down_read_interruptible(&cxl_region_rwsem);
     if (rc)
         return rc;
-    if (cxlr->mode != CXL_DECODER_PMEM)
+    if (cxlr->mode != CXL_REGION_PMEM)
         rc = sysfs_emit(buf, "\n");
     else
         rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
@@ -359,7 +359,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
      * Support tooling that expects to find a 'uuid' attribute for all
      * regions regardless of mode.
      */
-    if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
+    if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_REGION_PMEM)
         return 0444;
     return a->mode;
 }
@@ -537,7 +537,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
 {
     struct cxl_region *cxlr = to_cxl_region(dev);
 
-    return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+    return sysfs_emit(buf, "%s\n", cxl_region_mode_name(cxlr->mode));
 }
 static DEVICE_ATTR_RO(mode);
 
@@ -563,7 +563,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
 
     /* ways, granularity and uuid (if PMEM) need to be set before HPA */
     if (!p->interleave_ways || !p->interleave_granularity ||
-        (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
+        (cxlr->mode == CXL_REGION_PMEM && uuid_is_null(&p->uuid)))
         return -ENXIO;
 
     div_u64_rem(size, SZ_256M * p->interleave_ways, &remainder);
@@ -1765,6 +1765,17 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
     return rc;
 }
 
+static bool cxl_modes_compatible(enum cxl_region_mode rmode,
+                 enum cxl_decoder_mode dmode)
+{
+    if (rmode == CXL_REGION_RAM && dmode == CXL_DECODER_RAM)
+        return true;
+    if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
+        return true;
+
+    return false;
+}
+
 static int cxl_region_attach(struct cxl_region *cxlr,
                  struct cxl_endpoint_decoder *cxled, int pos)
 {
@@ -1778,9 +1789,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
     lockdep_assert_held_write(&cxl_region_rwsem);
     lockdep_assert_held_read(&cxl_dpa_rwsem);
 
-    if (cxled->mode != cxlr->mode) {
-        dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
-            dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
+    if (!cxl_modes_compatible(cxlr->mode, cxled->mode)) {
+        dev_dbg(&cxlr->dev, "%s region mode: %s mismatch decoder: %s\n",
+            dev_name(&cxled->cxld.dev),
+            cxl_region_mode_name(cxlr->mode),
+            cxl_decoder_mode_name(cxled->mode));
         return -EINVAL;
     }
 
@@ -2234,7 +2247,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  * devm_cxl_add_region - Adds a region to a decoder
  * @cxlrd: root decoder
  * @id: memregion id to create, or memregion_free() on failure
- * @mode: mode for the endpoint decoders of this region
+ * @mode: mode of this region
  * @type: select whether this is an expander or accelerator (type-2 or type-3)
  *
  * This is the second step of region initialization. Regions exist within an
@@ -2245,7 +2258,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i
  */
 static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
                           int id,
-                          enum cxl_decoder_mode mode,
+                          enum cxl_region_mode mode,
                           enum cxl_decoder_type type)
 {
     struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
@@ -2254,11 +2267,12 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
     int rc;
 
     switch (mode) {
-    case CXL_DECODER_RAM:
-    case CXL_DECODER_PMEM:
+    case CXL_REGION_RAM:
+    case CXL_REGION_PMEM:
         break;
     default:
-        dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
+        dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
+            cxl_region_mode_name(mode));
         return ERR_PTR(-EINVAL);
     }
 
@@ -2308,7 +2322,7 @@ static ssize_t create_ram_region_show(struct device *dev,
 }
 
 static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
-                      int id, enum cxl_decoder_mode mode,
+                      int id, enum cxl_region_mode mode,
                       enum cxl_decoder_type type)
 {
     int rc;
@@ -2337,7 +2351,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
     if (rc != 1)
         return -EINVAL;
 
-    cxlr = __create_region(cxlrd, id, CXL_DECODER_PMEM,
+    cxlr = __create_region(cxlrd, id, CXL_REGION_PMEM,
                    CXL_DECODER_HOSTONLYMEM);
     if (IS_ERR(cxlr))
         return PTR_ERR(cxlr);
@@ -2358,7 +2372,7 @@ static ssize_t create_ram_region_store(struct device *dev,
     if (rc != 1)
         return -EINVAL;
 
-    cxlr = __create_region(cxlrd, id, CXL_DECODER_RAM,
+    cxlr = __create_region(cxlrd, id, CXL_REGION_RAM,
                    CXL_DECODER_HOSTONLYMEM);
     if (IS_ERR(cxlr))
         return PTR_ERR(cxlr);
@@ -2886,10 +2900,31 @@ static void construct_region_end(void)
     up_write(&cxl_region_rwsem);
 }
 
+static enum cxl_region_mode
+cxl_decoder_to_region_mode(enum cxl_decoder_mode mode)
+{
+    switch (mode) {
+    case CXL_DECODER_NONE:
+        return CXL_REGION_NONE;
+    case CXL_DECODER_RAM:
+        return CXL_REGION_RAM;
+    case CXL_DECODER_PMEM:
+        return CXL_REGION_PMEM;
+    case CXL_DECODER_DEAD:
+        return CXL_REGION_DEAD;
+    case CXL_DECODER_MIXED:
+    default:
+        return CXL_REGION_MIXED;
+    }
+
+    return CXL_REGION_MIXED;
+}
+
 static struct cxl_region *
 construct_region_begin(struct cxl_root_decoder *cxlrd,
                struct cxl_endpoint_decoder *cxled)
 {
+    enum cxl_region_mode mode = cxl_decoder_to_region_mode(cxled->mode);
     struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
     struct cxl_region_params *p;
     struct cxl_region *cxlr;
@@ -2897,7 +2932,7 @@ construct_region_begin(struct cxl_root_decoder *cxlrd,
 
     do {
         cxlr = __create_region(cxlrd, atomic_read(&cxlrd->region_id),
-                       cxled->mode, cxled->cxld.target_type);
+                       mode, cxled->cxld.target_type);
     } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
 
     if (IS_ERR(cxlr)) {
@@ -3200,9 +3235,9 @@ static int cxl_region_probe(struct device *dev)
         return rc;
 
     switch (cxlr->mode) {
-    case CXL_DECODER_PMEM:
+    case CXL_REGION_PMEM:
         return devm_cxl_add_pmem_region(cxlr);
-    case CXL_DECODER_RAM:
+    case CXL_REGION_RAM:
         /*
          * The region can not be manged by CXL if any portion of
          * it is already online as 'System RAM'
@@ -3223,8 +3258,8 @@ static int cxl_region_probe(struct device *dev)
         /* HDM-H routes to device-dax */
         return devm_cxl_add_dax_region(cxlr);
     default:
-        dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
-            cxlr->mode);
+        dev_dbg(&cxlr->dev, "unsupported region mode: %s\n",
+            cxl_region_mode_name(cxlr->mode));
         return -ENXIO;
     }
 }
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index cd4a9ffdacc7..ed282dcd5cf5 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -374,6 +374,28 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
     return "mixed";
 }
 
+enum cxl_region_mode {
+    CXL_REGION_NONE,
+    CXL_REGION_RAM,
+    CXL_REGION_PMEM,
+    CXL_REGION_MIXED,
+    CXL_REGION_DEAD,
+};
+
+static inline const char *cxl_region_mode_name(enum cxl_region_mode mode)
+{
+    static const char * const names[] = {
+        [CXL_REGION_NONE] = "none",
+        [CXL_REGION_RAM] = "ram",
+        [CXL_REGION_PMEM] = "pmem",
+        [CXL_REGION_MIXED] = "mixed",
+    };
+
+    if (mode >= CXL_REGION_NONE && mode <= CXL_REGION_MIXED)
+        return names[mode];
+    return "mixed";
+}
+
 /*
  * Track whether this decoder is reserved for region autodiscovery, or
  * free for userspace provisioning.
@@ -502,7 +524,8 @@ struct cxl_region_params {
  * struct cxl_region - CXL region
  * @dev: This region's device
  * @id: This region's id. Id is globally unique across all regions
- * @mode: Endpoint decoder allocation / access mode
+ * @mode: Region mode which defines which endpoint decoder mode the region is
+ *        compatible with
  * @type: Endpoint decoder target type
  * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
  * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
@@ -512,7 +535,7 @@ struct cxl_region_params {
 struct cxl_region {
     struct device dev;
     int id;
-    enum cxl_decoder_mode mode;
+    enum cxl_region_mode mode;
     enum cxl_decoder_type type;
     struct cxl_nvdimm_bridge *cxl_nvb;
     struct cxl_pmem_region *cxlr_pmem;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5f2e65204bf9..8c8f47b397ab 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -396,6 +396,7 @@ enum cxl_devtype {
     CXL_DEVTYPE_CLASSMEM,
 };
 
+#define CXL_MAX_DC_REGION 8
 /**
  * struct cxl_dev_state - The driver device state
  *
@@ -412,6 +413,8 @@ enum cxl_devtype {
  * @dpa_res: Overall DPA resource tree for the device
  * @pmem_res: Active Persistent memory capacity configuration
  * @ram_res: Active Volatile memory capacity configuration
+ * @dc_res: Active Dynamic Capacity memory configuration for each possible
+ *          region
  * @component_reg_phys: register base of component registers
  * @serial: PCIe Device Serial Number
  * @type: Generic Memory Class device or Vendor Specific Memory device
@@ -426,11 +429,23 @@ struct cxl_dev_state {
     struct resource dpa_res;
     struct resource pmem_res;
     struct resource ram_res;
+    struct resource dc_res[CXL_MAX_DC_REGION];
     resource_size_t component_reg_phys;
     u64 serial;
     enum cxl_devtype type;
 };
 
+#define CXL_DC_REGION_STRLEN 7
+struct cxl_dc_region_info {
+    u64 base;
+    u64 decode_len;
+    u64 len;
+    u64 blk_size;
+    u32 dsmad_handle;
+    u8 flags;
+    u8 name[CXL_DC_REGION_STRLEN];
+};
+
 /**
  * struct cxl_memdev_state - Generic Type-3 Memory Device Class driver data
  *
@@ -449,6 +464,8 @@ struct cxl_dev_state {
  * @enabled_cmds: Hardware commands found enabled in CEL.
 * @exclusive_cmds: Commands that are kernel-internal only
 * @total_bytes: sum of all possible capacities
+ * @static_cap: Sum of RAM and PMEM capacities
+ * @dynamic_cap: Complete DPA range occupied by DC regions
  * @volatile_only_bytes: hard volatile capacity
  * @persistent_only_bytes: hard persistent capacity
  * @partition_align_bytes: alignment size for partition-able capacity
@@ -456,6 +473,10 @@ struct cxl_dev_state {
  * @active_persistent_bytes: sum of hard + soft persistent
  * @next_volatile_bytes: volatile capacity change pending device reset
  * @next_persistent_bytes: persistent capacity change pending device reset
+ * @nr_dc_region: number of DC regions implemented in the memory device
+ * @dc_region: array containing info about the DC regions
+ * @dc_event_log_size: The number of events the device can store in the
+ *                     Dynamic Capacity Event Log before it overflows
  * @event: event log driver state
  * @poison: poison driver state info
  * @fw: firmware upload / activation state
@@ -473,7 +494,10 @@ struct cxl_memdev_state {
     DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
     DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
     DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
+
     u64 total_bytes;
+    u64 static_cap;
+    u64 dynamic_cap;
     u64 volatile_only_bytes;
     u64 persistent_only_bytes;
     u64 partition_align_bytes;
@@ -481,6 +505,11 @@ struct cxl_memdev_state {
     u64 active_volatile_bytes;
     u64 active_persistent_bytes;
     u64 next_volatile_bytes;
     u64 next_persistent_bytes;
+
+    u8 nr_dc_region;
+    struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION];
+    size_t dc_event_log_size;
+
     struct cxl_event_state event;
     struct cxl_poison_state poison;
     struct cxl_security_state security;
@@ -587,6 +616,7 @@ struct cxl_mbox_identify {
     __le16 inject_poison_limit;
     u8 poison_caps;
     u8 qos_telemetry_caps;
+    __le16 dc_event_log_size;
 } __packed;
 
 /*
@@ -741,9 +771,31 @@ struct cxl_mbox_set_partition_info {
     __le64 volatile_capacity;
     u8 flags;
 } __packed;
-
 #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
 
+struct cxl_mbox_get_dc_config {
+    u8 region_count;
+    u8 start_region_index;
+} __packed;
+
+/* See CXL 3.0 Table 125 get dynamic capacity config Output Payload */
+struct cxl_mbox_dynamic_capacity {
+    u8 avail_region_count;
+    u8 rsvd[7];
+    struct cxl_dc_region_config {
+        __le64 region_base;
+        __le64 region_decode_length;
+        __le64 region_length;
+        __le64 region_block_size;
+        __le32 region_dsmad_handle;
+        u8 flags;
+        u8 rsvd[3];
+    } __packed region[];
+} __packed;
+#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0)
+#define CXL_REGIONS_RETURNED(size_out) \
+    ((size_out - 8) / sizeof(struct cxl_dc_region_config))
+
 /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
 struct cxl_mbox_set_timestamp_in {
     __le64 timestamp;
@@ -867,6 +919,7 @@ enum {
 int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
               struct cxl_mbox_cmd *cmd);
 int cxl_dev_state_identify(struct cxl_memdev_state *mds);
+int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds);
 int cxl_await_media_ready(struct cxl_dev_state *cxlds);
 int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
 int cxl_mem_create_range_info(struct cxl_memdev_state *mds);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 5242dbf0044d..a9b110ff1176 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -879,6 +879,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
     if (rc)
         return rc;
 
+    rc = cxl_dev_dynamic_capacity_identify(mds);
+    if (rc)
+        return rc;
+
     rc = cxl_mem_create_range_info(mds);
     if (rc)
         return rc;
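
The capacity arithmetic in cxl_mem_create_range_info() above is worth seeing in
isolation: total_bytes covers the whole DPA range from 0 through the end of the
last DC region, including any "untenanted" gap between the static (RAM + PMEM)
capacity and the first DC region base. Below is a standalone sketch of that math
together with the 256 MB and block-size alignment rules that
cxl_dc_save_region_info() enforces; the numeric values are made-up examples and
the IS_ALIGNED macro assumes power-of-two alignments, as the kernel's does:

  #include <stdint.h>
  #include <stdio.h>

  #define SZ_256M (256ull << 20)
  #define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0) /* a must be a power of 2 */

  int main(void)
  {
          /* Example device: 20GB static (RAM+PMEM); DC0 at 30GB, 10GB long */
          uint64_t static_cap     = 20ull << 30;
          uint64_t dc0_base       = 30ull << 30;
          uint64_t dc0_decode_len = 10ull << 30;
          uint64_t dc0_blk_size   = 1ull << 16; /* 64KB extent granularity */

          /* Alignment rules from cxl_dc_save_region_info() */
          if (!IS_ALIGNED(dc0_base, SZ_256M) ||
              !IS_ALIGNED(dc0_base, dc0_blk_size))
                  return 1;

          /* With one region, dynamic_cap spans first base to last end */
          uint64_t dynamic_cap    = (dc0_base + dc0_decode_len) - dc0_base;
          uint64_t untenanted_mem = dc0_base - static_cap;   /* 10GB gap */
          uint64_t total_bytes    = static_cap + untenanted_mem + dynamic_cap;

          printf("total DPA range: %llu GB\n",
                 (unsigned long long)(total_bytes >> 30));   /* 40 GB */
          return 0;
  }

Folding the gap into total_bytes is what lets the root dpa_res span the full
device range so the per-region dc_res children can be inserted at their real
DPA offsets.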
From patchwork Fri Aug 16 14:08:09 2024
X-Patchwork-Id: 13766380
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:09 -0500
Subject: [PATCH RESEND v2 04/18] cxl/region: Add Dynamic Capacity decoder and region modes
Message-Id: <20240816-dcd-type2-upstream-v2-4-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org

Both regions and decoders will need a new mode to reflect the new type
of partition they are targeting on a device. Regions reflect a dynamic
capacity type which may point to different Dynamic Capacity (DC)
Regions. Decoder mode reflects a specific DC Region.

Define the new modes to use in subsequent patches and the helper
functions associated with them.

Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Changes for v2:
[iweiny: split out from: Add dynamic capacity cxl region support.]
---
 drivers/cxl/core/region.c |  4 ++++
 drivers/cxl/cxl.h         | 23 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 75041903b72c..69af1354bc5b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1772,6 +1772,8 @@ static bool cxl_modes_compatible(enum cxl_region_mode rmode,
         return true;
     if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
         return true;
+    if (rmode == CXL_REGION_DC && cxl_decoder_mode_is_dc(dmode))
+        return true;
 
     return false;
 }
@@ -2912,6 +2914,8 @@ cxl_decoder_to_region_mode(enum cxl_decoder_mode mode)
         return CXL_REGION_PMEM;
     case CXL_DECODER_DEAD:
         return CXL_REGION_DEAD;
+    case CXL_DECODER_DC0 ... CXL_DECODER_DC7:
+        return CXL_REGION_DC;
     case CXL_DECODER_MIXED:
     default:
         return CXL_REGION_MIXED;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ed282dcd5cf5..d41f3f14fbe3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -356,6 +356,14 @@ enum cxl_decoder_mode {
     CXL_DECODER_NONE,
     CXL_DECODER_RAM,
     CXL_DECODER_PMEM,
+    CXL_DECODER_DC0,
+    CXL_DECODER_DC1,
+    CXL_DECODER_DC2,
+    CXL_DECODER_DC3,
+    CXL_DECODER_DC4,
+    CXL_DECODER_DC5,
+    CXL_DECODER_DC6,
+    CXL_DECODER_DC7,
     CXL_DECODER_MIXED,
     CXL_DECODER_DEAD,
 };
@@ -366,6 +374,14 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
         [CXL_DECODER_NONE] = "none",
         [CXL_DECODER_RAM] = "ram",
         [CXL_DECODER_PMEM] = "pmem",
+        [CXL_DECODER_DC0] = "dc0",
+        [CXL_DECODER_DC1] = "dc1",
+        [CXL_DECODER_DC2] = "dc2",
+        [CXL_DECODER_DC3] = "dc3",
+        [CXL_DECODER_DC4] = "dc4",
+        [CXL_DECODER_DC5] = "dc5",
+        [CXL_DECODER_DC6] = "dc6",
+        [CXL_DECODER_DC7] = "dc7",
         [CXL_DECODER_MIXED] = "mixed",
     };
 
@@ -374,10 +390,16 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
     return "mixed";
 }
 
+static inline bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode)
+{
+    return (mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7);
+}
+
 enum cxl_region_mode {
     CXL_REGION_NONE,
     CXL_REGION_RAM,
     CXL_REGION_PMEM,
+    CXL_REGION_DC,
     CXL_REGION_MIXED,
     CXL_REGION_DEAD,
 };
@@ -388,6 +410,7 @@ static inline const char *cxl_region_mode_name(enum cxl_region_mode mode)
         [CXL_REGION_NONE] = "none",
         [CXL_REGION_RAM] = "ram",
         [CXL_REGION_PMEM] = "pmem",
+        [CXL_REGION_DC] = "dc",
         [CXL_REGION_MIXED] = "mixed",
     };
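
The split introduced above means eight decoder modes (dc0-dc7) collapse into a
single region mode (dc). A small standalone harness illustrating the intended
compatibility relation follows; the enums and helpers mirror the patch in
spirit but this is user-space test scaffolding, not the kernel build:

  #include <assert.h>
  #include <stdbool.h>

  enum cxl_decoder_mode {
          CXL_DECODER_NONE, CXL_DECODER_RAM, CXL_DECODER_PMEM,
          CXL_DECODER_DC0, CXL_DECODER_DC1, CXL_DECODER_DC2, CXL_DECODER_DC3,
          CXL_DECODER_DC4, CXL_DECODER_DC5, CXL_DECODER_DC6, CXL_DECODER_DC7,
          CXL_DECODER_MIXED, CXL_DECODER_DEAD,
  };

  enum cxl_region_mode {
          CXL_REGION_NONE, CXL_REGION_RAM, CXL_REGION_PMEM,
          CXL_REGION_DC, CXL_REGION_MIXED, CXL_REGION_DEAD,
  };

  static bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode)
  {
          return mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7;
  }

  /* Region/decoder modes no longer compare 1:1; DC is a many-to-one match */
  static bool cxl_modes_compatible(enum cxl_region_mode rmode,
                                   enum cxl_decoder_mode dmode)
  {
          if (rmode == CXL_REGION_RAM && dmode == CXL_DECODER_RAM)
                  return true;
          if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
                  return true;
          if (rmode == CXL_REGION_DC && cxl_decoder_mode_is_dc(dmode))
                  return true;
          return false;
  }

  int main(void)
  {
          /* Any dcY decoder may attach to a 'dc' region ... */
          assert(cxl_modes_compatible(CXL_REGION_DC, CXL_DECODER_DC3));
          /* ... but not to a ram or pmem region */
          assert(!cxl_modes_compatible(CXL_REGION_RAM, CXL_DECODER_DC3));
          return 0;
  }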
From patchwork Fri Aug 16 14:08:10 2024
X-Patchwork-Id: 13766381
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:10 -0500
Subject: [PATCH RESEND v2 05/18] cxl/port: Add Dynamic Capacity mode support to endpoint decoders
Message-Id: <20240816-dcd-type2-upstream-v2-5-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org

Endpoint decoders used to map Dynamic Capacity must be configured to
point to the correct Dynamic Capacity (DC) Region. The decoder mode
currently represents the partition the decoder points to, such as ram
or pmem.

Expand the mode to include DC Regions.

Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Changes for v2:
[iweiny: split from region creation patch]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 19 ++++++++++---------
 drivers/cxl/core/hdm.c                  | 24 ++++++++++++++++++++++++
 drivers/cxl/core/port.c                 | 16 ++++++++++++++++
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 6350dd82b9a9..2268ffcdb604 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -257,22 +257,23 @@ Description:
 
 What:		/sys/bus/cxl/devices/decoderX.Y/mode
 Date:		May, 2022
-KernelVersion:	v6.0
+KernelVersion:	v6.0, v6.6 (dcY)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
 		translates from a host physical address range, to a device local
 		address range. Device-local address ranges are further split
-		into a 'ram' (volatile memory) range and 'pmem' (persistent
-		memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
-		'mixed', or 'none'. The 'mixed' indication is for error cases
-		when a decoder straddles the volatile/persistent partition
-		boundary, and 'none' indicates the decoder is not actively
-		decoding, or no DPA allocation policy has been set.
+		into a 'ram' (volatile memory) range, 'pmem' (persistent
+		memory) range, or Dynamic Capacity (DC) range. The 'mode'
+		attribute emits one of 'ram', 'pmem', 'dcY', 'mixed', or
+		'none'. The 'mixed' indication is for error cases when a
+		decoder straddles the volatile/persistent partition boundary,
+		and 'none' indicates the decoder is not actively decoding, or
+		no DPA allocation policy has been set.
 
 		'mode' can be written, when the decoder is in the 'disabled'
-		state, with either 'ram' or 'pmem' to set the boundaries for the
-		next allocation.
+		state, with 'ram', 'pmem', or 'dcY' to set the boundaries for
+		the next allocation.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index a254f79dd4e8..3f4af1f5fac8 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -267,6 +267,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
     __cxl_dpa_release(cxled);
 }
 
+static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
+{
+    int index = 0;
+
+    for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
+        if (mode == i)
+            return index;
+        index++;
+    }
+
+    return -EINVAL;
+}
+
 static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
                  resource_size_t base, resource_size_t len,
                  resource_size_t skipped)
@@ -429,6 +442,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
     switch (mode) {
     case CXL_DECODER_RAM:
     case CXL_DECODER_PMEM:
+    case CXL_DECODER_DC0 ... CXL_DECODER_DC7:
         break;
     default:
         dev_dbg(dev, "unsupported mode: %d\n", mode);
@@ -456,6 +470,16 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
         goto out;
     }
 
+    for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
+        int index = dc_mode_to_region_index(i);
+
+        if (mode == i && !resource_size(&cxlds->dc_res[index])) {
+            dev_dbg(dev, "no available dynamic capacity\n");
+            rc = -ENXIO;
+            goto out;
+        }
+    }
+
     cxled->mode = mode;
     rc = 0;
 out:
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index f58cf01f8d2c..ce4a66865db3 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -197,6 +197,22 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
         mode = CXL_DECODER_PMEM;
     else if (sysfs_streq(buf, "ram"))
         mode = CXL_DECODER_RAM;
+    else if (sysfs_streq(buf, "dc0"))
+        mode = CXL_DECODER_DC0;
+    else if (sysfs_streq(buf, "dc1"))
+        mode = CXL_DECODER_DC1;
+    else if (sysfs_streq(buf, "dc2"))
+        mode = CXL_DECODER_DC2;
+    else if (sysfs_streq(buf, "dc3"))
+        mode = CXL_DECODER_DC3;
+    else if (sysfs_streq(buf, "dc4"))
+        mode = CXL_DECODER_DC4;
+    else if (sysfs_streq(buf, "dc5"))
+        mode = CXL_DECODER_DC5;
+    else if (sysfs_streq(buf, "dc6"))
+        mode = CXL_DECODER_DC6;
+    else if (sysfs_streq(buf, "dc7"))
+        mode = CXL_DECODER_DC7;
     else
         return -EINVAL;
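
Because CXL_DECODER_DC0 through CXL_DECODER_DC7 are declared contiguously, the
dc_mode_to_region_index() loop above is equivalent to a single subtraction. A
standalone sketch of that possible simplification (an alternative for
illustration, not what the patch does; the -22 stands in for the kernel's
-EINVAL):

  #include <stdio.h>

  enum cxl_decoder_mode {
          CXL_DECODER_NONE, CXL_DECODER_RAM, CXL_DECODER_PMEM,
          CXL_DECODER_DC0, CXL_DECODER_DC1, CXL_DECODER_DC2, CXL_DECODER_DC3,
          CXL_DECODER_DC4, CXL_DECODER_DC5, CXL_DECODER_DC6, CXL_DECODER_DC7,
          CXL_DECODER_MIXED, CXL_DECODER_DEAD,
  };

  /* Contiguous enum values let the region index fall out of arithmetic */
  static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
  {
          if (mode < CXL_DECODER_DC0 || mode > CXL_DECODER_DC7)
                  return -22; /* -EINVAL outside the kernel */

          return mode - CXL_DECODER_DC0;
  }

  int main(void)
  {
          printf("%d\n", dc_mode_to_region_index(CXL_DECODER_DC5)); /* 5 */
          return 0;
  }

Either form maps a dcY decoder mode onto the matching cxlds->dc_res[] slot so
that cxl_dpa_set_mode() can refuse modes whose DC region the device never
reported.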
From patchwork Fri Aug 16 14:08:11 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:11 -0500
Subject: [PATCH RESEND v2 06/18] cxl/port: Add Dynamic Capacity size support to endpoint decoders
Message-Id: <20240816-dcd-type2-upstream-v2-6-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

To support Dynamic Capacity Devices (DCD), endpoint decoders will need to map DC Regions (partitions). Part of this is assigning the size of the DC Region DPA to the decoder, in addition to any skip value from the previous decoder. This must be done within a contiguous DPA space.

Two complications arise with Dynamic Capacity regions which did not exist with RAM and PMEM partitions. First, gaps in the DPA space can exist between and around the DC Regions. Second, the Linux resource tree does not allow a resource to be marked across existing nodes within a tree.

For clarity, below is an example of a 60GB device with 10GB of RAM, 10GB of PMEM, and 10GB for each of 2 DC Regions. The desired CXL mapping is 5GB of RAM, 5GB of PMEM, and all 10GB of DC1.

                               DPA RANGE
                               (dpa_res)
0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|

   RAM        PMEM                  DC0                    DC1
 (ram_res)  (pmem_res)           (dc_res[0])            (dc_res[1])
|----------|----------|          |----------|           |----------|

 RAM         PMEM                                          DC1
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB      60GB

The previous skip resource between RAM and PMEM was always a child of the RAM resource and fit nicely (see X below). Because of this simplicity, the skip resource reference was not stored in any CXL state. On release, the skip range could be calculated from the endpoint decoder's stored values.
Now when DC1 is being mapped, 4 skip resources must be created as children: one of the PMEM resource (A), two of the parent DPA resource (B, D), and one more as a child of the DC0 resource (C).

0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|
      (X)        (A)        (B)         (C)        (D)
       v          v          v           v          v
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXXXXXXX|
      skip       skip       skip        skip       skip

Expand the calculation of DPA freespace and enhance the logic to support mapping/unmapping DC DPA space. To track the potentially multiple skip resources, an xarray is attached to the endpoint decoder. The existing algorithm is consolidated with the new one so that a single skip resource is stored in the same way as multiple skip resources.

Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
An alternative of using reserve_region_with_split() was considered. Its advantage would be keeping all the resource information solely in the resource tree rather than holding separate references to it. However, it would best be implemented with a call such as release_split_region() [name TBD?] which could find all the leaf resources in the range and release them. Furthermore, it is not clear if reserve_region_with_split() is really intended for anything outside of init code. In the end this algorithm seems straightforward enough.

Changes for v2:
[iweiny: write commit message]
[iweiny: remove unneeded changes]
[iweiny: split from region creation patch]
[iweiny: Alter skip algorithm to use 'anonymous regions']
[iweiny: enhance debug messages]
[iweiny: consolidate skip resource creation]
[iweiny: ensure xa_destroy() is called]
[iweiny: consolidate region requests further]
[iweiny: ensure resource is released on xa_insert]
---
 drivers/cxl/core/hdm.c  | 188 +++++++++++++++++++++++++++++++++++++++++++-----
 drivers/cxl/core/port.c |   2 +
 drivers/cxl/cxl.h       |   2 +
 3 files changed, 176 insertions(+), 16 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3f4af1f5fac8..3cd048677816 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -222,6 +222,25 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) } EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL); +static void cxl_skip_release(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds; + struct cxl_port *port = cxled_to_port(cxled); + struct device *dev = &port->dev; + unsigned long index; + void *entry; + + xa_for_each(&cxled->skip_res, index, entry) { + struct resource *res = entry; + + dev_dbg(dev, "decoder%d.%d: releasing skipped space; %pr\n", + port->id, cxled->cxld.id, res); + __release_region(&cxlds->dpa_res, res->start, + resource_size(res)); + xa_erase(&cxled->skip_res, index); + } +} + /* * Must be called in a context that synchronizes against this decoder's * port ->remove() callback (like an endpoint decoder sysfs attribute) @@ -232,15 +251,11 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct resource *res = cxled->dpa_res; - resource_size_t skip_start; lockdep_assert_held_write(&cxl_dpa_rwsem); - /* save @skip_start, before @res is released */ - skip_start = res->start - cxled->skip; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); - if (cxled->skip) - __release_region(&cxlds->dpa_res, skip_start,
cxled->skip); + cxl_skip_release(cxled); cxled->skip = 0; cxled->dpa_res = NULL; put_device(&cxled->cxld.dev); @@ -280,6 +295,98 @@ static int dc_mode_to_region_index(enum cxl_decoder_mode mode) return -EINVAL; } +static int cxl_request_skip(struct cxl_endpoint_decoder *cxled, + resource_size_t skip_base, resource_size_t skip_len) +{ + struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds; + const char *name = dev_name(&cxled->cxld.dev); + struct cxl_port *port = cxled_to_port(cxled); + struct resource *dpa_res = &cxlds->dpa_res; + struct device *dev = &port->dev; + struct resource *res; + int rc; + + res = __request_region(dpa_res, skip_base, skip_len, name, 0); + if (!res) + return -EBUSY; + + rc = xa_insert(&cxled->skip_res, skip_base, res, GFP_KERNEL); + if (rc) { + __release_region(dpa_res, skip_base, skip_len); + return rc; + } + + dev_dbg(dev, "decoder%d.%d: skipped space; %pr\n", + port->id, cxled->cxld.id, res); + return 0; +} + +static int cxl_reserve_dpa_skip(struct cxl_endpoint_decoder *cxled, + resource_size_t base, resource_size_t skipped) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *port = cxled_to_port(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + resource_size_t skip_base = base - skipped; + resource_size_t size, skip_len = 0; + struct device *dev = &port->dev; + int rc, index; + + size = resource_size(&cxlds->ram_res); + if (size && skip_base <= cxlds->ram_res.end) { + skip_len = cxlds->ram_res.end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + + if (skip_base == base) { + dev_dbg(dev, "skip done!\n"); + return 0; + } + + size = resource_size(&cxlds->pmem_res); + if (size && skip_base <= cxlds->pmem_res.end) { + skip_len = cxlds->pmem_res.end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + + index = dc_mode_to_region_index(cxled->mode); + for (int i = 0; i <= index; i++) { + struct resource *dcr = &cxlds->dc_res[i]; + + if (skip_base < dcr->start) { + skip_len = dcr->start - skip_base; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + + if (skip_base == base) { + dev_dbg(dev, "skip done!\n"); + break; + } + + if (resource_size(dcr) && skip_base <= dcr->end) { + if (skip_base > base) + dev_err(dev, "Skip error\n"); + + skip_len = dcr->end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + } + + return 0; +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) @@ -317,13 +424,12 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, } if (skipped) { - res = __request_region(&cxlds->dpa_res, base - skipped, skipped, - dev_name(&cxled->cxld.dev), 0); - if (!res) { - dev_dbg(dev, - "decoder%d.%d: failed to reserve skipped space\n", - port->id, cxled->cxld.id); - return -EBUSY; + int rc = cxl_reserve_dpa_skip(cxled, base, skipped); + + if (rc) { + dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space; %#llx - %#llx\n", + port->id, cxled->cxld.id, base, skipped); + return rc; } } res = __request_region(&cxlds->dpa_res, base, len, @@ -331,14 +437,20 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, if (!res) { dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", port->id, cxled->cxld.id); - if (skipped) - 
__release_region(&cxlds->dpa_res, base - skipped, - skipped); + cxl_skip_release(cxled); return -EBUSY; } cxled->dpa_res = res; cxled->skip = skipped; + for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) { + int index = dc_mode_to_region_index(mode); + + if (resource_contains(&cxlds->dc_res[index], res)) { + cxled->mode = mode; + goto success; + } + } if (resource_contains(&cxlds->pmem_res, res)) cxled->mode = CXL_DECODER_PMEM; else if (resource_contains(&cxlds->ram_res, res)) @@ -349,6 +461,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->mode = CXL_DECODER_MIXED; } +success: + dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id, cxled->cxld.id, + cxled->dpa_res, cxled->mode); port->hdm_end++; get_device(&cxled->cxld.dev); return 0; @@ -492,11 +607,13 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, resource_size_t *start_out, resource_size_t *skip_out) { + resource_size_t free_ram_start, free_pmem_start, free_dc_start; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - resource_size_t free_ram_start, free_pmem_start; struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; resource_size_t start, avail, skip; struct resource *p, *last; + int index; lockdep_assert_held(&cxl_dpa_rwsem); @@ -514,6 +631,20 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, else free_pmem_start = cxlds->pmem_res.start; + /* + * Limit each decoder to a single DC region to map memory with + * different DSMAS entry. + */ + index = dc_mode_to_region_index(cxled->mode); + if (index >= 0) { + if (cxlds->dc_res[index].child) { + dev_err(dev, "Cannot allocate DPA from DC Region: %d\n", + index); + return -EINVAL; + } + free_dc_start = cxlds->dc_res[index].start; + } + if (cxled->mode == CXL_DECODER_RAM) { start = free_ram_start; avail = cxlds->ram_res.end - start + 1; @@ -535,6 +666,29 @@ static resource_size_t cxl_dpa_freespace(struct cxl_endpoint_decoder *cxled, else skip_end = start - 1; skip = skip_end - skip_start + 1; + } else if (cxl_decoder_mode_is_dc(cxled->mode)) { + resource_size_t skip_start, skip_end; + + start = free_dc_start; + avail = cxlds->dc_res[index].end - start + 1; + if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child) + skip_start = free_ram_start; + else + skip_start = free_pmem_start; + /* + * If any dc region is already mapped, then that allocation + * already handled the RAM and PMEM skip. Check for DC region + * skip. 
+ */ + for (int i = index - 1; i >= 0 ; i--) { + if (cxlds->dc_res[i].child) { + skip_start = cxlds->dc_res[i].child->end + 1; + break; + } + } + + skip_end = start - 1; + skip = skip_end - skip_start + 1; } else { dev_dbg(cxled_dev(cxled), "mode not set\n"); avail = 0; } @@ -572,6 +726,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) avail = cxl_dpa_freespace(cxled, &start, &skip); + dev_dbg(dev, "DPA Allocation start: %llx len: %llx Skip: %llx\n", + start, size, skip); if (size > avail) { dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, cxl_decoder_mode_name(cxled->mode), &avail); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index ce4a66865db3..a5db710a63bc 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -413,6 +413,7 @@ static void cxl_endpoint_decoder_release(struct device *dev) struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); __cxl_decoder_release(&cxled->cxld); + xa_destroy(&cxled->skip_res); kfree(cxled); } @@ -1769,6 +1770,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) return ERR_PTR(-ENOMEM); cxled->pos = -1; + xa_init(&cxled->skip_res); cxld = &cxled->cxld; rc = cxl_decoder_init(port, cxld); if (rc) { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index d41f3f14fbe3..0a225b0c20bf 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -433,6 +433,7 @@ enum cxl_decoder_state { * @cxld: base cxl_decoder_object * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps + * @skip_res: array of skipped resources from the previous decoder end * @mode: which memory type / access-mode-partition this decoder targets * @state: autodiscovery state * @pos: interleave position in @cxld.region @@ -441,6 +442,7 @@ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; + struct xarray skip_res; enum cxl_decoder_mode mode; enum cxl_decoder_state state; int pos;
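To make the skip accounting concrete, here is a small userspace model of the splitting that cxl_reserve_dpa_skip() performs for the 60GB example in the commit message: one logical skip of 35GB (from the end of the 5GB PMEM allocation at 15GB up to DC1 at 50GB) becomes the four pieces A, B, C, and D, each clamped to the partition, or gap, that contains it. The window list and GB arithmetic are illustrative only:

	#include <stdio.h>
	#include <stdint.h>

	#define GB (1024ULL * 1024 * 1024)

	struct window { const char *name; uint64_t start, end; };	/* end inclusive */

	/* The 60GB example layout: RAM, PMEM, then DC0 and DC1 with gaps */
	static const struct window windows[] = {
		{ "ram_res",   0 * GB, 10 * GB - 1 },
		{ "pmem_res", 10 * GB, 20 * GB - 1 },
		{ "dc_res[0]", 30 * GB, 40 * GB - 1 },
		{ "dc_res[1]", 50 * GB, 60 * GB - 1 },
	};

	int main(void)
	{
		uint64_t base = 50 * GB;	/* DC1 allocation starts here */
		uint64_t skip_base = 15 * GB;	/* end of the 5GB PMEM allocation */

		while (skip_base < base) {
			uint64_t piece_end = base - 1;
			const char *owner = "dpa_res (gap)";

			/* Clamp this piece to the window, or gap, holding skip_base */
			for (size_t i = 0; i < sizeof(windows) / sizeof(windows[0]); i++) {
				if (skip_base >= windows[i].start &&
				    skip_base <= windows[i].end) {
					owner = windows[i].name;
					if (windows[i].end < piece_end)
						piece_end = windows[i].end;
					break;
				}
				if (windows[i].start > skip_base) {
					if (windows[i].start - 1 < piece_end)
						piece_end = windows[i].start - 1;
					break;
				}
			}
			printf("skip [%llu GB, %llu GB) child of %s\n",
			       (unsigned long long)(skip_base / GB),
			       (unsigned long long)((piece_end + 1) / GB), owner);
			skip_base = piece_end + 1;
		}
		return 0;
	}

This prints the A (pmem_res), B (gap), C (dc_res[0]), and D (gap) pieces; the X piece does not appear because it was already created when the PMEM decoder was programmed. Keying each piece by its start address is what makes the xarray in the patch a natural container.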
From patchwork Fri Aug 16 14:08:12 2024
From: ira.weiny@intel.com
Date: Fri, 16 Aug 2024 09:08:12 -0500
Subject: [PATCH RESEND v2 07/18] cxl/mem: Expose device dynamic capacity configuration
Message-Id: <20240816-dcd-type2-upstream-v2-7-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

To properly configure CXL regions on Dynamic Capacity Devices (DCD), user space will need to know the details of the DC Regions available on a device.

Expose driver dynamic capacity configuration through sysfs attributes.
Co-developed-by: Navneet Singh Signed-off-by: Navneet Singh Signed-off-by: Ira Weiny --- Changes for v2: [iweiny: Rebased on latest master/type2 work] [iweiny: add documentation for sysfs entries] [iweiny: s/dc_regions_count/region_count/] [iweiny: s/dcY_size/regionY_size/] [alison: change size format to %#llx] [iweiny: change count format to %d] [iweiny: Formatting updates] [iweiny: Fix crash when device is not a mem device: found with cxl-test] --- Documentation/ABI/testing/sysfs-bus-cxl | 17 ++++++++ drivers/cxl/core/memdev.c | 77 +++++++++++++++++++++++++++++++++ 2 files changed, 94 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 2268ffcdb604..aa65dc5b4e13 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -37,6 +37,23 @@ Description: identically named field in the Identify Memory Device Output Payload in the CXL-2.0 specification. +What: /sys/bus/cxl/devices/memX/dc/region_count +Date: July, 2023 +KernelVersion: v6.6 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) Number of Dynamic Capacity (DC) regions supported on the + device. May be 0 if the device does not support Dynamic + Capacity. + +What: /sys/bus/cxl/devices/memX/dc/regionY_size +Date: July, 2023 +KernelVersion: v6.6 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) Size of the Dynamic Capacity (DC) region Y. Only + available on devices which support DC and only for those + region indexes supported by the device. What: /sys/bus/cxl/devices/memX/serial Date: January, 2022 diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 492486707fd0..397262e0ebd2 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -101,6 +101,20 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr, static struct device_attribute dev_attr_pmem_size = __ATTR(size, 0444, pmem_size_show, NULL); +static ssize_t region_count_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + int len = 0; + + len = sysfs_emit(buf, "%d\n", mds->nr_dc_region); + return len; +} + +struct device_attribute dev_attr_region_count = + __ATTR(region_count, 0444, region_count_show, NULL); + static ssize_t serial_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -454,6 +468,62 @@ static struct attribute *cxl_memdev_security_attributes[] = { NULL, }; +static ssize_t show_size_regionN(struct cxl_memdev *cxlmd, char *buf, int pos) +{ + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + return sysfs_emit(buf, "%#llx\n", mds->dc_region[pos].decode_len); +} + +#define REGION_SIZE_ATTR_RO(n) \ +static ssize_t region##n##_size_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return show_size_regionN(to_cxl_memdev(dev), buf, (n)); \ +} \ +static DEVICE_ATTR_RO(region##n##_size) +REGION_SIZE_ATTR_RO(0); +REGION_SIZE_ATTR_RO(1); +REGION_SIZE_ATTR_RO(2); +REGION_SIZE_ATTR_RO(3); +REGION_SIZE_ATTR_RO(4); +REGION_SIZE_ATTR_RO(5); +REGION_SIZE_ATTR_RO(6); +REGION_SIZE_ATTR_RO(7); + +static struct attribute *cxl_memdev_dc_attributes[] = { + &dev_attr_region0_size.attr, + &dev_attr_region1_size.attr, + &dev_attr_region2_size.attr, + &dev_attr_region3_size.attr, + &dev_attr_region4_size.attr, + &dev_attr_region5_size.attr, + &dev_attr_region6_size.attr, + &dev_attr_region7_size.attr, + 
&dev_attr_region_count.attr, + NULL, +}; + +static umode_t cxl_dc_visible(struct kobject *kobj, struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + /* Not a memory device */ + if (!mds) + return 0; + + if (a == &dev_attr_region_count.attr) + return a->mode; + + if (n < mds->nr_dc_region) + return a->mode; + + return 0; +} + static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a, int n) { @@ -482,11 +552,18 @@ static struct attribute_group cxl_memdev_security_attribute_group = { .attrs = cxl_memdev_security_attributes, }; +static struct attribute_group cxl_memdev_dc_attribute_group = { + .name = "dc", + .attrs = cxl_memdev_dc_attributes, + .is_visible = cxl_dc_visible, +}; + static const struct attribute_group *cxl_memdev_attribute_groups[] = { &cxl_memdev_attribute_group, &cxl_memdev_ram_attribute_group, &cxl_memdev_pmem_attribute_group, &cxl_memdev_security_attribute_group, + &cxl_memdev_dc_attribute_group, NULL, };
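For completeness, a minimal userspace sketch of consuming these new attributes; the mem0 path is illustrative, and the dc/ group is only visible on devices that report DC regions:

	#include <stdio.h>

	int main(void)
	{
		char path[128];
		unsigned long long size;
		int count;
		FILE *f;

		f = fopen("/sys/bus/cxl/devices/mem0/dc/region_count", "r");
		if (!f) {
			perror("region_count");
			return 1;
		}
		if (fscanf(f, "%d", &count) != 1)
			count = 0;
		fclose(f);

		/* regionY_size is emitted as %#llx, which %llx parses */
		for (int i = 0; i < count; i++) {
			snprintf(path, sizeof(path),
				 "/sys/bus/cxl/devices/mem0/dc/region%d_size", i);
			f = fopen(path, "r");
			if (!f)
				continue;
			if (fscanf(f, "%llx", &size) == 1)
				printf("DC region %d: %#llx bytes\n", i, size);
			fclose(f);
		}
		return 0;
	}

Because cxl_dc_visible() hides region indexes at or above nr_dc_region, the loop bound from region_count matches exactly the set of regionY_size files that exist.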
From patchwork Fri Aug 16 14:08:13 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:13 -0500
Subject: [PATCH RESEND v2 08/18] cxl/region: Add Dynamic Capacity CXL region support
Message-Id: <20240816-dcd-type2-upstream-v2-8-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

CXL devices optionally support dynamic capacity. CXL regions must be configured correctly to access this capacity.

Similar to ram and pmem partitions, DC Regions represent different partitions of the DPA space. Interleaving is deferred due to the complexity of managing extents on multiple devices at the same time; however, nothing directly prevents adding interleave support later. The interleave check allows for early rejection.

To maintain backwards compatibility with older software, CXL regions need a default DAX device to hold the reference for the region until it is deleted.

Add a create_dc_region sysfs entry to create DC regions. Share the logic of devm_cxl_add_dax_region() and region_is_system_ram(). Special case DC-capable CXL regions to create a 0-sized seed DAX device until others can be created on dynamic space later.

Flag dax_regions to indicate 0 capacity available until dax_region extents are supported by the region.
Co-developed-by: Navneet Singh Signed-off-by: Navneet Singh Signed-off-by: Ira Weiny --- changes for v2: [iweiny: flag empty dax regions] [iweiny: Split out anything not directly related to creating a DC CXL region] [iweiny: Separate out dev dax stuff] [iweiny/navneet: create 0 sized DAX device by default] [iweiny: use new DC region mode] --- Documentation/ABI/testing/sysfs-bus-cxl | 20 +++++----- drivers/cxl/core/core.h | 1 + drivers/cxl/core/port.c | 1 + drivers/cxl/core/region.c | 71 ++++++++++++++++++++++++++++----- drivers/dax/bus.c | 8 ++++ drivers/dax/bus.h | 1 + drivers/dax/cxl.c | 15 ++++++- 7 files changed, 96 insertions(+), 21 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index aa65dc5b4e13..a0562938ecac 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -351,20 +351,20 @@ Description: interleave_granularity). -What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region +What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region Date: May, 2022, January, 2023 -KernelVersion: v6.0 (pmem), v6.3 (ram) +KernelVersion: v6.0 (pmem), v6.3 (ram), v6.6 (dc) Contact: linux-cxl@vger.kernel.org Description: (RW) Write a string in the form 'regionZ' to start the process - of defining a new persistent, or volatile memory region - (interleave-set) within the decode range bounded by root decoder - 'decoderX.Y'. The value written must match the current value - returned from reading this attribute. An atomic compare exchange - operation is done on write to assign the requested id to a - region and allocate the region-id for the next creation attempt. - EBUSY is returned if the region name written does not match the - current cached value. + of defining a new persistent, volatile, or Dynamic Capacity + (DC) memory region (interleave-set) within the decode range + bounded by root decoder 'decoderX.Y'. The value written must + match the current value returned from reading this attribute. + An atomic compare exchange operation is done on write to assign + the requested id to a region and allocate the region-id for the + next creation attempt. EBUSY is returned if the region name + written does not match the current cached value. 
What: /sys/bus/cxl/devices/decoderX.Y/delete_region diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 45e7e044cf4a..cf3cf01cb95d 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -13,6 +13,7 @@ extern struct attribute_group cxl_base_attribute_group; #ifdef CONFIG_CXL_REGION extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; +extern struct device_attribute dev_attr_create_dc_region; extern struct device_attribute dev_attr_delete_region; extern struct device_attribute dev_attr_region; extern const struct device_type cxl_pmem_region_type; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index a5db710a63bc..608901bb7d91 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -314,6 +314,7 @@ static struct attribute *cxl_decoder_root_attrs[] = { &dev_attr_target_list.attr, SET_CXL_REGION_ATTR(create_pmem_region) SET_CXL_REGION_ATTR(create_ram_region) + SET_CXL_REGION_ATTR(create_dc_region) SET_CXL_REGION_ATTR(delete_region) NULL, }; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 69af1354bc5b..fc8dee469244 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2271,6 +2271,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, switch (mode) { case CXL_REGION_RAM: case CXL_REGION_PMEM: + case CXL_REGION_DC: break; default: dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n", @@ -2383,6 +2384,33 @@ static ssize_t create_ram_region_store(struct device *dev, } DEVICE_ATTR_RW(create_ram_region); +static ssize_t create_dc_region_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return __create_region_show(to_cxl_root_decoder(dev), buf); +} + +static ssize_t create_dc_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + struct cxl_region *cxlr; + int rc, id; + + rc = sscanf(buf, "region%d\n", &id); + if (rc != 1) + return -EINVAL; + + cxlr = __create_region(cxlrd, id, CXL_REGION_DC, + CXL_DECODER_HOSTONLYMEM); + if (IS_ERR(cxlr)) + return PTR_ERR(cxlr); + + return len; +} +DEVICE_ATTR_RW(create_dc_region); + static ssize_t region_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -2834,7 +2862,7 @@ static void cxlr_dax_unregister(void *_cxlr_dax) device_unregister(&cxlr_dax->dev); } -static int devm_cxl_add_dax_region(struct cxl_region *cxlr) +static int __devm_cxl_add_dax_region(struct cxl_region *cxlr) { struct cxl_dax_region *cxlr_dax; struct device *dev; @@ -2863,6 +2891,21 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr) return rc; } +static int devm_cxl_add_dax_region(struct cxl_region *cxlr) +{ + return __devm_cxl_add_dax_region(cxlr); +} + +static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr) +{ + if (cxlr->params.interleave_ways != 1) { + dev_err(&cxlr->dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + + return __devm_cxl_add_dax_region(cxlr); +} + static int match_decoder_by_range(struct device *dev, void *data) { struct range *r1, *r2 = data; @@ -3203,6 +3246,19 @@ static int is_system_ram(struct resource *res, void *arg) return 1; } +/* + * The region can not be manged by CXL if any portion of + * it is already online as 'System RAM' + */ +static bool region_is_system_ram(struct cxl_region *cxlr, + struct cxl_region_params *p) +{ + return (walk_iomem_res_desc(IORES_DESC_NONE, + 
IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, + p->res->start, p->res->end, cxlr, + is_system_ram) > 0); +} + static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); @@ -3242,14 +3298,7 @@ static int cxl_region_probe(struct device *dev) case CXL_REGION_PMEM: return devm_cxl_add_pmem_region(cxlr); case CXL_REGION_RAM: - /* - * The region can not be manged by CXL if any portion of - * it is already online as 'System RAM' - */ - if (walk_iomem_res_desc(IORES_DESC_NONE, - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, - p->res->start, p->res->end, cxlr, - is_system_ram) > 0) + if (region_is_system_ram(cxlr, p)) return 0; /* @@ -3261,6 +3310,10 @@ static int cxl_region_probe(struct device *dev) /* HDM-H routes to device-dax */ return devm_cxl_add_dax_region(cxlr); + case CXL_REGION_DC: + if (region_is_system_ram(cxlr, p)) + return 0; + return devm_cxl_add_dc_dax_region(cxlr); default: dev_dbg(&cxlr->dev, "unsupported region mode: %s\n", cxl_region_mode_name(cxlr->mode)); diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 0ee96e6fc426..b76e49813a39 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -169,6 +169,11 @@ static bool is_static(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0; } +static bool is_dynamic(struct dax_region *dax_region) +{ + return (dax_region->res.flags & IORESOURCE_DAX_DYNAMIC_CAP) != 0; +} + bool static_dev_dax(struct dev_dax *dev_dax) { return is_static(dev_dax->region); @@ -285,6 +290,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region) device_lock_assert(dax_region->dev); + if (is_dynamic(dax_region)) + return 0; + for_each_dax_region_resource(dax_region, res) size -= resource_size(res); return size; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 1ccd23360124..74d8fe4a5532 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -13,6 +13,7 @@ struct dax_region; /* dax bus specific ioresource flags */ #define IORESOURCE_DAX_STATIC BIT(0) #define IORESOURCE_DAX_KMEM BIT(1) +#define IORESOURCE_DAX_DYNAMIC_CAP BIT(2) struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 8bc9d04034d6..147c8c69782b 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -13,19 +13,30 @@ static int cxl_dax_region_probe(struct device *dev) struct cxl_region *cxlr = cxlr_dax->cxlr; struct dax_region *dax_region; struct dev_dax_data data; + resource_size_t dev_size; + unsigned long flags; if (nid == NUMA_NO_NODE) nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start); + dev_size = range_len(&cxlr_dax->hpa_range); + + flags = IORESOURCE_DAX_KMEM; + if (cxlr->mode == CXL_REGION_DC) { + /* Add empty seed dax device */ + dev_size = 0; + flags |= IORESOURCE_DAX_DYNAMIC_CAP; + } + dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, - PMD_SIZE, IORESOURCE_DAX_KMEM); + PMD_SIZE, flags); if (!dax_region) return -ENOMEM; data = (struct dev_dax_data) { .dax_region = dax_region, .id = -1, - .size = range_len(&cxlr_dax->hpa_range), + .size = dev_size, }; return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
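A userspace sketch of the create_dc_region handshake described in this patch's ABI update: read the advertised region name, then write the same string back to claim it. The decoder0.0 path and the error handling are illustrative:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		const char *attr =
			"/sys/bus/cxl/devices/decoder0.0/create_dc_region";
		char name[32];
		FILE *f;

		f = fopen(attr, "r");
		if (!f) {
			perror(attr);
			return 1;
		}
		if (!fgets(name, sizeof(name), f)) {
			fclose(f);
			return 1;
		}
		fclose(f);
		name[strcspn(name, "\n")] = '\0';	/* e.g. "region0" */

		f = fopen(attr, "w");
		if (!f) {
			perror(attr);
			return 1;
		}
		fprintf(f, "%s\n", name);
		/* EBUSY surfaces on the write/close if the cached id moved */
		if (fclose(f) != 0) {
			perror(attr);
			return 1;
		}
		printf("created %s\n", name);
		return 0;
	}

The read-then-write-back shape is what makes the kernel's compare-exchange on the region id race-free: two writers can both read "region0", but only one write will match the cached value.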
From patchwork Fri Aug 16 14:08:14 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:14 -0500
Subject: [PATCH RESEND v2 09/18] cxl/mem: Read extents on memory device discovery
Message-Id: <20240816-dcd-type2-upstream-v2-9-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
When a Dynamic Capacity Device (DCD) is realized, some extents may already be available within the DC Regions. This can happen if the host has accepted extents and been rebooted, or at any other time the host driver software has become out of sync with the device hardware.

Read the available extents during probe and store them for later use.

Signed-off-by: Navneet Singh
Co-developed-by: Navneet Singh
Signed-off-by: Ira Weiny
---
Change for v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c | 195 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    |  36 +++++++++
 drivers/cxl/pci.c       |   4 +
 3 files changed, 235 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index d769814f80e2..9b08c40ef484 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -824,6 +824,37 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); +static int cxl_store_dc_extent(struct cxl_memdev_state *mds, + struct cxl_dc_extent *dc_extent) +{ + struct device *dev = mds->cxlds.dev; + struct cxl_dc_extent_data *extent; + int rc; + + extent = kzalloc(sizeof(*extent), GFP_KERNEL); + if (!extent) + return -ENOMEM; + + extent->dpa_start = le64_to_cpu(dc_extent->start_dpa); + extent->length = le64_to_cpu(dc_extent->length); + memcpy(extent->tag, dc_extent->tag, sizeof(extent->tag)); + extent->shared_extent_seq = le16_to_cpu(dc_extent->shared_extn_seq); + + dev_dbg(dev, "dynamic capacity extent DPA:0x%llx LEN:%llx\n", + extent->dpa_start, extent->length); + + rc = xa_insert(&mds->dc_extent_list, extent->dpa_start, extent, + GFP_KERNEL); + if (rc) { + if (rc == -EBUSY) + dev_warn_once(dev, "Duplicate extent DPA:%llx LEN:%llx\n", + extent->dpa_start, extent->length); + kfree(extent); + } + + return rc; +} + /* * General Media Event Record * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43 */ @@ -1339,6 +1370,149 @@ int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL); +static int cxl_dev_get_dc_extent_cnt(struct cxl_memdev_state *mds, + unsigned int *extent_gen_num) +{ + struct cxl_mbox_get_dc_extent get_dc_extent; + struct cxl_mbox_dc_extents dc_extents; + struct device *dev = mds->cxlds.dev; + struct cxl_mbox_cmd mbox_cmd; + unsigned int count; + int rc; + + /* Check GET_DC_EXTENT_LIST is supported by device */ + if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) { + dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n"); + return 0; + } + + get_dc_extent = (struct cxl_mbox_get_dc_extent) { + .extent_cnt = cpu_to_le32(0), + .start_extent_index = cpu_to_le32(0), + }; + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST, + .payload_in = &get_dc_extent, + .size_in = sizeof(get_dc_extent), + .size_out = mds->payload_size, + .payload_out = &dc_extents, + .min_out = 1, + }; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + return rc; + + count = le32_to_cpu(dc_extents.total_extent_cnt); + *extent_gen_num = le32_to_cpu(dc_extents.extent_list_num); + + return count; +} + +static int
cxl_dev_get_dc_extents(struct cxl_memdev_state *mds, + unsigned int start_gen_num, + unsigned int exp_cnt) +{ + struct cxl_mbox_dc_extents *dc_extents; + unsigned int start_index, total_read; + struct device *dev = mds->cxlds.dev; + struct cxl_mbox_cmd mbox_cmd; + int retry = 3; + int rc; + + /* Check GET_DC_EXTENT_LIST is supported by device */ + if (!test_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds)) { + dev_dbg(dev, "unsupported cmd : get dyn cap extent list\n"); + return 0; + } + + dc_extents = kvmalloc(mds->payload_size, GFP_KERNEL); + if (!dc_extents) + return -ENOMEM; + +reset: + total_read = 0; + start_index = 0; + do { + unsigned int nr_ext, total_extent_cnt, gen_num; + struct cxl_mbox_get_dc_extent get_dc_extent; + + get_dc_extent = (struct cxl_mbox_get_dc_extent) { + .extent_cnt = exp_cnt - start_index, + .start_extent_index = start_index, + }; + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST, + .payload_in = &get_dc_extent, + .size_in = sizeof(get_dc_extent), + .size_out = mds->payload_size, + .payload_out = dc_extents, + .min_out = 1, + }; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + goto out; + + nr_ext = le32_to_cpu(dc_extents->ret_extent_cnt); + total_read += nr_ext; + total_extent_cnt = le32_to_cpu(dc_extents->total_extent_cnt); + gen_num = le32_to_cpu(dc_extents->extent_list_num); + + dev_dbg(dev, "Get extent list count:%d generation Num:%d\n", + total_extent_cnt, gen_num); + + if (gen_num != start_gen_num || exp_cnt != total_extent_cnt) { + dev_err(dev, "Extent list changed while reading; %u != %u : %u != %u\n", + gen_num, start_gen_num, exp_cnt, total_extent_cnt); + if (retry--) + goto reset; + return -EIO; + } + + for (int i = 0; i < nr_ext ; i++) { + dev_dbg(dev, "Storing extent %d/%d\n", + start_index + i, exp_cnt); + rc = cxl_store_dc_extent(mds, &dc_extents->extent[i]); + if (rc) + goto out; + } + + start_index += nr_ext; + } while (exp_cnt > total_read); + +out: + kvfree(dc_extents); + return rc; +} + +/** + * cxl_dev_get_dynamic_capacity_extents() - Reads the dynamic capacity + * extent list. + * @mds: The memory device state + * + * This will dispatch the get_dynamic_capacity_extent_list command to the device + * and on success add the extents to the host managed extent list. + * + * Return: 0 if command was executed successfully, -ERRNO on error. 
+ */ +int cxl_dev_get_dynamic_capacity_extents(struct cxl_memdev_state *mds) +{ + unsigned int extent_gen_num; + int rc; + + rc = cxl_dev_get_dc_extent_cnt(mds, &extent_gen_num); + dev_dbg(mds->cxlds.dev, "Extent count: %d Generation Num: %d\n", + rc, extent_gen_num); + if (rc <= 0) /* 0 == no records found */ + return rc; + + return cxl_dev_get_dc_extents(mds, extent_gen_num, rc); +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_get_dynamic_capacity_extents, CXL); + static int add_dpa_res(struct device *dev, struct resource *parent, struct resource *res, resource_size_t start, resource_size_t size, const char *type) @@ -1530,9 +1704,23 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL); +static void cxl_destroy_mds(void *_mds) +{ + struct cxl_memdev_state *mds = _mds; + struct cxl_dc_extent_data *extent; + unsigned long index; + + xa_for_each(&mds->dc_extent_list, index, extent) { + xa_erase(&mds->dc_extent_list, index); + kfree(extent); + } + xa_destroy(&mds->dc_extent_list); +} + struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) { struct cxl_memdev_state *mds; + int rc; mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL); if (!mds) { @@ -1544,6 +1732,13 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mutex_init(&mds->event.log_lock); mds->cxlds.dev = dev; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; + xa_init(&mds->dc_extent_list); + + rc = devm_add_action_or_reset(dev, cxl_destroy_mds, mds); + if (rc) { + dev_err(dev, "Failed to set up memdev state; %d\n", rc); + return ERR_PTR(rc); + } return mds; } diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 8c8f47b397ab..ad690600c1b9 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -6,6 +6,7 @@ #include #include #include +#include #include "cxl.h" /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ @@ -509,6 +510,7 @@ struct cxl_memdev_state { u8 nr_dc_region; struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION]; size_t dc_event_log_size; + struct xarray dc_extent_list; struct cxl_event_state event; struct cxl_poison_state poison; @@ -749,6 +751,26 @@ struct cxl_event_mem_module { u8 reserved[0x3d]; } __packed; +#define CXL_DC_EXTENT_TAG_LEN 0x10 +struct cxl_dc_extent_data { + u64 dpa_start; + u64 length; + u8 tag[CXL_DC_EXTENT_TAG_LEN]; + u16 shared_extent_seq; +}; + +/* + * Dynamic Capacity Event Record + * CXL rev 3.0 section 8.2.9.2.1.5; Table 8-47 + */ +struct cxl_dc_extent { + __le64 start_dpa; + __le64 length; + u8 tag[CXL_DC_EXTENT_TAG_LEN]; + __le16 shared_extn_seq; + u8 reserved[6]; +} __packed; + struct cxl_mbox_get_partition_info { __le64 active_volatile_cap; __le64 active_persistent_cap; @@ -796,6 +818,19 @@ struct cxl_mbox_dynamic_capacity { #define CXL_REGIONS_RETURNED(size_out) \ ((size_out - 8) / sizeof(struct cxl_dc_region_config)) +struct cxl_mbox_get_dc_extent { + __le32 extent_cnt; + __le32 start_extent_index; +} __packed; + +struct cxl_mbox_dc_extents { + __le32 ret_extent_cnt; + __le32 total_extent_cnt; + __le32 extent_list_num; + u8 rsvd[4]; + struct cxl_dc_extent extent[]; +} __packed; + /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */ struct cxl_mbox_set_timestamp_in { __le64 timestamp; @@ -920,6 +955,7 @@ int cxl_internal_send_cmd(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd); int cxl_dev_state_identify(struct cxl_memdev_state *mds); int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds); +int cxl_dev_get_dynamic_capacity_extents(struct cxl_memdev_state *mds); int 
cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); int cxl_mem_create_range_info(struct cxl_memdev_state *mds); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index a9b110ff1176..10c1a583113c 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -930,6 +930,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) dev_dbg(&pdev->dev, "No RAS reporting unmasked\n"); + rc = cxl_dev_get_dynamic_capacity_extents(mds); + if (rc) + return rc; + pci_save_state(pdev); return rc;
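The extent-list read protocol is easy to lose in the mailbox plumbing, so here is a compact userspace model of it: query the expected count and generation number, page through the list, and restart if the generation changes mid-read. The stub device array, one-extent payload, and retry bound are illustrative:

	#include <stdio.h>

	struct extent { unsigned long long dpa, len; };

	/* Stub "device": a fixed extent list plus a generation number that
	 * a real device would bump whenever the list changes. */
	static const struct extent dev_list[] = {
		{ 0x40000000, 0x10000000 },
		{ 0x60000000, 0x10000000 },
	};
	static const unsigned int dev_gen = 7;

	static unsigned int dev_read(unsigned int start, struct extent *out,
				     unsigned int max, unsigned int *total,
				     unsigned int *gen)
	{
		unsigned int cnt = sizeof(dev_list) / sizeof(dev_list[0]);
		unsigned int n = 0;

		*total = cnt;
		*gen = dev_gen;
		while (start + n < cnt && n < max) {
			out[n] = dev_list[start + n];
			n++;
		}
		return n;	/* extents returned by this "mailbox" call */
	}

	int main(void)
	{
		struct extent buf[1];	/* payload of one extent forces paging */
		unsigned int total, gen, start_gen, got = 0;
		int retry = 3;

	restart:
		if (retry-- < 0)
			return 1;	/* the kernel patch returns -EIO here */
		dev_read(0, buf, 0, &total, &start_gen);	/* count query */
		for (unsigned int idx = 0; idx < total; idx += got) {
			got = dev_read(idx, buf, 1, &total, &gen);
			if (gen != start_gen)
				goto restart;	/* list changed while reading */
			printf("extent DPA:%#llx LEN:%#llx\n",
			       buf[0].dpa, buf[0].len);
		}
		return 0;
	}

The generation check is the important part: without it, a device adding or releasing extents between pages could hand the host a list that never existed at any single point in time.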
From patchwork Fri Aug 16 14:08:15 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:15 -0500
Subject: [PATCH RESEND v2 10/18] cxl/mem: Handle DCD add and release capacity events.
Message-Id: <20240816-dcd-type2-upstream-v2-10-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

A Dynamic Capacity Device (DCD) uses events to signal the host about changes to the allocation of Dynamic Capacity (DC) extents. The device communicates the state of DC extents through an extent list that describes the starting DPA, length, and metadata of the blocks the host can access.

Process the dynamic capacity add and release events. The addition or removal of extents can occur at any time. Adding memory asynchronously is straightforward. Remember also that the host is under no obligation to respond to a release event until it is done with the memory.

Introduce extent krefs to handle delayed extent release. In the case of a forced removal, access to the memory will fail and may cause a crash. However, the extent tracking object is preserved so the region can safely be torn down, as long as the memory is not accessed.
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny

---
Changes for v2:
[iweiny: Totally new version of the patch]
[iweiny: use kref to track when to release an extent]
[iweiny: rebased to latest master/type2 work]
[iweiny: use a kref to track if extents are being referenced]
[alison: align commit message paragraphs]
[alison: remove unnecessary return]
[iweiny: Adjust for the new __devm_cxl_add_dax_region()]
[navneet: Fix debug prints in adding/releasing extent]
[alison: deal with odd if/else logic]
[alison: reverse x-tree]
[alison: s/total_extent_cnt/count/]
[alison: make handle event reverse x-tree]
[alison: cleanup/shorten/remove handle event comment]
[iweiny/Alison: refactor cxl_handle_dcd_event_records function]
[iweiny: keep cxl_dc_extent_data local to mbox.c]
[jonathan: eliminate 'rc']
[iweiny: use proper type for mailbox size]
[jonathan: put dc_extents on the stack]
[jonathan: use direct returns instead of goto]
[iweiny: Clean up comment]
[Jonathan: define CXL_DC_EXTENT_TAG_LEN]
[Jonathan: remove extraneous changes]
[Jonathan: fix blank line issues]
---
 drivers/cxl/core/mbox.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h | 9 +++
 drivers/cxl/cxlmem.h | 30 ++++++++
 3 files changed, 224 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 9b08c40ef484..8474a28b16ca 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -839,6 +839,8 @@ static int cxl_store_dc_extent(struct cxl_memdev_state *mds, extent->length = le64_to_cpu(dc_extent->length); memcpy(extent->tag, dc_extent->tag, sizeof(extent->tag)); extent->shared_extent_seq = le16_to_cpu(dc_extent->shared_extn_seq); + kref_init(&extent->region_ref); + extent->mds = mds; dev_dbg(dev, "dynamic capacity extent DPA:0x%llx LEN:%llx\n", extent->dpa_start, extent->length); @@ -879,6 +881,14 @@ static const uuid_t mem_mod_event_uuid = UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74); +/* + * Dynamic Capacity Event Record + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45 + */ +static const uuid_t dc_event_uuid = + UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c, + 0x2f, 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a); + static void cxl_event_trace_record(const struct cxl_memdev *cxlmd, enum cxl_event_log_type type, struct cxl_event_record_raw *record) @@ -973,6 +983,171 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds, return rc; } +static int cxl_send_dc_cap_response(struct cxl_memdev_state *mds, + struct cxl_mbox_dc_response *res, + int extent_cnt, int opcode) +{ + struct cxl_mbox_cmd mbox_cmd; + size_t size; + + size = struct_size(res, extent_list, extent_cnt); + res->extent_list_size = cpu_to_le32(extent_cnt); + + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = opcode, + .size_in = size, + .payload_in = res, + }; + + return cxl_internal_send_cmd(mds, &mbox_cmd); +} + +static int cxl_prepare_ext_list(struct cxl_mbox_dc_response **res, + int *n, struct range *extent) +{ + struct cxl_mbox_dc_response *dc_res; + unsigned int size; + + if (!extent) + size = struct_size(dc_res, extent_list, 0); + else + size = struct_size(dc_res, extent_list, *n + 1); + + dc_res = krealloc(*res, size, GFP_KERNEL); + if (!dc_res) + return -ENOMEM; + + if (extent) { + dc_res->extent_list[*n].dpa_start = cpu_to_le64(extent->start); + memset(dc_res->extent_list[*n].reserved, 0, 8); + dc_res->extent_list[*n].length = cpu_to_le64(range_len(extent)); + (*n)++; + } + + *res = dc_res; + return 0;
+} + +static void dc_extent_release(struct kref *kref) +{ + struct cxl_dc_extent_data *extent = container_of(kref, + struct cxl_dc_extent_data, + region_ref); + struct cxl_memdev_state *mds = extent->mds; + struct cxl_mbox_dc_response *dc_res = NULL; + struct range rel_range = (struct range) { + .start = extent->dpa_start, + .end = extent->dpa_start + extent->length - 1, + }; + struct device *dev = mds->cxlds.dev; + int extent_cnt = 0, rc; + + rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, &rel_range); + if (rc < 0) { + dev_err(dev, "Failed to create release response %d\n", rc); + goto free_extent; + } + rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt, + CXL_MBOX_OP_RELEASE_DC); + kfree(dc_res); + +free_extent: + kfree(extent); +} + +void cxl_dc_extent_put(struct cxl_dc_extent_data *extent) +{ + kref_put(&extent->region_ref, dc_extent_release); +} +EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_put, CXL); + +static int cxl_handle_dcd_release_event(struct cxl_memdev_state *mds, + struct cxl_dc_extent *rel_extent) +{ + struct device *dev = mds->cxlds.dev; + struct cxl_dc_extent_data *extent; + resource_size_t dpa, size; + + dpa = le64_to_cpu(rel_extent->start_dpa); + size = le64_to_cpu(rel_extent->length); + dev_dbg(dev, "Release DC extent DPA:0x%llx LEN:%llx\n", + dpa, size); + + extent = xa_erase(&mds->dc_extent_list, dpa); + if (!extent) { + dev_err(dev, "No extent found with DPA:0x%llx\n", dpa); + return -EINVAL; + } + cxl_dc_extent_put(extent); + return 0; +} + +static int cxl_handle_dcd_add_event(struct cxl_memdev_state *mds, + struct cxl_dc_extent *add_extent) +{ + struct cxl_mbox_dc_response *dc_res = NULL; + struct range alloc_range, *resp_range; + struct device *dev = mds->cxlds.dev; + int extent_cnt = 0; + int rc; + + dev_dbg(dev, "Add DC extent DPA:0x%llx LEN:%llx\n", + le64_to_cpu(add_extent->start_dpa), + le64_to_cpu(add_extent->length)); + + alloc_range = (struct range){ + .start = le64_to_cpu(add_extent->start_dpa), + .end = le64_to_cpu(add_extent->start_dpa) + + le64_to_cpu(add_extent->length) - 1, + }; + resp_range = &alloc_range; + + rc = cxl_store_dc_extent(mds, add_extent); + if (rc) { + dev_dbg(dev, "unconsumed DC extent DPA:0x%llx LEN:%llx\n", + le64_to_cpu(add_extent->start_dpa), + le64_to_cpu(add_extent->length)); + resp_range = NULL; + } + + rc = cxl_prepare_ext_list(&dc_res, &extent_cnt, resp_range); + if (rc < 0) { + dev_err(dev, "Couldn't create extent list %d\n", rc); + return rc; + } + + rc = cxl_send_dc_cap_response(mds, dc_res, extent_cnt, + CXL_MBOX_OP_ADD_DC_RESPONSE); + kfree(dc_res); + return rc; +} + +/* Returns 0 if the event was handled successfully. 
*/ +static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds, + struct cxl_event_record_raw *rec) +{ + struct dcd_event_dyn_cap *record = (struct dcd_event_dyn_cap *)rec; + uuid_t *id = &rec->hdr.id; + int rc; + + if (!uuid_equal(id, &dc_event_uuid)) + return -EINVAL; + + switch (record->data.event_type) { + case DCD_ADD_CAPACITY: + rc = cxl_handle_dcd_add_event(mds, &record->data.extent); + break; + case DCD_RELEASE_CAPACITY: + case DCD_FORCED_CAPACITY_RELEASE: + rc = cxl_handle_dcd_release_event(mds, &record->data.extent); + break; + default: + return -EINVAL; + } + + return rc; +} + static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, enum cxl_event_log_type type) { @@ -1016,6 +1191,13 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, le16_to_cpu(payload->records[i].hdr.handle)); cxl_event_trace_record(cxlmd, type, &payload->records[i]); + if (type == CXL_EVENT_TYPE_DCD) { + rc = cxl_handle_dcd_event_records(mds, + &payload->records[i]); + if (rc) + dev_err_ratelimited(dev, "dcd event failed: %d\n", + rc); + } } if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW) @@ -1056,6 +1238,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_WARN); if (status & CXLDEV_EVENT_STATUS_INFO) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_INFO); + if (status & CXLDEV_EVENT_STATUS_DCD) + cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD); } EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL); @@ -1712,7 +1896,7 @@ static void cxl_destroy_mds(void *_mds) xa_for_each(&mds->dc_extent_list, index, extent) { xa_erase(&mds->dc_extent_list, index); - kfree(extent); + cxl_dc_extent_put(extent); } xa_destroy(&mds->dc_extent_list); } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0a225b0c20bf..81ca76ae1d02 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -163,6 +163,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw) #define CXLDEV_EVENT_STATUS_WARN BIT(1) #define CXLDEV_EVENT_STATUS_FAIL BIT(2) #define CXLDEV_EVENT_STATUS_FATAL BIT(3) +#define CXLDEV_EVENT_STATUS_DCD BIT(4) #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO | \ CXLDEV_EVENT_STATUS_WARN | \ @@ -601,6 +602,14 @@ struct cxl_pmem_region { struct cxl_pmem_region_mapping mapping[]; }; +/* See CXL 3.0 8.2.9.2.1.5 */ +enum dc_event { + DCD_ADD_CAPACITY, + DCD_RELEASE_CAPACITY, + DCD_FORCED_CAPACITY_RELEASE, + DCD_REGION_CONFIGURATION_UPDATED, +}; + struct cxl_dax_region { struct device dev; struct cxl_region *cxlr; diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index ad690600c1b9..118392229174 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -582,6 +582,16 @@ enum cxl_opcode { UUID_INIT(0xe1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 0xd6, 0x07, 0x19, \ 0x40, 0x3d, 0x86) +struct cxl_mbox_dc_response { + __le32 extent_list_size; + u8 reserved[4]; + struct updated_extent_list { + __le64 dpa_start; + __le64 length; + u8 reserved[8]; + } __packed extent_list[]; +} __packed; + struct cxl_mbox_get_supported_logs { __le16 entries; u8 rsvd[6]; @@ -667,6 +677,7 @@ enum cxl_event_log_type { CXL_EVENT_TYPE_WARN, CXL_EVENT_TYPE_FAIL, CXL_EVENT_TYPE_FATAL, + CXL_EVENT_TYPE_DCD, CXL_EVENT_TYPE_MAX }; @@ -757,6 +768,8 @@ struct cxl_dc_extent_data { u64 length; u8 tag[CXL_DC_EXTENT_TAG_LEN]; u16 shared_extent_seq; + struct cxl_memdev_state *mds; + struct kref region_ref; }; /* @@ -771,6 +784,21 @@ struct cxl_dc_extent { u8 reserved[6]; } __packed; +struct dcd_record_data { + u8 event_type; + u8 reserved; + 
__le16 host_id; + u8 region_index; + u8 reserved1[3]; + struct cxl_dc_extent extent; + u8 reserved2[32]; +} __packed; + +struct dcd_event_dyn_cap { + struct cxl_event_record_hdr hdr; + struct dcd_record_data data; +} __packed; + struct cxl_mbox_get_partition_info { __le64 active_volatile_cap; __le64 active_persistent_cap; @@ -974,6 +1002,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd); int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa); int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa); +void cxl_dc_extent_put(struct cxl_dc_extent_data *extent); + #ifdef CONFIG_CXL_SUSPEND void cxl_mem_active_inc(void); void cxl_mem_active_dec(void);

From patchwork Fri Aug 16 14:08:16 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:16 -0500
Subject: [PATCH RESEND v2 11/18] cxl/region: Expose DC extents on region driver load
Message-Id: <20240816-dcd-type2-upstream-v2-11-b4044aadf2bd@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Ultimately user space must associate Dynamic Capacity (DC) extents with DAX devices. Remember also that DCD extents may have been accepted prior to regions being created, and references must be held on them until all higher level regions and DAX devices are done with the memory.

On CXL region driver load, scan existing device extents and create CXL DAX region extents as needed.

Create abstractions for the extents to be used in the DAX region. This includes a generic interface to take proper references on the lower level CXL region extents.

Also maintain separate objects for the DAX region extent device vs the DAX region extent. The DAX region extent device has a shorter lifespan, which corresponds to the removal of an extent while a DAX device is still using it. In this case the extent continues to exist while the ability to create new DAX devices on that extent is prevented.

NOTE: Without interleaving, the device, CXL region, and DAX region extents have a 1:1:1 relationship. Future support for interleaving will maintain a 1:N relationship between CXL region extents and the hardware extents.

While the ability to create DAX devices on an extent exists, expose the necessary details of DAX region extents by creating a device with the following sysfs entries:

/sys/bus/cxl/devices/dax_regionX/extentY
/sys/bus/cxl/devices/dax_regionX/extentY/length
/sys/bus/cxl/devices/dax_regionX/extentY/label

Label is a rough analog of the DC extent tag. As such, the DC extent tag is used to initially populate the label. However, the label is made writable so that it can be adjusted in the future when forming a DAX device.
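For example, a session against the first dynamic region might look like this (region/extent indices and values are illustrative only; the length is whatever the device advertised for the extent, and the initial label comes from the extent tag):

	# cat /sys/bus/cxl/devices/dax_region0/extent0/length
	0x10000000
	# cat /sys/bus/cxl/devices/dax_region0/extent0/label
	ext-tag-A
	# echo "my-dax-label" > /sys/bus/cxl/devices/dax_region0/extent0/label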
Co-developed-by: Navneet Singh
Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny

---
Changes from v1:
[iweiny: move dax_region_extents to dax layer]
[iweiny: adjust for kreference of extents]
[iweiny: adjust naming to cxl_dr_extent]
[iweiny: Remove region_extent xarray; use child devices instead]
[iweiny: ensure dax region devices are destroyed on region destruction]
[iweiny: use xa_insert]
[iweiny: hpa_offset is a dr_extent parameter not an extent parameter]
[iweiny: Add dc_region_extents when the region driver is loaded]
---
 drivers/cxl/core/mbox.c | 12 ++++
 drivers/cxl/core/region.c | 179 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/cxl/cxl.h | 16 +++++
 drivers/cxl/cxlmem.h | 2 +
 drivers/dax/Makefile | 1 +
 drivers/dax/cxl.c | 101 ++++++++++++++++++++++++--
 drivers/dax/dax-private.h | 53 ++++++++++++++
 drivers/dax/extent.c | 119 ++++++++++++++++++++++++++++++
 8 files changed, 473 insertions(+), 10 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 8474a28b16ca..5472ab1d0370 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1055,6 +1055,18 @@ static void dc_extent_release(struct kref *kref) kfree(extent); } +int __must_check cxl_dc_extent_get_not_zero(struct cxl_dc_extent_data *extent) +{ + return kref_get_unless_zero(&extent->region_ref); +} +EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_get_not_zero, CXL); + +void cxl_dc_extent_get(struct cxl_dc_extent_data *extent) +{ + kref_get(&extent->region_ref); +} +EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_get, CXL); + void cxl_dc_extent_put(struct cxl_dc_extent_data *extent) { kref_put(&extent->region_ref, dc_extent_release);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index fc8dee469244..0aeea50550f6 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1547,6 +1547,122 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, return 0; } +static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent) +{ + struct range dpa_range = (struct range){ + .start = extent->dpa_start, + .end = extent->dpa_start + extent->length - 1, + }; + struct device *dev = &cxled->cxld.dev; + + dev_dbg(dev, "Checking extent DPA:%llx LEN:%llx\n", + extent->dpa_start, extent->length); + + if (!cxled->cxld.region || !cxled->dpa_res) + return false; + + dev_dbg(dev, "Cxled start:%llx end:%llx\n", + cxled->dpa_res->start, cxled->dpa_res->end); + return (cxled->dpa_res->start <= dpa_range.start && + dpa_range.end <= cxled->dpa_res->end); +} + +static int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent) +{ + struct cxl_dr_extent *cxl_dr_ext; + struct cxl_dax_region *cxlr_dax; + resource_size_t dpa_offset, hpa; + struct range *ed_hpa_range; + struct device *dev; + int rc; + + cxlr_dax = cxled->cxld.region->cxlr_dax; + dev = &cxlr_dax->dev; + dev_dbg(dev, "Adding DC extent DPA:%llx LEN:%llx\n", + extent->dpa_start, extent->length); + + /* + * Interleave ways == 1 means this corresponds to a 1:1 mapping between + * device extents and DAX region extents. Future implementations + * should hold DC region extents here until the full dax region extent + * can be realized. + */ + if (cxlr_dax->cxlr->params.interleave_ways != 1) { + dev_err(dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + + cxl_dr_ext = kzalloc(sizeof(*cxl_dr_ext), GFP_KERNEL); + if (!cxl_dr_ext) + return -ENOMEM; + + cxl_dr_ext->extent = extent; + kref_init(&cxl_dr_ext->region_ref); + + /* + * Without interleave...
+ * HPA offset == DPA offset + * ... but do the math anyway + */ + dpa_offset = extent->dpa_start - cxled->dpa_res->start; + ed_hpa_range = &cxled->cxld.hpa_range; + hpa = ed_hpa_range->start + dpa_offset; + cxl_dr_ext->hpa_offset = hpa - cxlr_dax->hpa_range.start; + + /* Without interleave carry length and label through */ + cxl_dr_ext->hpa_length = extent->length; + snprintf(cxl_dr_ext->label, CXL_EXTENT_LABEL_LEN, "%s", + extent->tag); + + dev_dbg(dev, "Inserting at HPA:%llx\n", cxl_dr_ext->hpa_offset); + rc = xa_insert(&cxlr_dax->extents, cxl_dr_ext->hpa_offset, cxl_dr_ext, + GFP_KERNEL); + if (rc) { + dev_err(dev, "Failed to insert extent %d\n", rc); + kfree(cxl_dr_ext); + return rc; + } + /* Put in cxl_dr_release() */ + cxl_dc_extent_get(cxl_dr_ext->extent); + return 0; +} + +static int cxl_ed_add_extents(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct cxl_memdev_state *mds = container_of(cxlds, + struct cxl_memdev_state, + cxlds); + struct device *dev = &cxled->cxld.dev; + struct cxl_dc_extent_data *extent; + unsigned long index; + + dev_dbg(dev, "Searching for DC extents\n"); + xa_for_each(&mds->dc_extent_list, index, extent) { + /* + * get not zero is important because this is racing with the + * memory device which could be removing the extent at the same + * time. + */ + if (cxl_dc_extent_get_not_zero(extent)) { + int rc = 0; + + if (cxl_dc_extent_in_ed(cxled, extent)) { + dev_dbg(dev, "Found extent DPA:%llx LEN:%llx\n", + extent->dpa_start, extent->length); + rc = cxl_ed_add_one_extent(cxled, extent); + } + cxl_dc_extent_put(extent); + if (rc) + return rc; + } + } + return 0; +} + static int cxl_region_attach_position(struct cxl_region *cxlr, struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled, @@ -2702,10 +2818,44 @@ static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr) return cxlr_pmem; } +int __must_check cxl_dr_extent_get_not_zero(struct cxl_dr_extent *cxl_dr_ext) +{ + return kref_get_unless_zero(&cxl_dr_ext->region_ref); +} +EXPORT_SYMBOL_NS_GPL(cxl_dr_extent_get_not_zero, CXL); + +void cxl_dr_extent_get(struct cxl_dr_extent *cxl_dr_ext) +{ + return kref_get(&cxl_dr_ext->region_ref); +} +EXPORT_SYMBOL_NS_GPL(cxl_dr_extent_get, CXL); + +static void cxl_dr_release(struct kref *kref) +{ + struct cxl_dr_extent *cxl_dr_ext = container_of(kref, + struct cxl_dr_extent, + region_ref); + + cxl_dc_extent_put(cxl_dr_ext->extent); + kfree(cxl_dr_ext); +} + +void cxl_dr_extent_put(struct cxl_dr_extent *cxl_dr_ext) +{ + kref_put(&cxl_dr_ext->region_ref, cxl_dr_release); +} +EXPORT_SYMBOL_NS_GPL(cxl_dr_extent_put, CXL); + static void cxl_dax_region_release(struct device *dev) { struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); + struct cxl_dr_extent *cxl_dr_ext; + unsigned long index; + xa_for_each(&cxlr_dax->extents, index, cxl_dr_ext) { + xa_erase(&cxlr_dax->extents, index); + cxl_dr_extent_put(cxl_dr_ext); + } kfree(cxlr_dax); } @@ -2756,6 +2906,7 @@ static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr) cxlr_dax->hpa_range.start = p->res->start; cxlr_dax->hpa_range.end = p->res->end; + xa_init(&cxlr_dax->extents); dev = &cxlr_dax->dev; cxlr_dax->cxlr = cxlr; @@ -2862,7 +3013,17 @@ static void cxlr_dax_unregister(void *_cxlr_dax) device_unregister(&cxlr_dax->dev); } -static int __devm_cxl_add_dax_region(struct cxl_region *cxlr) +static int cxl_region_add_dc_extents(struct cxl_region *cxlr) +{ + for (int i = 0; i < 
cxlr->params.nr_targets; i++) { + int rc = cxl_ed_add_extents(cxlr->params.targets[i]); + if (rc) + return rc; + } + return 0; +} + +static int __devm_cxl_add_dax_region(struct cxl_region *cxlr, bool is_dc) { struct cxl_dax_region *cxlr_dax; struct device *dev; @@ -2877,6 +3038,17 @@ static int __devm_cxl_add_dax_region(struct cxl_region *cxlr) if (rc) goto err; + cxlr->cxlr_dax = cxlr_dax; + if (is_dc) { + /* + * Process device extents prior to surfacing the device to + * ensure the cxl_dax_region driver has access to prior extents + */ + rc = cxl_region_add_dc_extents(cxlr); + if (rc) + goto err; + } + rc = device_add(dev); if (rc) goto err; @@ -2893,7 +3065,7 @@ static int __devm_cxl_add_dax_region(struct cxl_region *cxlr) static int devm_cxl_add_dax_region(struct cxl_region *cxlr) { - return __devm_cxl_add_dax_region(cxlr); + return __devm_cxl_add_dax_region(cxlr, false); } static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr) @@ -2902,8 +3074,7 @@ static int devm_cxl_add_dc_dax_region(struct cxl_region *cxlr) dev_err(&cxlr->dev, "Interleaving DC not supported\n"); return -EINVAL; } - - return __devm_cxl_add_dax_region(cxlr); + return __devm_cxl_add_dax_region(cxlr, true); } static int match_decoder_by_range(struct device *dev, void *data) diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 81ca76ae1d02..177b892ac53f 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -555,6 +555,7 @@ struct cxl_region_params { * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge + * @cxlr_dax: (for DC regions) cached copy of CXL DAX bridge * @flags: Region state flags * @params: active + config params for the region */ @@ -565,6 +566,7 @@ struct cxl_region { enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; + struct cxl_dax_region *cxlr_dax; unsigned long flags; struct cxl_region_params params; }; @@ -614,8 +616,22 @@ struct cxl_dax_region { struct device dev; struct cxl_region *cxlr; struct range hpa_range; + struct xarray extents; }; +/* Interleave will manage multiple cxl_dc_extent_data objects */ +#define CXL_EXTENT_LABEL_LEN 64 +struct cxl_dr_extent { + struct kref region_ref; + u64 hpa_offset; + u64 hpa_length; + char label[CXL_EXTENT_LABEL_LEN]; + struct cxl_dc_extent_data *extent; +}; +int cxl_dr_extent_get_not_zero(struct cxl_dr_extent *cxl_dr_ext); +void cxl_dr_extent_get(struct cxl_dr_extent *cxl_dr_ext); +void cxl_dr_extent_put(struct cxl_dr_extent *cxl_dr_ext); + /** * struct cxl_port - logical collection of upstream port devices and * downstream port devices to construct a CXL memory diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 118392229174..8ca81fd067c2 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -1002,6 +1002,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd); int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa); int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa); +int cxl_dc_extent_get_not_zero(struct cxl_dc_extent_data *extent); +void cxl_dc_extent_get(struct cxl_dc_extent_data *extent); void cxl_dc_extent_put(struct cxl_dc_extent_data *extent); #ifdef CONFIG_CXL_SUSPEND diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile index 5ed5c39857c8..38cd3c4c0898 100644 --- a/drivers/dax/Makefile +++ b/drivers/dax/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o dax-y := super.o dax-y += bus.o +dax-y += extent.o device_dax-y := 
device.o dax_pmem-y := pmem.o dax_cxl-y := cxl.o diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 147c8c69782b..057b00b1d914 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -5,6 +5,87 @@ #include "../cxl/cxl.h" #include "bus.h" +#include "dax-private.h" + +static void dax_reg_ext_get(struct dax_region_extent *dr_extent) +{ + kref_get(&dr_extent->ref); +} + +static void dr_release(struct kref *kref) +{ + struct dax_region_extent *dr_extent; + struct cxl_dr_extent *cxl_dr_ext; + + dr_extent = container_of(kref, struct dax_region_extent, ref); + cxl_dr_ext = dr_extent->private_data; + cxl_dr_extent_put(cxl_dr_ext); + kfree(dr_extent); +} + +static void dax_reg_ext_put(struct dax_region_extent *dr_extent) +{ + kref_put(&dr_extent->ref, dr_release); +} + +static int cxl_dax_region_create_extent(struct dax_region *dax_region, + struct cxl_dr_extent *cxl_dr_ext) +{ + struct dax_region_extent *dr_extent; + int rc; + + dr_extent = kzalloc(sizeof(*dr_extent), GFP_KERNEL); + if (!dr_extent) + return -ENOMEM; + + dr_extent->private_data = cxl_dr_ext; + dr_extent->get = dax_reg_ext_get; + dr_extent->put = dax_reg_ext_put; + + /* device manages the dr_extent on success */ + kref_init(&dr_extent->ref); + + rc = dax_region_ext_create_dev(dax_region, dr_extent, + cxl_dr_ext->hpa_offset, + cxl_dr_ext->hpa_length, + cxl_dr_ext->label); + if (rc) { + kfree(dr_extent); + return rc; + } + + /* extent accepted */ + cxl_dr_extent_get(cxl_dr_ext); + return 0; +} + +static int cxl_dax_region_create_extents(struct cxl_dax_region *cxlr_dax) +{ + struct cxl_dr_extent *cxl_dr_ext; + unsigned long index; + + dev_dbg(&cxlr_dax->dev, "Adding extents\n"); + xa_for_each(&cxlr_dax->extents, index, cxl_dr_ext) { + /* + * get not zero is important because this is racing with the + * region driver which is racing with the memory device which + * could be removing the extent at the same time. 
+ */ + if (cxl_dr_extent_get_not_zero(cxl_dr_ext)) { + struct dax_region *dax_region; + int rc; + + dax_region = dev_get_drvdata(&cxlr_dax->dev); + dev_dbg(&cxlr_dax->dev, "Found OFF:%llx LEN:%llx\n", + cxl_dr_ext->hpa_offset, cxl_dr_ext->hpa_length); + rc = cxl_dax_region_create_extent(dax_region, cxl_dr_ext); + cxl_dr_extent_put(cxl_dr_ext); + if (rc) + return rc; + } + } + return 0; +} static int cxl_dax_region_probe(struct device *dev) { @@ -19,20 +100,28 @@ static int cxl_dax_region_probe(struct device *dev) if (nid == NUMA_NO_NODE) nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start); - dev_size = range_len(&cxlr_dax->hpa_range); - flags = IORESOURCE_DAX_KMEM; - if (cxlr->mode == CXL_REGION_DC) { - /* Add empty seed dax device */ - dev_size = 0; + if (cxlr->mode == CXL_REGION_DC) flags |= IORESOURCE_DAX_DYNAMIC_CAP; - } dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, PMD_SIZE, flags); if (!dax_region) return -ENOMEM; + dev_size = range_len(&cxlr_dax->hpa_range); + if (cxlr->mode == CXL_REGION_DC) { + int rc; + + /* NOTE: Depends on dax_region being set in driver data */ + rc = cxl_dax_region_create_extents(cxlr_dax); + if (rc) + return rc; + + /* Add empty seed dax device */ + dev_size = 0; + } + data = (struct dev_dax_data) { .dax_region = dax_region, .id = -1, diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 27cf2daaaa79..4dab52496c3f 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -5,6 +5,7 @@ #ifndef __DAX_PRIVATE_H__ #define __DAX_PRIVATE_H__ +#include #include #include #include @@ -40,6 +41,58 @@ struct dax_region { struct device *youngest; }; +/* + * struct dax_region_extent - extent data defined by the low level region + * driver. + * @private_data: lower level region driver data + * @ref: track number of dax devices which are using this extent + * @get: get reference to low level data + * @put: put reference to low level data + */ +struct dax_region_extent { + void *private_data; + struct kref ref; + void (*get)(struct dax_region_extent *dr_extent); + void (*put)(struct dax_region_extent *dr_extent); +}; + +static inline void dr_extent_get(struct dax_region_extent *dr_extent) +{ + if (dr_extent->get) + dr_extent->get(dr_extent); +} + +static inline void dr_extent_put(struct dax_region_extent *dr_extent) +{ + if (dr_extent->put) + dr_extent->put(dr_extent); +} + +#define DAX_EXTENT_LABEL_LEN 64 +/** + * struct dax_reg_ext_dev - Device object to expose extent information + * @dev: device representing this extent + * @dr_extent: reference back to private extent data + * @offset: offset of this extent + * @length: size of this extent + * @label: identifier to group extents + */ +struct dax_reg_ext_dev { + struct device dev; + struct dax_region_extent *dr_extent; + resource_size_t offset; + resource_size_t length; + char label[DAX_EXTENT_LABEL_LEN]; +}; + +int dax_region_ext_create_dev(struct dax_region *dax_region, + struct dax_region_extent *dr_extent, + resource_size_t offset, + resource_size_t length, + const char *label); +#define to_dr_ext_dev(dev) \ + container_of(dev, struct dax_reg_ext_dev, dev) + struct dax_mapping { struct device dev; int range_id; diff --git a/drivers/dax/extent.c b/drivers/dax/extent.c new file mode 100644 index 000000000000..2075ccfb21cb --- /dev/null +++ b/drivers/dax/extent.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Intel Corporation. All rights reserved. 
*/ + +#include <linux/device.h> +#include <linux/slab.h> +#include "dax-private.h" + +static ssize_t length_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev = to_dr_ext_dev(dev); + + return sysfs_emit(buf, "%#llx\n", dr_reg_ext_dev->length); +} +static DEVICE_ATTR_RO(length); + +static ssize_t label_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev = to_dr_ext_dev(dev); + + return sysfs_emit(buf, "%s\n", dr_reg_ext_dev->label); +} + +static ssize_t label_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev = to_dr_ext_dev(dev); + + snprintf(dr_reg_ext_dev->label, DAX_EXTENT_LABEL_LEN, "%s", buf); + return len; +} +static DEVICE_ATTR_RW(label); + +static struct attribute *dr_extent_attrs[] = { + &dev_attr_length.attr, + &dev_attr_label.attr, + NULL, +}; + +static const struct attribute_group dr_extent_attribute_group = { + .attrs = dr_extent_attrs, +}; + +static void dr_extent_release(struct device *dev) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev = to_dr_ext_dev(dev); + + kfree(dr_reg_ext_dev); +} + +static const struct attribute_group *dr_extent_attribute_groups[] = { + &dr_extent_attribute_group, + NULL, +}; + +const struct device_type dr_extent_type = { + .name = "extent", + .release = dr_extent_release, + .groups = dr_extent_attribute_groups, +}; + +static void unregister_dr_extent(void *ext) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev = ext; + struct dax_region_extent *dr_extent; + + dr_extent = dr_reg_ext_dev->dr_extent; + dev_dbg(&dr_reg_ext_dev->dev, "Unregister DAX region ext OFF:%llx L:%s\n", + dr_reg_ext_dev->offset, dr_reg_ext_dev->label); + dr_extent_put(dr_extent); + device_unregister(&dr_reg_ext_dev->dev); +} + +int dax_region_ext_create_dev(struct dax_region *dax_region, + struct dax_region_extent *dr_extent, + resource_size_t offset, + resource_size_t length, + const char *label) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev; + struct device *dev; + int rc; + + dr_reg_ext_dev = kzalloc(sizeof(*dr_reg_ext_dev), GFP_KERNEL); + if (!dr_reg_ext_dev) + return -ENOMEM; + + dr_reg_ext_dev->dr_extent = dr_extent; + dr_reg_ext_dev->offset = offset; + dr_reg_ext_dev->length = length; + snprintf(dr_reg_ext_dev->label, DAX_EXTENT_LABEL_LEN, "%s", label); + + dev = &dr_reg_ext_dev->dev; + device_initialize(dev); + dev->id = offset / PMD_SIZE; + device_set_pm_not_required(dev); + dev->parent = dax_region->dev; + dev->type = &dr_extent_type; + rc = dev_set_name(dev, "extent%d", dev->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + dev_dbg(dev, "DAX region extent OFF:%llx LEN:%llx\n", + dr_reg_ext_dev->offset, dr_reg_ext_dev->length); + return devm_add_action_or_reset(dax_region->dev, unregister_dr_extent, + dr_reg_ext_dev); + +err: + dev_err(dev, "Failed to initialize DAX extent dev OFF:%llx LEN:%llx\n", + dr_reg_ext_dev->offset, dr_reg_ext_dev->length); + put_device(dev); + return rc; +} +EXPORT_SYMBOL_GPL(dax_region_ext_create_dev);

From patchwork Fri Aug 16 14:08:17 2024
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:17 -0500
Subject: [PATCH RESEND v2 12/18] cxl/region: Notify regions of DC changes
Message-Id: <20240816-dcd-type2-upstream-v2-12-b4044aadf2bd@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

For users to use dynamic capacity effectively, they need to know when that capacity is available. Thus, when Dynamic Capacity (DC) extents are added or removed by a DC device, the affected regions need to be notified.

Ultimately the DAX region uses the memory associated with DC extents. However, remember that CXL DAX regions maintain any interleave details between devices.

When a DCD event occurs, iterate all CXL endpoint decoders and notify the regions which contain the endpoints affected by the event. In turn, notify the DAX regions of the changes to the DAX region extents.

For now interleave is handled by creating simple 1:1 mappings between the CXL DAX region and DAX region layers. Future implementations will need to resolve when to actually surface a DAX region extent and pass the notification along.

Remember that adding capacity is safe because there is no chance of the memory being in use. Also remember that at this point releasing capacity is straightforward because DAX devices do not yet have references to the extents. Future patches will handle that complication.

Signed-off-by: Ira Weiny

---
Changes from v1:
[iweiny: Rewrite]
---
 drivers/cxl/core/mbox.c | 39 +++++++++++++--
 drivers/cxl/core/region.c | 123 +++++++++++++++++++++++++++++++++++++++++-----
 drivers/cxl/cxl.h | 22 +++++++
 drivers/cxl/mem.c | 50 +++++++++++++++++++
 drivers/dax/cxl.c | 99 ++++++++++++++++++++++++------
 drivers/dax/dax-private.h | 3 ++
 drivers/dax/extent.c | 14 ++++++
 7 files changed, 317 insertions(+), 33 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 5472ab1d0370..9d9c13e13ecf 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -824,6 +824,35 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); +static int cxl_notify_dc_extent(struct cxl_memdev_state *mds, + enum dc_event event, + struct cxl_dc_extent_data *extent) +{ + struct cxl_drv_nd nd = (struct cxl_drv_nd) { + .event = event, + .extent = extent + }; + struct device *dev; + int rc = 0; + + dev = &mds->cxlds.cxlmd->dev; + dev_dbg(dev, "Trying notify: type %d DPA:%llx LEN:%llx\n", + event, extent->dpa_start, extent->length); + + device_lock(dev); + if (dev->driver) { + struct cxl_driver *mem_drv = to_cxl_drv(dev->driver); + + if (mem_drv->notify) { + dev_dbg(dev, "Notify: type %d DPA:%llx LEN:%llx\n", + event, extent->dpa_start, extent->length); + rc = mem_drv->notify(dev, &nd); + } + } + device_unlock(dev); + return rc; +} + static int cxl_store_dc_extent(struct cxl_memdev_state *mds, struct cxl_dc_extent *dc_extent) { @@ -852,9 +881,10 @@ static int cxl_store_dc_extent(struct cxl_memdev_state *mds, dev_warn_once(dev, "Duplicate extent DPA:%llx LEN:%llx\n", extent->dpa_start, extent->length); kfree(extent); + return rc; } - return rc; + return cxl_notify_dc_extent(mds, DCD_ADD_CAPACITY, extent); } /* @@ -1074,7 +1104,8 @@ void cxl_dc_extent_put(struct cxl_dc_extent_data *extent) EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_put, CXL); static int cxl_handle_dcd_release_event(struct cxl_memdev_state *mds, - struct cxl_dc_extent *rel_extent) + struct cxl_dc_extent *rel_extent, + enum dc_event event) { struct device *dev = mds->cxlds.dev; struct cxl_dc_extent_data *extent; @@
-1090,6 +1121,7 @@ static int cxl_handle_dcd_release_event(struct cxl_memdev_state *mds, dev_err(dev, "No extent found with DPA:0x%llx\n", dpa); return -EINVAL; } + cxl_notify_dc_extent(mds, event, extent); cxl_dc_extent_put(extent); return 0; } @@ -1151,7 +1183,8 @@ static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds, break; case DCD_RELEASE_CAPACITY: case DCD_FORCED_CAPACITY_RELEASE: - rc = cxl_handle_dcd_release_event(mds, &record->data.extent); + rc = cxl_handle_dcd_release_event(mds, &record->data.extent, + record->data.event_type); break; default: return -EINVAL; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 0aeea50550f6..a0c1f2793dd7 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1547,8 +1547,8 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, return 0; } -static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, - struct cxl_dc_extent_data *extent) +bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent) { struct range dpa_range = (struct range){ .start = extent->dpa_start, @@ -1567,14 +1567,66 @@ static bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, return (cxled->dpa_res->start <= dpa_range.start && dpa_range.end <= cxled->dpa_res->end); } +EXPORT_SYMBOL_NS_GPL(cxl_dc_extent_in_ed, CXL); + +static int cxl_region_notify_extent(struct cxl_endpoint_decoder *cxled, + enum dc_event event, + struct cxl_dr_extent *cxl_dr_ext) +{ + struct cxl_dax_region *cxlr_dax; + struct device *dev; + int rc = 0; + + cxlr_dax = cxled->cxld.region->cxlr_dax; + dev = &cxlr_dax->dev; + dev_dbg(dev, "Trying notify: type %d HPA:%llx LEN:%llx\n", + event, cxl_dr_ext->hpa_offset, cxl_dr_ext->hpa_length); + + device_lock(dev); + if (dev->driver) { + struct cxl_driver *reg_drv = to_cxl_drv(dev->driver); + struct cxl_drv_nd nd = (struct cxl_drv_nd) { + .event = event, + .cxl_dr_ext = cxl_dr_ext, + }; + + if (reg_drv->notify) { + dev_dbg(dev, "Notify: type %d HPA:%llx LEN:%llx\n", + event, cxl_dr_ext->hpa_offset, + cxl_dr_ext->hpa_length); + rc = reg_drv->notify(dev, &nd); + } + } + device_unlock(dev); + return rc; +} + +static resource_size_t +cxl_dc_extent_to_hpa_offset(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent) +{ + struct cxl_dax_region *cxlr_dax; + resource_size_t dpa_offset, hpa; + struct range *ed_hpa_range; + + cxlr_dax = cxled->cxld.region->cxlr_dax; + + /* + * Without interleave... + * HPA offset == DPA offset + * ... but do the math anyway + */ + dpa_offset = extent->dpa_start - cxled->dpa_res->start; + ed_hpa_range = &cxled->cxld.hpa_range; + hpa = ed_hpa_range->start + dpa_offset; + return hpa - cxlr_dax->hpa_range.start; +} static int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, struct cxl_dc_extent_data *extent) { struct cxl_dr_extent *cxl_dr_ext; struct cxl_dax_region *cxlr_dax; - resource_size_t dpa_offset, hpa; - struct range *ed_hpa_range; struct device *dev; int rc; @@ -1601,15 +1653,7 @@ static int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, cxl_dr_ext->extent = extent; kref_init(&cxl_dr_ext->region_ref); - /* - * Without interleave... - * HPA offset == DPA offset - * ... 
but do the math anyway - */ - dpa_offset = extent->dpa_start - cxled->dpa_res->start; - ed_hpa_range = &cxled->cxld.hpa_range; - hpa = ed_hpa_range->start + dpa_offset; - cxl_dr_ext->hpa_offset = hpa - cxlr_dax->hpa_range.start; + cxl_dr_ext->hpa_offset = cxl_dc_extent_to_hpa_offset(cxled, extent); /* Without interleave carry length and label through */ cxl_dr_ext->hpa_length = extent->length; @@ -1626,6 +1670,7 @@ static int cxl_ed_add_one_extent(struct cxl_endpoint_decoder *cxled, } /* Put in cxl_dr_release() */ cxl_dc_extent_get(cxl_dr_ext->extent); + cxl_region_notify_extent(cxled, DCD_ADD_CAPACITY, cxl_dr_ext); return 0; } @@ -1663,6 +1708,58 @@ static int cxl_ed_add_extents(struct cxl_endpoint_decoder *cxled) return 0; } +static int cxl_ed_rm_dc_extent(struct cxl_endpoint_decoder *cxled, + enum dc_event event, + struct cxl_dc_extent_data *extent) +{ + struct cxl_region *cxlr = cxled->cxld.region; + struct cxl_dax_region *cxlr_dax = cxlr->cxlr_dax; + struct cxl_dr_extent *cxl_dr_ext; + resource_size_t hpa_offset; + + hpa_offset = cxl_dc_extent_to_hpa_offset(cxled, extent); + + /* + * NOTE on Interleaving: There is no need to 'break up' the cxl_dr_ext. + * If one of the extents comprising it is gone it should be removed + * from the region to prevent future use. Later code may save other + * extents for future processing. But for now the corelation is 1:1:1 + * so just erase the extent. + */ + cxl_dr_ext = xa_erase(&cxlr_dax->extents, hpa_offset); + + dev_dbg(&cxlr_dax->dev, "Remove DAX region ext HPA:%llx\n", + cxl_dr_ext->hpa_offset); + cxl_region_notify_extent(cxled, event, cxl_dr_ext); + cxl_dr_extent_put(cxl_dr_ext); + return 0; +} + +int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd) +{ + int rc = 0; + + switch (nd->event) { + case DCD_ADD_CAPACITY: + if (cxl_dc_extent_get_not_zero(nd->extent)) { + rc = cxl_ed_add_one_extent(cxled, nd->extent); + if (rc) + cxl_dc_extent_put(nd->extent); + } + break; + case DCD_RELEASE_CAPACITY: + case DCD_FORCED_CAPACITY_RELEASE: + rc = cxl_ed_rm_dc_extent(cxled, nd->event, nd->extent); + break; + default: + dev_err(&cxled->cxld.dev, "Unknown DC event %d\n", nd->event); + break; + } + return rc; +} +EXPORT_SYMBOL_NS_GPL(cxl_ed_notify_extent, CXL); + static int cxl_region_attach_position(struct cxl_region *cxlr, struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 177b892ac53f..2c73a30980b6 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -838,10 +838,18 @@ bool is_cxl_region(struct device *dev); extern struct bus_type cxl_bus_type; +/* Driver Notifier Data */ +struct cxl_drv_nd { + enum dc_event event; + struct cxl_dc_extent_data *extent; + struct cxl_dr_extent *cxl_dr_ext; +}; + struct cxl_driver { const char *name; int (*probe)(struct device *dev); void (*remove)(struct device *dev); + int (*notify)(struct device *dev, struct cxl_drv_nd *nd); struct device_driver drv; int id; }; @@ -887,6 +895,10 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev); int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled); struct cxl_dax_region *to_cxl_dax_region(struct device *dev); +bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent); +int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd); #else static inline bool is_cxl_pmem_region(struct device *dev) { @@ -905,6 +917,16 @@ static inline struct cxl_dax_region 
*to_cxl_dax_region(struct device *dev) { return NULL; } +static inline bool cxl_dc_extent_in_ed(struct cxl_endpoint_decoder *cxled, + struct cxl_dc_extent_data *extent) +{ + return false; +} +static inline int cxl_ed_notify_extent(struct cxl_endpoint_decoder *cxled, + struct cxl_drv_nd *nd) +{ + return 0; +} #endif /* diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 80cffa40e91a..d3c4c9c87392 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -104,6 +104,55 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa) DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL, cxl_debugfs_poison_clear, "%llx\n"); +static int match_ep_decoder_by_range(struct device *dev, void *data) +{ + struct cxl_dc_extent_data *extent = data; + struct cxl_endpoint_decoder *cxled; + + if (!is_endpoint_decoder(dev)) + return 0; + cxled = to_cxl_endpoint_decoder(dev); + return cxl_dc_extent_in_ed(cxled, extent); +} + +static struct cxl_endpoint_decoder *cxl_find_ed(struct cxl_memdev_state *mds, + struct cxl_dc_extent_data *extent) +{ + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; + struct cxl_port *endpoint = cxlmd->endpoint; + struct device *dev; + + dev = device_find_child(&endpoint->dev, extent, + match_ep_decoder_by_range); + if (!dev) { + dev_dbg(mds->cxlds.dev, "Extent DPA:%llx LEN:%llx not mapped\n", + extent->dpa_start, extent->length); + return NULL; + } + + return to_cxl_endpoint_decoder(dev); +} + +static int cxl_mem_notify(struct device *dev, struct cxl_drv_nd *nd) +{ + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + struct cxl_endpoint_decoder *cxled; + struct cxl_dc_extent_data *extent; + int rc = 0; + + extent = nd->extent; + dev_dbg(dev, "notify DC action %d DPA:%llx LEN:%llx\n", + nd->event, extent->dpa_start, extent->length); + + cxled = cxl_find_ed(mds, extent); + if (!cxled) + return 0; + rc = cxl_ed_notify_extent(cxled, nd); + put_device(&cxled->cxld.dev); + return rc; +} + static int cxl_mem_probe(struct device *dev) { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); @@ -247,6 +296,7 @@ __ATTRIBUTE_GROUPS(cxl_mem); static struct cxl_driver cxl_mem_driver = { .name = "cxl_mem", .probe = cxl_mem_probe, + .notify = cxl_mem_notify, .id = CXL_DEVICE_MEMORY_EXPANDER, .drv = { .dev_groups = cxl_mem_groups, diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 057b00b1d914..44cbd28668f1 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -59,6 +59,29 @@ static int cxl_dax_region_create_extent(struct dax_region *dax_region, return 0; } +static int cxl_dax_region_add_extent(struct cxl_dax_region *cxlr_dax, + struct cxl_dr_extent *cxl_dr_ext) +{ + /* + * get not zero is important because this is racing with the + * region driver which is racing with the memory device which + * could be removing the extent at the same time. 
+ */ + if (cxl_dr_extent_get_not_zero(cxl_dr_ext)) { + struct dax_region *dax_region; + int rc; + + dax_region = dev_get_drvdata(&cxlr_dax->dev); + dev_dbg(&cxlr_dax->dev, "Creating HPA:%llx LEN:%llx\n", + cxl_dr_ext->hpa_offset, cxl_dr_ext->hpa_length); + rc = cxl_dax_region_create_extent(dax_region, cxl_dr_ext); + cxl_dr_extent_put(cxl_dr_ext); + if (rc) + return rc; + } + return 0; +} + static int cxl_dax_region_create_extents(struct cxl_dax_region *cxlr_dax) { struct cxl_dr_extent *cxl_dr_ext; @@ -66,27 +89,68 @@ static int cxl_dax_region_create_extents(struct cxl_dax_region *cxlr_dax) dev_dbg(&cxlr_dax->dev, "Adding extents\n"); xa_for_each(&cxlr_dax->extents, index, cxl_dr_ext) { - /* - * get not zero is important because this is racing with the - * region driver which is racing with the memory device which - * could be removing the extent at the same time. - */ - if (cxl_dr_extent_get_not_zero(cxl_dr_ext)) { - struct dax_region *dax_region; - int rc; - - dax_region = dev_get_drvdata(&cxlr_dax->dev); - dev_dbg(&cxlr_dax->dev, "Found OFF:%llx LEN:%llx\n", - cxl_dr_ext->hpa_offset, cxl_dr_ext->hpa_length); - rc = cxl_dax_region_create_extent(dax_region, cxl_dr_ext); - cxl_dr_extent_put(cxl_dr_ext); - if (rc) - return rc; - } + int rc; + + rc = cxl_dax_region_add_extent(cxlr_dax, cxl_dr_ext); + if (rc) + return rc; } return 0; } +static int match_cxl_dr_extent(struct device *dev, void *data) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev; + struct dax_region_extent *dr_extent; + + if (!is_dr_ext_dev(dev)) + return 0; + + dr_reg_ext_dev = to_dr_ext_dev(dev); + dr_extent = dr_reg_ext_dev->dr_extent; + return data == dr_extent->private_data; +} + +static int cxl_dax_region_rm_extent(struct cxl_dax_region *cxlr_dax, + struct cxl_dr_extent *cxl_dr_ext) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev; + struct dax_region *dax_region; + struct device *dev; + + dev = device_find_child(&cxlr_dax->dev, cxl_dr_ext, + match_cxl_dr_extent); + if (!dev) + return -EINVAL; + dr_reg_ext_dev = to_dr_ext_dev(dev); + put_device(dev); + dax_region = dev_get_drvdata(&cxlr_dax->dev); + dax_region_ext_del_dev(dax_region, dr_reg_ext_dev); + return 0; +} + +static int cxl_dax_region_notify(struct device *dev, + struct cxl_drv_nd *nd) +{ + struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); + struct cxl_dr_extent *cxl_dr_ext = nd->cxl_dr_ext; + int rc = 0; + + switch (nd->event) { + case DCD_ADD_CAPACITY: + rc = cxl_dax_region_add_extent(cxlr_dax, cxl_dr_ext); + break; + case DCD_RELEASE_CAPACITY: + case DCD_FORCED_CAPACITY_RELEASE: + rc = cxl_dax_region_rm_extent(cxlr_dax, cxl_dr_ext); + break; + default: + dev_err(&cxlr_dax->dev, "Unknown DC event %d\n", nd->event); + break; + } + return rc; +} + static int cxl_dax_region_probe(struct device *dev) { struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); @@ -134,6 +198,7 @@ static int cxl_dax_region_probe(struct device *dev) static struct cxl_driver cxl_dax_region_driver = { .name = "cxl_dax_region", .probe = cxl_dax_region_probe, + .notify = cxl_dax_region_notify, .id = CXL_DEVICE_DAX_REGION, .drv = { .suppress_bind_attrs = true, diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 4dab52496c3f..250babd6e470 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -90,8 +90,11 @@ int dax_region_ext_create_dev(struct dax_region *dax_region, resource_size_t offset, resource_size_t length, const char *label); +void dax_region_ext_del_dev(struct dax_region *dax_region, + struct dax_reg_ext_dev *dr_reg_ext_dev); #define 
to_dr_ext_dev(dev) \ container_of(dev, struct dax_reg_ext_dev, dev) +bool is_dr_ext_dev(struct device *dev); struct dax_mapping { struct device dev; diff --git a/drivers/dax/extent.c b/drivers/dax/extent.c index 2075ccfb21cb..dea6d408d2c8 100644 --- a/drivers/dax/extent.c +++ b/drivers/dax/extent.c @@ -60,6 +60,12 @@ const struct device_type dr_extent_type = { .groups = dr_extent_attribute_groups, }; +bool is_dr_ext_dev(struct device *dev) +{ + return dev->type == &dr_extent_type; +} +EXPORT_SYMBOL_GPL(is_dr_ext_dev); + static void unregister_dr_extent(void *ext) { struct dax_reg_ext_dev *dr_reg_ext_dev = ext; @@ -117,3 +123,11 @@ int dax_region_ext_create_dev(struct dax_region *dax_region, return rc; } EXPORT_SYMBOL_GPL(dax_region_ext_create_dev); + +void dax_region_ext_del_dev(struct dax_region *dax_region, + struct dax_reg_ext_dev *dr_reg_ext_dev) +{ + devm_remove_action(dax_region->dev, unregister_dr_extent, dr_reg_ext_dev); + unregister_dr_extent(dr_reg_ext_dev); +} +EXPORT_SYMBOL_GPL(dax_region_ext_del_dev); From patchwork Fri Aug 16 14:08:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766389 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F37F1C0DD2; Fri, 16 Aug 2024 14:08:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723817320; cv=none; b=Y+mREY7UuiA/7FDYfX7P58uzr3Q/E86CM37dozVBbSCqdIqupfFtPBJEQZPyHrZl8BoPuYXz2wZWrW01Vjj+DdEwghBCLb4aR8wZdR3JtbZ8H8gEOPKH9A8hbTbkqrOzAvtoyealYHL621f9sqGe5i51qoi+6T3pFYtimvt+ueg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723817320; c=relaxed/simple; bh=r5+l2xi7CwUvP683eP1rT3xt7URxitfNsBryB6O28tU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=cxA2H9PM6UZgtEKjSmnUvWEs9yPSFRnuWqaPbEdSKSkE/W8feJqD/f+oKbDnUW1iPt7HShRqu+xQR4Pn/JAIT85Km8BtEGrJeOYTaBxe5tt9qy1lvqiD3KO5KNxJvDxQmFNYs1j+sCToMe4X874tlDM2J3D3RUiuA0Ecs/psN7g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BsSnuS5b; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BsSnuS5b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723817319; x=1755353319; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=r5+l2xi7CwUvP683eP1rT3xt7URxitfNsBryB6O28tU=; b=BsSnuS5b4FKaOUJU3lwUPU5gZOOK3VH44l1U3SyaFnnVAAPKg/PdrNCF m5zcl6jrP+qrL6/iEuAuerFN8btDzvPxKecD/VlcT4T/pyttvJI74qsyf CQV4ilUHlnUWhYcmG5VfHHizzrwBE23oy8ZtWuu+SrKQeNBGaTSideUs1 kPl00QAMyqeTnnAcdk5PypUfWuocFtxvo0ZGvuZxtJZg+Q7PkBRtsszPY Ytq/5q+kb4ziGmDoPxkOZfJartM9ybjorRN2YXdl4OdHnwS33kAaLZtko rcXM5V0FxAAtOUOM4AaKzzittvmV2SN4Ewv3gWIeR1pcRWjun6J9usnCq g==; X-CSE-ConnectionGUID: M6oPgh2ZSje4xRumdOUrUA== X-CSE-MsgGUID: 
From patchwork Fri Aug 16 14:08:18 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766389
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:18 -0500
Subject: [PATCH RESEND v2 13/18] dax/bus: Factor out dev dax resize logic
Message-Id: <20240816-dcd-type2-upstream-v2-13-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Dynamic Capacity regions must limit dev dax resources to those areas which have extents backing real memory. Four alternatives were considered to manage the intersection of region space and extents:

1) Create a single region resource child on region creation which reserves the entire region. Then, as extents are added, punch holes in this reservation. This requires new resource manipulation to punch the holes and still requires an additional iteration over the extent areas, which may already have existing dev dax resources in use.

2) Maintain an ordered xarray of extents which can be queried while processing the resize logic. The issue is that existing region->res children may artificially limit the allocation size sent to alloc_dev_dax_range(), i.e. the resource children can't be directly used in the resize logic to find where space in the region is.

3) Maintain a separate resource tree with extents. This option is the same as 2) but with a different data structure. Ideally there would be some unified representation of the resource tree.

4) Create region resource children for each extent. Manage the dev dax resize logic in the same way as before, but use a region child (extent) resource as the parent to find space within each extent.

Option 4 can leverage the existing resize algorithm to find space within the extents. In preparation for this change, factor out the dev_dax_resize logic. For static regions, use dax_region->res as the parent to find space for the dax ranges. Future patches will use the same algorithm with individual extent resources as the parent.
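To make the factored-out algorithm concrete, here is a minimal user-space sketch of the first-fit gap scan that dev_dax_resize_static() performs over a parent resource's children. It is an illustration under simplified assumptions: struct res and first_fit() are stand-ins of my own, not kernel API.

#include <stdio.h>

/* Simplified stand-in for struct resource: a parent range with an
 * ordered, non-overlapping list of allocated child ranges. */
struct res {
	unsigned long long start, end;	/* inclusive, as in struct resource */
	struct res *sibling;		/* next child of the same parent */
	struct res *child;		/* first allocated range inside this one */
};

/*
 * First-fit scan mirroring dev_dax_resize_static(): check the gap before
 * the first child, the gaps between siblings, and the gap after the last
 * child; report how much of 'to_alloc' fits in the first gap found
 * (0 when the parent is full).
 */
static unsigned long long first_fit(struct res *parent,
				    unsigned long long to_alloc,
				    unsigned long long *where)
{
	struct res *res, *first = parent->child;
	unsigned long long gap;

	if (!first) {		/* empty parent: allocate at its start */
		*where = parent->start;
		return to_alloc;
	}
	for (res = first; res; res = res->sibling) {
		struct res *next = res->sibling;

		/* space at the beginning of the parent */
		if (res == first && res->start > parent->start) {
			gap = res->start - parent->start;
			*where = parent->start;
			return gap < to_alloc ? gap : to_alloc;
		}
		/* space between siblings, or after the last child */
		if (next && next->start > res->end + 1)
			gap = next->start - (res->end + 1);
		else if (!next && res->end < parent->end)
			gap = parent->end - res->end;
		else
			continue;
		*where = res->end + 1;
		return gap < to_alloc ? gap : to_alloc;
	}
	return 0;	/* no gap: the kernel code WARNs here */
}

int main(void)
{
	struct res c2 = { .start = 768, .end = 1023 };
	struct res c1 = { .start = 0, .end = 255, .sibling = &c2 };
	struct res parent = { .start = 0, .end = 2047, .child = &c1 };
	unsigned long long where;
	unsigned long long got = first_fit(&parent, 512, &where);

	printf("allocated %llu at %llu\n", got, where);	/* 512 at 256 */
	return 0;
}

As in the kernel function, one pass claims at most a single gap; dev_dax_resize() then loops on retry, subtracting each allocation from to_alloc until the full request is placed.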
Signed-off-by: Ira Weiny --- drivers/dax/bus.c | 128 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 79 insertions(+), 49 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index b76e49813a39..ea7ae82b4687 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -817,11 +817,10 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) return 0; } -static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, - resource_size_t size) +static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, + u64 start, resource_size_t size) { struct dax_region *dax_region = dev_dax->region; - struct resource *res = &dax_region->res; struct device *dev = &dev_dax->dev; struct dev_dax_range *ranges; unsigned long pgoff = 0; @@ -839,14 +838,14 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, return 0; } - alloc = __request_region(res, start, size, dev_name(dev), 0); + alloc = __request_region(parent, start, size, dev_name(dev), 0); if (!alloc) return -ENOMEM; ranges = krealloc(dev_dax->ranges, sizeof(*ranges) * (dev_dax->nr_range + 1), GFP_KERNEL); if (!ranges) { - __release_region(res, alloc->start, resource_size(alloc)); + __release_region(parent, alloc->start, resource_size(alloc)); return -ENOMEM; } @@ -997,50 +996,45 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) return true; } -static ssize_t dev_dax_resize(struct dax_region *dax_region, - struct dev_dax *dev_dax, resource_size_t size) +/* + * dev_dax_resize_static - Expand the device into the unused portion of the + * region. This may involve adjusting the end of an existing resource, or + * allocating a new resource. + * + * @parent: parent resource to allocate this range in. + * @dev_dax: DAX device we are creating this range for + * @to_alloc: amount of space to alloc; must be <= space available in @parent + * + * Return the amount of space allocated or -ERRNO on failure + */ +static ssize_t dev_dax_resize_static(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc) { - resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; - resource_size_t dev_size = dev_dax_size(dev_dax); - struct resource *region_res = &dax_region->res; - struct device *dev = &dev_dax->dev; struct resource *res, *first; - resource_size_t alloc = 0; int rc; - if (dev->driver) - return -EBUSY; - if (size == dev_size) - return 0; - if (size > dev_size && size - dev_size > avail) - return -ENOSPC; - if (size < dev_size) - return dev_dax_shrink(dev_dax, size); - - to_alloc = size - dev_size; - if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), - "resize of %pa misaligned\n", &to_alloc)) - return -ENXIO; - - /* - * Expand the device into the unused portion of the region. This - * may involve adjusting the end of an existing resource, or - * allocating a new resource. 
- */ -retry: - first = region_res->child; - if (!first) - return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc); + first = parent->child; + if (!first) { + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, to_alloc); + if (rc) + return rc; + return to_alloc; + } - rc = -ENOSPC; for (res = first; res; res = res->sibling) { struct resource *next = res->sibling; + resource_size_t alloc; /* space at the beginning of the region */ - if (res == first && res->start > dax_region->res.start) { - alloc = min(res->start - dax_region->res.start, to_alloc); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, alloc); - break; + if (res == first && res->start > parent->start) { + alloc = min(res->start - parent->start, to_alloc); + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, alloc); + if (rc) + return rc; + return alloc; } alloc = 0; @@ -1049,21 +1043,55 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, alloc = min(next->start - (res->end + 1), to_alloc); /* space at the end of the region */ - if (!alloc && !next && res->end < region_res->end) - alloc = min(region_res->end - res->end, to_alloc); + if (!alloc && !next && res->end < parent->end) + alloc = min(parent->end - res->end, to_alloc); if (!alloc) continue; if (adjust_ok(dev_dax, res)) { rc = adjust_dev_dax_range(dev_dax, res, resource_size(res) + alloc); - break; + if (rc) + return rc; + return alloc; } - rc = alloc_dev_dax_range(dev_dax, res->end + 1, alloc); - break; + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + if (rc) + return rc; + return alloc; } - if (rc) - return rc; + + /* available was already calculated and should never be an issue */ + dev_WARN_ONCE(&dev_dax->dev, 1, "space not found?"); + return 0; +} + +static ssize_t dev_dax_resize(struct dax_region *dax_region, + struct dev_dax *dev_dax, resource_size_t size) +{ + resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; + resource_size_t dev_size = dev_dax_size(dev_dax); + struct device *dev = &dev_dax->dev; + resource_size_t alloc = 0; + + if (dev->driver) + return -EBUSY; + if (size == dev_size) + return 0; + if (size > dev_size && size - dev_size > avail) + return -ENOSPC; + if (size < dev_size) + return dev_dax_shrink(dev_dax, size); + + to_alloc = size - dev_size; + if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), + "resize of %pa misaligned\n", &to_alloc)) + return -ENXIO; + +retry: + alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (alloc <= 0) + return alloc; to_alloc -= alloc; if (to_alloc) goto retry; @@ -1154,7 +1182,8 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) - rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, + to_alloc); device_unlock(dev); device_unlock(dax_region->dev); @@ -1371,7 +1400,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, + data->size); if (rc) goto err_range;
From patchwork Fri Aug 16 14:08:19 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766390
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:19 -0500
Subject: [PATCH RESEND v2 14/18] dax/region: Support DAX device creation on dynamic DAX regions
Message-Id: <20240816-dcd-type2-upstream-v2-14-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

Dynamic Capacity (DC) DAX regions have a list of extents which define the available memory of the region. Now that DAX region extents are fully realized, support DAX device creation on dynamic regions by adjusting the allocation algorithms to account for the extents. References must also be held on the extents until the DAX devices are done with the memory.

Redefine the region available size to include only extent space. Reuse the size allocation algorithm by defining sub-resources for each extent and limiting range allocation to those extents which have space. Do not support direct mapping of DAX devices on dynamic devices. Enhance DAX device range objects to hold references on the extents until the DAX device is destroyed.

NOTE: At this time all extents within a region are created equal. However, labels are associated with extents and can be used with future DAX device labels to group which extents are used.

Signed-off-by: Ira Weiny
---
drivers/dax/bus.c | 157 +++++++++++++++++++++++++++++++++++++++------- drivers/dax/cxl.c | 44 +++++++++++++ drivers/dax/dax-private.h | 5 ++ 3 files changed, 182 insertions(+), 24 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index ea7ae82b4687..a9ea6a706702 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -280,6 +280,36 @@ static ssize_t region_align_show(struct device *dev, static struct device_attribute dev_attr_region_align = __ATTR(align, 0400, region_align_show, NULL); +#define for_each_extent_resource(extent, res) \ + for (res = (extent)->child; res; res = res->sibling) + +static unsigned long long +dr_extent_avail_size(struct dax_region_extent *dr_extent) +{ + unsigned long long rc; + struct resource *res; + + rc = resource_size(dr_extent->res); + for_each_extent_resource(dr_extent->res, res) + rc -= resource_size(res); + return rc; +} + +static int dax_region_add_dynamic_size(struct device *dev, void *data) +{ + unsigned long long *size = data, ext_size; + struct dax_reg_ext_dev *dr_reg_ext_dev; + + if (!is_dr_ext_dev(dev)) + return 0; + + dr_reg_ext_dev = to_dr_ext_dev(dev); + ext_size = dr_extent_avail_size(dr_reg_ext_dev->dr_extent); + dev_dbg(dev, "size %llx\n", ext_size); + *size += ext_size; + return 0; +} + #define for_each_dax_region_resource(dax_region, res) \ for (res = (dax_region)->res.child; res; res = res->sibling) @@ -290,8 +320,12 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region) device_lock_assert(dax_region->dev); - if (is_dynamic(dax_region)) - return 0; + if (is_dynamic(dax_region)) { + size = 0; + device_for_each_child(dax_region->dev, &size, + dax_region_add_dynamic_size); + return size; + } for_each_dax_region_resource(dax_region, res) size -= resource_size(res); @@ -421,15 +455,24 @@ EXPORT_SYMBOL_GPL(kill_dev_dax); static void trim_dev_dax_range(struct dev_dax *dev_dax) { int i = dev_dax->nr_range - 1; - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range =
&dev_range->range; struct dax_region *dax_region = dev_dax->region; + struct resource *res = &dax_region->res; device_lock_assert(dax_region->dev); dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i, (unsigned long long)range->start, (unsigned long long)range->end); - __release_region(&dax_region->res, range->start, range_len(range)); + if (dev_range->dr_extent) + res = dev_range->dr_extent->res; + + __release_region(res, range->start, range_len(range)); + + if (dev_range->dr_extent) + dr_extent_put(dev_range->dr_extent); + if (--dev_dax->nr_range == 0) { kfree(dev_dax->ranges); dev_dax->ranges = NULL; @@ -818,7 +861,8 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) } static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, - u64 start, resource_size_t size) + u64 start, resource_size_t size, + struct dax_region_extent *dr_extent) { struct dax_region *dax_region = dev_dax->region; struct device *dev = &dev_dax->dev; @@ -852,12 +896,15 @@ static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, for (i = 0; i < dev_dax->nr_range; i++) pgoff += PHYS_PFN(range_len(&ranges[i].range)); dev_dax->ranges = ranges; + if (dr_extent) + dr_extent_get(dr_extent); ranges[dev_dax->nr_range++] = (struct dev_dax_range) { .pgoff = pgoff, .range = { .start = alloc->start, .end = alloc->end, }, + .dr_extent = dr_extent, }; dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1, @@ -938,7 +985,8 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) int i; for (i = dev_dax->nr_range - 1; i >= 0; i--) { - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range = &dev_range->range; struct dax_mapping *mapping = dev_dax->ranges[i].mapping; struct resource *adjust = NULL, *res; resource_size_t shrink; @@ -954,12 +1002,16 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) continue; } - for_each_dax_region_resource(dax_region, res) - if (strcmp(res->name, dev_name(dev)) == 0 - && res->start == range->start) { - adjust = res; - break; - } + if (dev_range->dr_extent) { + adjust = dev_range->dr_extent->res; + } else { + for_each_dax_region_resource(dax_region, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + } if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1, "failed to find matching resource\n")) @@ -973,12 +1025,15 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) /* * Only allow adjustments that preserve the relative pgoff of existing * allocations. I.e. the dev_dax->ranges array is ordered by increasing pgoff. + * Disallow adjustments on dynamic regions as they can come from all over. */ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) { struct dev_dax_range *last; int i; + if (is_dynamic(dev_dax->region)) + return false; if (dev_dax->nr_range == 0) return false; if (strcmp(res->name, dev_name(&dev_dax->dev)) != 0) @@ -997,19 +1052,21 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) } /* - * dev_dax_resize_static - Expand the device into the unused portion of the - * region. This may involve adjusting the end of an existing resource, or - * allocating a new resource. + * __dev_dax_resize - Expand the device into the unused portion of the region. + * This may involve adjusting the end of an existing resource, or allocating a + * new resource.
* * @parent: parent resource to allocate this range in. * @dev_dax: DAX device we are creating this range for * @to_alloc: amount of space to alloc; must be <= space available in @parent + * @dr_extent: if dynamic; the extent containing parent * * Return the amount of space allocated or -ERRNO on failure */ -static ssize_t dev_dax_resize_static(struct resource *parent, - struct dev_dax *dev_dax, - resource_size_t to_alloc) +static ssize_t __dev_dax_resize(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc, + struct dax_region_extent *dr_extent) { struct resource *res, *first; int rc; @@ -1017,7 +1074,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, first = parent->child; if (!first) { rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, to_alloc); + parent->start, to_alloc, + dr_extent); if (rc) return rc; return to_alloc; @@ -1031,7 +1089,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, if (res == first && res->start > parent->start) { alloc = min(res->start - parent->start, to_alloc); rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, alloc); + parent->start, alloc, + dr_extent); if (rc) return rc; return alloc; @@ -1055,7 +1114,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, return rc; return alloc; } - rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc, + dr_extent); if (rc) return rc; return alloc; @@ -1066,6 +1126,47 @@ static ssize_t dev_dax_resize_static(struct resource *parent, return 0; } +static ssize_t dev_dax_resize_static(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + return __dev_dax_resize(&dax_region->res, dev_dax, to_alloc, NULL); +} + +static int dax_region_find_space(struct device *dev, void *data) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev; + + if (!is_dr_ext_dev(dev)) + return 0; + + dr_reg_ext_dev = to_dr_ext_dev(dev); + return dr_extent_avail_size(dr_reg_ext_dev->dr_extent); +} + +static ssize_t dev_dax_resize_dynamic(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + struct dax_reg_ext_dev *dr_reg_ext_dev; + struct dax_region_extent *dr_extent; + resource_size_t alloc; + resource_size_t extent_max; + struct device *dev; + + dev = device_find_child(dax_region->dev, NULL, dax_region_find_space); + if (dev_WARN_ONCE(dax_region->dev, !dev, "Space should be available!")) + return -ENOSPC; + dr_reg_ext_dev = to_dr_ext_dev(dev); + dr_extent = dr_reg_ext_dev->dr_extent; + extent_max = dr_extent_avail_size(dr_extent); + to_alloc = min(extent_max, to_alloc); + alloc = __dev_dax_resize(dr_extent->res, dev_dax, to_alloc, dr_extent); + put_device(dev); + + return alloc; +} + static ssize_t dev_dax_resize(struct dax_region *dax_region, struct dev_dax *dev_dax, resource_size_t size) { @@ -1089,7 +1190,10 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, return -ENXIO; retry: - alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (is_dynamic(dax_region)) + alloc = dev_dax_resize_dynamic(dax_region, dev_dax, to_alloc); + else + alloc = dev_dax_resize_static(dax_region, dev_dax, to_alloc); if (alloc <= 0) return alloc; to_alloc -= alloc; @@ -1168,6 +1272,9 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, struct range r; ssize_t rc; + if (is_dynamic(dax_region)) + return -EINVAL; + rc = range_parse(buf, len, &r); if (rc) return rc; @@ -1183,7 +1290,7 @@ static 
ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, - to_alloc); + to_alloc, NULL); device_unlock(dev); device_unlock(dax_region->dev); @@ -1400,8 +1507,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); + dev_WARN_ONCE(parent, is_dynamic(dax_region) && data->size, + "Dynamic DAX devices are created initially with 0 size"); rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, - data->size); + data->size, NULL); if (rc) goto err_range; diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 44cbd28668f1..6394a3531e25 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -12,6 +12,17 @@ static void dax_reg_ext_get(struct dax_region_extent *dr_extent) kref_get(&dr_extent->ref); } + +static void dax_region_rm_resource(struct dax_region_extent *dr_extent) +{ + struct dax_region *dax_region = dr_extent->region; + struct resource *res = dr_extent->res; + + dev_dbg(dax_region->dev, "Extent release resource %pR\n", + dr_extent->res); + __release_region(&dax_region->res, res->start, resource_size(res)); +} + static void dr_release(struct kref *kref) { struct dax_region_extent *dr_extent; @@ -19,6 +30,7 @@ static void dr_release(struct kref *kref) dr_extent = container_of(kref, struct dax_region_extent, ref); cxl_dr_ext = dr_extent->private_data; + dax_region_rm_resource(dr_extent); cxl_dr_extent_put(cxl_dr_ext); kfree(dr_extent); } @@ -28,6 +40,29 @@ static void dax_reg_ext_put(struct dax_region_extent *dr_extent) kref_put(&dr_extent->ref, dr_release); } +static int dax_region_add_resource(struct dax_region *dax_region, + struct dax_region_extent *dr_extent, + resource_size_t offset, + resource_size_t length) +{ + resource_size_t start = dax_region->res.start + offset; + struct resource *ext_res; + + dev_dbg(dax_region->dev, "DAX region resource %pR\n", &dax_region->res); + ext_res = __request_region(&dax_region->res, start, length, "extent", 0); + if (!ext_res) { + dev_err(dax_region->dev, "Failed to add extent s:%llx l:%llx\n", + start, length); + return -ENOSPC; + } + + dr_extent->region = dax_region; + dr_extent->res = ext_res; + dev_dbg(dax_region->dev, "Extent add resource %pR\n", ext_res); + + return 0; +} + static int cxl_dax_region_create_extent(struct dax_region *dax_region, struct cxl_dr_extent *cxl_dr_ext) { @@ -45,11 +80,20 @@ static int cxl_dax_region_create_extent(struct dax_region *dax_region, /* device manages the dr_extent on success */ kref_init(&dr_extent->ref); + rc = dax_region_add_resource(dax_region, dr_extent, + cxl_dr_ext->hpa_offset, + cxl_dr_ext->hpa_length); + if (rc) { + kfree(dr_extent); + return rc; + } + rc = dax_region_ext_create_dev(dax_region, dr_extent, cxl_dr_ext->hpa_offset, cxl_dr_ext->hpa_length, cxl_dr_ext->label); if (rc) { + dax_region_rm_resource(dr_extent); kfree(dr_extent); return rc; } diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 250babd6e470..ad73b53aa802 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -44,12 +44,16 @@ struct dax_region { /* * struct dax_region_extent - extent data defined by the low level region * driver. 
+ * @region: cache of dax_region + * @res: cache of resource tree for this extent * @private_data: lower level region driver data * @ref: track number of dax devices which are using this extent * @get: get reference to low level data * @put: put reference to low level data */ struct dax_region_extent { + struct dax_region *region; + struct resource *res; void *private_data; struct kref ref; void (*get)(struct dax_region_extent *dr_extent); @@ -131,6 +135,7 @@ struct dev_dax { unsigned long pgoff; struct range range; struct dax_mapping *mapping; + struct dax_region_extent *dr_extent; } *ranges; };
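For context on the reference discipline above (an illustration, not part of the patch): each dev dax range that lands in an extent takes a reference (dr_extent_get() in alloc_dev_dax_range()) and drops it when the range is trimmed, so the extent's resource and device can only be torn down after the last user is gone. A compact user-space sketch of that lifecycle, with simplified types of my own:

#include <stdio.h>
#include <stdlib.h>

/* Simplified dax_region_extent: just a refcount and a release step. */
struct extent {
	int ref;
	const char *name;
};

static void extent_get(struct extent *e)
{
	e->ref++;
}

static void extent_put(struct extent *e)
{
	if (--e->ref == 0) {
		printf("releasing %s (resource and device teardown)\n", e->name);
		free(e);
	}
}

int main(void)
{
	struct extent *e = malloc(sizeof(*e));

	if (!e)
		return 1;
	e->ref = 1;		/* creation reference, as with kref_init() */
	e->name = "extent0";

	extent_get(e);		/* alloc_dev_dax_range() takes a reference */
	extent_put(e);		/* trim_dev_dax_range() drops it */

	extent_put(e);		/* final put: the region tears the extent down */
	return 0;
}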
From patchwork Fri Aug 16 14:08:20 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766391
From: ira.weiny@intel.com
Date: Fri, 16 Aug 2024 09:08:20 -0500
Subject: [PATCH RESEND v2 15/18] cxl/mem: Trace Dynamic Capacity Event Record
Message-Id: <20240816-dcd-type2-upstream-v2-15-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

From: Navneet Singh

CXL rev 3.0 section 8.2.9.2.1.5 defines the Dynamic Capacity Event Record. Determine if the event read is a Dynamic Capacity event record and, if so, trace the record for debug purposes. Add DC trace points to the trace log.

Signed-off-by: Navneet Singh
Signed-off-by: Ira Weiny
---
[iweiny: fixups]
---
drivers/cxl/core/mbox.c | 5 ++++ drivers/cxl/core/trace.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 9d9c13e13ecf..9462c34aa1dc 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -939,6 +939,11 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd, (struct cxl_event_mem_module *)record; trace_cxl_memory_module(cxlmd, type, rec); + } else if (uuid_equal(id, &dc_event_uuid)) { + struct dcd_event_dyn_cap *rec = + (struct dcd_event_dyn_cap *)record; + + trace_cxl_dynamic_capacity(cxlmd, type, rec); } else { /* For unknown record types print just the header */ trace_cxl_generic_event(cxlmd, type, record); diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index a0b5819bc70b..1899c5cc96b9 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -703,6 +703,71 @@ TRACE_EVENT(cxl_poison, ) ); +/* + * DYNAMIC CAPACITY Event Record - DER + * + * CXL rev 3.0 section 8.2.9.2.1.5 Table 8-47 + */ + +#define CXL_DC_ADD_CAPACITY 0x00 +#define CXL_DC_REL_CAPACITY 0x01 +#define CXL_DC_FORCED_REL_CAPACITY 0x02 +#define CXL_DC_REG_CONF_UPDATED 0x03 +#define show_dc_evt_type(type) __print_symbolic(type, \ + { CXL_DC_ADD_CAPACITY, "Add capacity"}, \ + { CXL_DC_REL_CAPACITY, "Release capacity"}, \ + { CXL_DC_FORCED_REL_CAPACITY, "Forced capacity release"}, \ + { CXL_DC_REG_CONF_UPDATED, "Region Configuration Updated" } \ +) + +TRACE_EVENT(cxl_dynamic_capacity, + + TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log, + struct dcd_event_dyn_cap *rec), + + TP_ARGS(cxlmd, log, rec), + + TP_STRUCT__entry( + CXL_EVT_TP_entry + + /* Dynamic capacity Event */ + __field(u8, event_type) + __field(u16, hostid) + __field(u8, region_id) + __field(u64, dpa_start) +
__field(u64, length) + __array(u8, tag, CXL_DC_EXTENT_TAG_LEN) + __field(u16, sh_extent_seq) + ), + + TP_fast_assign( + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr); + + /* Dynamic_capacity Event */ + __entry->event_type = rec->data.event_type; + + /* DCD event record data */ + __entry->hostid = le16_to_cpu(rec->data.host_id); + __entry->region_id = rec->data.region_index; + __entry->dpa_start = le64_to_cpu(rec->data.extent.start_dpa); + __entry->length = le64_to_cpu(rec->data.extent.length); + memcpy(__entry->tag, &rec->data.extent.tag, CXL_DC_EXTENT_TAG_LEN); + __entry->sh_extent_seq = le16_to_cpu(rec->data.extent.shared_extn_seq); + ), + + CXL_EVT_TP_printk("event_type='%s' host_id='%d' region_id='%d' " \ "starting_dpa=%llx length=%llx tag=%s " \ "shared_extent_sequence=%d", + show_dc_evt_type(__entry->event_type), + __entry->hostid, + __entry->region_id, + __entry->dpa_start, + __entry->length, + __print_hex(__entry->tag, CXL_DC_EXTENT_TAG_LEN), + __entry->sh_extent_seq + ) +); + #endif /* _CXL_EVENTS_H */ #define TRACE_INCLUDE_FILE trace
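For readers unfamiliar with the trace macros (an illustration, not part of the patch): show_dc_evt_type() is just a code-to-string table applied when the trace buffer is decoded. The equivalent lookup in plain C, using the event-type values the patch defines:

#include <stdio.h>

/* Dynamic Capacity Event Record event types, CXL 3.0 Table 8-47 */
enum {
	CXL_DC_ADD_CAPACITY		= 0x00,
	CXL_DC_REL_CAPACITY		= 0x01,
	CXL_DC_FORCED_REL_CAPACITY	= 0x02,
	CXL_DC_REG_CONF_UPDATED		= 0x03,
};

static const char *dc_evt_type_name(unsigned char type)
{
	switch (type) {
	case CXL_DC_ADD_CAPACITY:	 return "Add capacity";
	case CXL_DC_REL_CAPACITY:	 return "Release capacity";
	case CXL_DC_FORCED_REL_CAPACITY: return "Forced capacity release";
	case CXL_DC_REG_CONF_UPDATED:	 return "Region Configuration Updated";
	default:			 return "Unknown";
	}
}

int main(void)
{
	for (unsigned char t = 0; t < 5; t++)
		printf("%#x -> %s\n", (unsigned)t, dc_evt_type_name(t));
	return 0;
}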
From patchwork Fri Aug 16 14:08:21 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766392
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:21 -0500
Subject: [PATCH RESEND v2 16/18] tools/testing/cxl: Make event logs dynamic
Message-Id: <20240816-dcd-type2-upstream-v2-16-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

The test event logs were created as static arrays as an easy way to mock events. Dynamic Capacity Device (DCD) test support requires that events be created dynamically when extents are created or destroyed.

Modify the event log storage to be dynamically allocated so the logs can accommodate the dynamic events required by DCD. Reuse the static event data to create the dynamic events in the new logs without inventing complex event injection through the test sysfs. Simplify the processing of the logs by using the event log array index as the handle. Add a lock to manage the concurrency that will come with DCD extent testing.

Signed-off-by: Ira Weiny
---
tools/testing/cxl/test/mem.c | 276 ++++++++++++++++++++++++++----------------- 1 file changed, 170 insertions(+), 106 deletions(-) diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 51be202fabd0..6a036c8d215d 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -118,18 +118,27 @@ static struct { #define PASS_TRY_LIMIT 3 -#define CXL_TEST_EVENT_CNT_MAX 15 +#define CXL_TEST_EVENT_CNT_MAX 17 /* Set a number of events to return at a time for simulation. */ #define CXL_TEST_EVENT_CNT 3 +/* + * @next_handle: next handle (index) to be stored to + * @cur_handle: current handle (index) to be returned to the user on get_event + * @nr_events: total events in this log + * @nr_overflow: number of events added past the log size + * @lock: protect these state variables + * @events: array of pending events to be returned.
+ */ struct mock_event_log { - u16 clear_idx; - u16 cur_idx; + u16 next_handle; + u16 cur_handle; u16 nr_events; u16 nr_overflow; - u16 overflow_reset; - struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX]; + rwlock_t lock; + /* 1 extra slot to accommodate that handles can't be 0 */ + struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX+1]; }; struct mock_event_store { @@ -163,64 +172,76 @@ static struct mock_event_log *event_find_log(struct device *dev, int log_type) return &mdata->mes.mock_logs[log_type]; } -static struct cxl_event_record_raw *event_get_current(struct mock_event_log *log) -{ - return log->events[log->cur_idx]; -} - -static void event_reset_log(struct mock_event_log *log) -{ - log->cur_idx = 0; - log->clear_idx = 0; - log->nr_overflow = log->overflow_reset; -} - -/* Handle can never be 0 use 1 based indexing for handle */ -static u16 event_get_clear_handle(struct mock_event_log *log) -{ - return log->clear_idx + 1; -} - /* Handle can never be 0 use 1 based indexing for handle */ -static __le16 event_get_cur_event_handle(struct mock_event_log *log) +static void event_inc_handle(u16 *handle) { - u16 cur_handle = log->cur_idx + 1; - - return cpu_to_le16(cur_handle); -} - -static bool event_log_empty(struct mock_event_log *log) -{ - return log->cur_idx == log->nr_events; + *handle = (*handle + 1) % CXL_TEST_EVENT_CNT_MAX; + if (!*handle) + *handle = *handle + 1; } +/* Add the event or free it on 'overflow' */ static void mes_add_event(struct mock_event_store *mes, enum cxl_event_log_type log_type, struct cxl_event_record_raw *event) { + struct device *dev = mes->mds->cxlds.dev; struct mock_event_log *log; + u16 handle; if (WARN_ON(log_type >= CXL_EVENT_TYPE_MAX)) return; log = &mes->mock_logs[log_type]; - if ((log->nr_events + 1) > CXL_TEST_EVENT_CNT_MAX) { + write_lock(&log->lock); + + handle = log->next_handle; + if ((handle + 1) == log->cur_handle) { log->nr_overflow++; - log->overflow_reset = log->nr_overflow; - return; + dev_dbg(dev, "Overflowing %d\n", log_type); + devm_kfree(dev, event); + goto unlock; } - log->events[log->nr_events] = event; + dev_dbg(dev, "Log %d; handle %u\n", log_type, handle); + event->hdr.handle = cpu_to_le16(handle); + log->events[handle] = event; + event_inc_handle(&log->next_handle); log->nr_events++; + +unlock: + write_unlock(&log->lock); +} + +static void mes_del_event(struct device *dev, + struct mock_event_log *log, + u16 handle) +{ + struct cxl_event_record_raw *cur; + + lockdep_assert(lockdep_is_held(&log->lock)); + + dev_dbg(dev, "Clearing event %u; cur %u\n", handle, log->cur_handle); + cur = log->events[handle]; + if (!cur) { + dev_err(dev, "Mock event index %u empty? 
nr_events %u", + handle, log->nr_events); + return; + } + log->events[handle] = NULL; + + event_inc_handle(&log->cur_handle); + log->nr_events--; + devm_kfree(dev, cur); } static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) { struct cxl_get_event_payload *pl; struct mock_event_log *log; - u16 nr_overflow; u8 log_type; + u16 handle; int i; if (cmd->size_in != sizeof(log_type)) @@ -233,30 +254,38 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) if (log_type >= CXL_EVENT_TYPE_MAX) return -EINVAL; - memset(cmd->payload_out, 0, cmd->size_out); - log = event_find_log(dev, log_type); - if (!log || event_log_empty(log)) + if (!log) return 0; + memset(cmd->payload_out, 0, cmd->size_out); pl = cmd->payload_out; - for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) { - memcpy(&pl->records[i], event_get_current(log), - sizeof(pl->records[i])); - pl->records[i].hdr.handle = event_get_cur_event_handle(log); - log->cur_idx++; + read_lock(&log->lock); + + handle = log->cur_handle; + dev_dbg(dev, "Get log %d handle %u next %u\n", + log_type, handle, log->next_handle); + for (i = 0; + i < CXL_TEST_EVENT_CNT && handle != log->next_handle; + i++, event_inc_handle(&handle)) { + struct cxl_event_record_raw *cur; + + cur = log->events[handle]; + dev_dbg(dev, "Sending event log %d handle %d idx %u\n", + log_type, le16_to_cpu(cur->hdr.handle), handle); + memcpy(&pl->records[i], cur, sizeof(pl->records[i])); } pl->record_count = cpu_to_le16(i); - if (!event_log_empty(log)) + if (log->nr_events > i) pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS; if (log->nr_overflow) { u64 ns; pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW; - pl->overflow_err_count = cpu_to_le16(nr_overflow); + pl->overflow_err_count = cpu_to_le16(log->nr_overflow); ns = ktime_get_real_ns(); ns -= 5000000000; /* 5s ago */ pl->first_overflow_timestamp = cpu_to_le64(ns); @@ -265,16 +294,17 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd) pl->last_overflow_timestamp = cpu_to_le64(ns); } + read_unlock(&log->lock); return 0; } static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) { struct cxl_mbox_clear_event_payload *pl = cmd->payload_in; - struct mock_event_log *log; u8 log_type = pl->event_log; + struct mock_event_log *log; + int nr, rc = 0; u16 handle; - int nr; if (log_type >= CXL_EVENT_TYPE_MAX) return -EINVAL; @@ -283,24 +313,23 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) if (!log) return 0; /* No mock data in this log */ - /* - * This check is technically not invalid per the specification AFAICS. - * (The host could 'guess' handles and clear them in order). - * However, this is not good behavior for the host so test it. 
- */ - if (log->clear_idx + pl->nr_recs > log->cur_idx) { - dev_err(dev, - "Attempting to clear more events than returned!\n"); - return -EINVAL; - } + write_lock(&log->lock); /* Check handle order prior to clearing events */ - for (nr = 0, handle = event_get_clear_handle(log); - nr < pl->nr_recs; - nr++, handle++) { + handle = log->cur_handle; + for (nr = 0; + nr < pl->nr_recs && handle != log->next_handle; + nr++, event_inc_handle(&handle)) { + + dev_dbg(dev, "Checking clear of %d handle %u plhandle %u\n", + log_type, handle, + le16_to_cpu(pl->handles[nr])); + if (handle != le16_to_cpu(pl->handles[nr])) { - dev_err(dev, "Clearing events out of order\n"); - return -EINVAL; + dev_err(dev, "Clearing events out of order %u %u\n", + handle, le16_to_cpu(pl->handles[nr])); + rc = -EINVAL; + goto unlock; } } @@ -308,25 +337,12 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd) log->nr_overflow = 0; /* Clear events */ - log->clear_idx += pl->nr_recs; - return 0; -} + for (nr = 0; nr < pl->nr_recs; nr++) + mes_del_event(dev, log, le16_to_cpu(pl->handles[nr])); -static void cxl_mock_event_trigger(struct device *dev) -{ - struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); - struct mock_event_store *mes = &mdata->mes; - int i; - - for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) { - struct mock_event_log *log; - - log = event_find_log(dev, i); - if (log) - event_reset_log(log); - } - - cxl_mem_get_event_records(mes->mds, mes->ev_status); +unlock: + write_unlock(&log->lock); + return rc; } struct cxl_event_record_raw maint_needed = { @@ -429,8 +445,29 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds, return 0; } -static void cxl_mock_add_event_logs(struct mock_event_store *mes) +/* Create a dynamically allocated event out of a statically defined event. 
*/ +static void add_event_from_static(struct mock_event_store *mes, + enum cxl_event_log_type log_type, + struct cxl_event_record_raw *raw) +{ + struct device *dev = mes->mds->cxlds.dev; + struct cxl_event_record_raw *rec; + + rec = devm_kzalloc(dev, sizeof(*rec), GFP_KERNEL); + if (!rec) { + dev_err(dev, "Failed to alloc event for log\n"); + return; + } + + memcpy(rec, raw, sizeof(*rec)); + mes_add_event(mes, log_type, rec); +} + +static void cxl_mock_add_event_logs(struct cxl_mockmem_data *mdata) { + struct mock_event_store *mes = &mdata->mes; + struct device *dev = mes->mds->cxlds.dev; + put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK, &gen_media.validity_flags); @@ -438,43 +475,60 @@ static void cxl_mock_add_event_logs(struct mock_event_store *mes) CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN, &dram.validity_flags); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, + dev_dbg(dev, "Generating fake event logs %d\n", + CXL_EVENT_TYPE_INFO); + add_event_from_static(mes, CXL_EVENT_TYPE_INFO, &maint_needed); + add_event_from_static(mes, CXL_EVENT_TYPE_INFO, (struct cxl_event_record_raw *)&gen_media); - mes_add_event(mes, CXL_EVENT_TYPE_INFO, + add_event_from_static(mes, CXL_EVENT_TYPE_INFO, (struct cxl_event_record_raw *)&mem_module); mes->ev_status |= CXLDEV_EVENT_STATUS_INFO; - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + dev_dbg(dev, "Generating fake event logs %d\n", + CXL_EVENT_TYPE_FAIL); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &maint_needed); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, + (struct cxl_event_record_raw *)&mem_module); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&dram); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&gen_media); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&mem_module); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, (struct cxl_event_record_raw *)&dram); /* Overflow this log */ - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, 
CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace); mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL; - mes_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace); - mes_add_event(mes, CXL_EVENT_TYPE_FATAL, + dev_dbg(dev, "Generating fake event logs %d\n", + CXL_EVENT_TYPE_FATAL); + add_event_from_static(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace); + add_event_from_static(mes, CXL_EVENT_TYPE_FATAL, (struct cxl_event_record_raw *)&dram); mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL; } +static void cxl_mock_event_trigger(struct device *dev) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct mock_event_store *mes = &mdata->mes; + + cxl_mock_add_event_logs(mdata); + cxl_mem_get_event_records(mes->mds, mes->ev_status); +} + static int mock_gsl(struct cxl_mbox_cmd *cmd) { if (cmd->size_out < sizeof(mock_gsl_payload)) @@ -1391,6 +1445,14 @@ static ssize_t event_trigger_store(struct device *dev, } static DEVICE_ATTR_WO(event_trigger); +static void init_event_log(struct mock_event_log *log) +{ + rwlock_init(&log->lock); + /* Handle can never be 0 use 1 based indexing for handle */ + log->cur_handle = 1; + log->next_handle = 1; +} + static int __cxl_mock_mem_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -1458,7 +1520,9 @@ static int __cxl_mock_mem_probe(struct platform_device *pdev) return rc; mdata->mes.mds = mds; - cxl_mock_add_event_logs(&mdata->mes); + for (int i = 0; i < CXL_EVENT_TYPE_MAX; i++) + init_event_log(&mdata->mes.mock_logs[i]); + cxl_mock_add_event_logs(mdata); cxlmd = devm_cxl_add_memdev(cxlds); if (IS_ERR(cxlmd))
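One detail of the mock log worth isolating (an illustration, not part of the patch): handles are 1-based slot indexes that wrap modulo CXL_TEST_EVENT_CNT_MAX and never take the value 0, matching the comment above that a handle can never be 0. The increment rule by itself:

#include <stdio.h>

#define CXL_TEST_EVENT_CNT_MAX 17

/* Same arithmetic as event_inc_handle(): advance a 1-based handle,
 * wrapping past CXL_TEST_EVENT_CNT_MAX - 1 back to 1, never 0. */
static void event_inc_handle(unsigned short *handle)
{
	*handle = (*handle + 1) % CXL_TEST_EVENT_CNT_MAX;
	if (!*handle)
		*handle = 1;
}

int main(void)
{
	unsigned short h = 1;

	for (int i = 0; i < 20; i++) {
		printf("%hu ", h);
		event_inc_handle(&h);
	}
	printf("\n");	/* 1 2 ... 16 1 2 3 4; handle 0 never appears */
	return 0;
}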
From patchwork Fri Aug 16 14:08:22 2024
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766393
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:22 -0500
Subject: [PATCH RESEND v2 17/18] tools/testing/cxl: Add DC Regions to mock mem data
Message-Id: <20240816-dcd-type2-upstream-v2-17-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org

To test DC regions, the mock memory devices need to store information about the regions and manage fake extent data. Define mock_dc_region information within the mock memory data. Add sysfs entries on the mock device to inject and delete extents. The inject format is <start>:<length>:<tag>; the delete format is <start>. Add the DC mailbox commands to the CEL and implement those commands.
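For illustration only, and not the patch's own code (the store handlers are truncated below): a user-space sketch of parsing the inject format above into the fields devm_add_extent() consumes. The hexadecimal encoding of <start> and <length> and the 16-byte tag size are assumptions made here for the sketch:

#include <stdio.h>
#include <string.h>

#define MOCK_TAG_LEN 16	/* assumed tag size; the series defines the real value */

/* Parsed form of one injected extent: "<start>:<length>:<tag>" */
struct mock_extent {
	unsigned long long dpa_start;
	unsigned long long length;
	char tag[MOCK_TAG_LEN];
};

/* Parse "<start>:<length>:<tag>" (start/length assumed hex) into *ext;
 * returns 0 on success, -1 on malformed input. */
static int parse_extent(const char *buf, struct mock_extent *ext)
{
	char tag[64] = "";
	size_t n;

	if (sscanf(buf, "%llx:%llx:%63s", &ext->dpa_start,
		   &ext->length, tag) != 3)
		return -1;

	/* Truncating copy, mirroring the patch's
	 * memcpy(extent->tag, tag, min(sizeof(extent->tag), strlen(tag))). */
	memset(ext->tag, 0, sizeof(ext->tag));
	n = strlen(tag);
	if (n > sizeof(ext->tag))
		n = sizeof(ext->tag);
	memcpy(ext->tag, tag, n);
	return 0;
}

int main(void)
{
	struct mock_extent ext;

	if (!parse_extent("80000000:10000000:span0", &ext))
		printf("extent at 0x%llx, length 0x%llx, tag '%.16s'\n",
		       ext.dpa_start, ext.length, ext.tag);
	return 0;
}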
Signed-off-by: Ira Weiny
---
 tools/testing/cxl/test/mem.c | 449 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 449 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 6a036c8d215d..d6041a2145c5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -18,6 +18,7 @@
 #define FW_SLOTS 3
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
+#define BASE_DYNAMIC_CAP_DPA DEV_SIZE
 #define MOCK_INJECT_DEV_MAX 8
 #define MOCK_INJECT_TEST_MAX 128
@@ -89,6 +90,22 @@ static struct cxl_cel_entry mock_cel[] = {
 		.effect = cpu_to_le16(EFFECT(CONF_CHANGE_COLD_RESET) |
 				      EFFECT(CONF_CHANGE_IMMEDIATE)),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_CONFIG),
+		.effect = CXL_CMD_EFFECT_NONE,
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_EXTENT_LIST),
+		.effect = CXL_CMD_EFFECT_NONE,
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_ADD_DC_RESPONSE),
+		.effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)),
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_RELEASE_DC),
+		.effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -147,6 +164,7 @@ struct mock_event_store {
 	u32 ev_status;
 };
 
+#define NUM_MOCK_DC_REGIONS 2
 struct cxl_mockmem_data {
 	void *lsa;
 	void *fw;
@@ -161,6 +179,10 @@ struct cxl_mockmem_data {
 	struct mock_event_store mes;
 	u8 event_buf[SZ_4K];
 	u64 timestamp;
+	struct cxl_dc_region_config dc_regions[NUM_MOCK_DC_REGIONS];
+	u32 dc_ext_generation;
+	struct xarray dc_extents;
+	struct xarray dc_accepted_exts;
 };
 
 static struct mock_event_log *event_find_log(struct device *dev, int log_type)
@@ -529,6 +551,98 @@ static void cxl_mock_event_trigger(struct device *dev)
 	cxl_mem_get_event_records(mes->mds, mes->ev_status);
 }
 
+static int devm_add_extent(struct device *dev, u64 start, u64 length,
+			   const char *tag)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_dc_extent_data *extent;
+
+	extent = devm_kzalloc(dev, sizeof(*extent), GFP_KERNEL);
+	if (!extent) {
+		dev_dbg(dev, "Failed to allocate extent\n");
+		return -ENOMEM;
+	}
+	extent->dpa_start = start;
+	extent->length = length;
+	memcpy(extent->tag, tag, min(sizeof(extent->tag), strlen(tag)));
+
+	if (xa_insert(&mdata->dc_extents, start, extent, GFP_KERNEL)) {
+		devm_kfree(dev, extent);
+		dev_err(dev, "Failed xarray insert %llx\n", start);
+		return -EINVAL;
+	}
+	mdata->dc_ext_generation++;
+
+	return 0;
+}
+
+static int dc_accept_extent(struct device *dev, u64 start)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+
+	dev_dbg(dev, "Accepting extent 0x%llx\n", start);
+	return xa_insert(&mdata->dc_accepted_exts, start, (void *)start,
+			 GFP_KERNEL);
+}
+
+static void release_dc_ext(void *md)
+{
+	struct cxl_mockmem_data *mdata = md;
+
+	xa_destroy(&mdata->dc_extents);
+	xa_destroy(&mdata->dc_accepted_exts);
+}
+
+static int cxl_mock_dc_region_setup(struct device *dev)
+{
+#define DUMMY_EXT_OFFSET SZ_256M
+#define DUMMY_EXT_LENGTH SZ_256M
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	u64 base_dpa = BASE_DYNAMIC_CAP_DPA;
+	u32 dsmad_handle = 0xFADE;
+	u64 decode_length = SZ_2G;
+	u64 block_size = SZ_512;
+	/* For testing make this smaller than decode length */
+	u64 length = SZ_1G;
+	int rc;
+
+	xa_init(&mdata->dc_extents);
+	xa_init(&mdata->dc_accepted_exts);
+
+	rc = devm_add_action_or_reset(dev, release_dc_ext, mdata);
+	if (rc)
+		return rc;
+
+	for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) {
+		struct cxl_dc_region_config *conf = &mdata->dc_regions[i];
+
+		dev_dbg(dev,
"Creating DC region DC%d DPA:%llx LEN:%llx\n", + i, base_dpa, length); + + conf->region_base = cpu_to_le64(base_dpa); + conf->region_decode_length = cpu_to_le64(decode_length / + CXL_CAPACITY_MULTIPLIER); + conf->region_length = cpu_to_le64(length); + conf->region_block_size = cpu_to_le64(block_size); + conf->region_dsmad_handle = cpu_to_le32(dsmad_handle); + dsmad_handle++; + + /* Pretend we have some previous accepted extents */ + rc = devm_add_extent(dev, base_dpa + DUMMY_EXT_OFFSET, + DUMMY_EXT_LENGTH, "CXL-TEST"); + if (rc) + return rc; + + rc = dc_accept_extent(dev, base_dpa + DUMMY_EXT_OFFSET); + if (rc) + return rc; + + base_dpa += decode_length; + } + + return 0; +} + static int mock_gsl(struct cxl_mbox_cmd *cmd) { if (cmd->size_out < sizeof(mock_gsl_payload)) @@ -1315,6 +1429,148 @@ static int mock_activate_fw(struct cxl_mockmem_data *mdata, return -EINVAL; } +static int mock_get_dc_config(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mbox_get_dc_config *dc_config = cmd->payload_in; + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + u8 region_requested, region_start_idx, region_ret_cnt; + struct cxl_mbox_dynamic_capacity *resp; + + region_requested = dc_config->region_count; + if (NUM_MOCK_DC_REGIONS < region_requested) + region_requested = NUM_MOCK_DC_REGIONS; + + if (cmd->size_out < struct_size(resp, region, region_requested)) + return -EINVAL; + + memset(cmd->payload_out, 0, cmd->size_out); + resp = cmd->payload_out; + + region_start_idx = dc_config->start_region_index; + region_ret_cnt = 0; + for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) { + if (i >= region_start_idx) { + memcpy(&resp->region[region_ret_cnt], + &mdata->dc_regions[i], + sizeof(resp->region[region_ret_cnt])); + region_ret_cnt++; + } + } + resp->avail_region_count = region_ret_cnt; + + dev_dbg(dev, "Returning %d dc regions\n", region_ret_cnt); + return 0; +} + + +static int mock_get_dc_extent_list(struct device *dev, + struct cxl_mbox_cmd *cmd) +{ + struct cxl_mockmem_data *mdata = dev_get_drvdata(dev); + struct cxl_mbox_get_dc_extent *get = cmd->payload_in; + struct cxl_mbox_dc_extents *resp = cmd->payload_out; + u32 total_avail = 0, total_ret = 0; + struct cxl_dc_extent_data *ext; + u32 ext_count, start_idx; + unsigned long i; + + ext_count = le32_to_cpu(get->extent_cnt); + start_idx = le32_to_cpu(get->start_extent_index); + + memset(resp, 0, sizeof(*resp)); + + /* + * Total available needs to be calculated and returned regardless of + * how many can actually be returned. 
+	 */
+	xa_for_each(&mdata->dc_extents, i, ext)
+		total_avail++;
+
+	if (start_idx > total_avail)
+		return -EINVAL;
+
+	xa_for_each(&mdata->dc_extents, i, ext) {
+		if (total_ret >= ext_count)
+			break;
+
+		if (total_ret >= start_idx) {
+			resp->extent[total_ret].start_dpa =
+						cpu_to_le64(ext->dpa_start);
+			resp->extent[total_ret].length =
+						cpu_to_le64(ext->length);
+			memcpy(&resp->extent[total_ret].tag, ext->tag,
+			       sizeof(resp->extent[total_ret].tag));
+			resp->extent[total_ret].shared_extn_seq =
+				cpu_to_le16(ext->shared_extent_seq);
+			total_ret++;
+		}
+	}
+
+	resp->ret_extent_cnt = cpu_to_le32(total_ret);
+	resp->total_extent_cnt = cpu_to_le32(total_avail);
+	resp->extent_list_num = cpu_to_le32(mdata->dc_ext_generation);
+
+	dev_dbg(dev, "Returning %d extents of %d total\n",
+		total_ret, total_avail);
+
+	return 0;
+}
+
+static int mock_add_dc_response(struct device *dev,
+				struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_dc_response *req = cmd->payload_in;
+	u32 list_size = le32_to_cpu(req->extent_list_size);
+
+	for (int i = 0; i < list_size; i++) {
+		u64 start = le64_to_cpu(req->extent_list[i].dpa_start);
+		int rc;
+
+		dev_dbg(dev, "Extent 0x%llx accepted by HOST\n", start);
+		rc = dc_accept_extent(dev, start);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int dc_delete_extent(struct device *dev, unsigned long long start)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	void *ext;
+
+	dev_dbg(dev, "Deleting extent at %llx\n", start);
+
+	ext = xa_erase(&mdata->dc_extents, start);
+	if (!ext) {
+		dev_err(dev, "No extent found at %llx\n", start);
+		return -EINVAL;
+	}
+	devm_kfree(dev, ext);
+	mdata->dc_ext_generation++;
+
+	return 0;
+}
+
+static int mock_dc_release(struct device *dev,
+			   struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_mbox_dc_response *req = cmd->payload_in;
+	u32 list_size = le32_to_cpu(req->extent_list_size);
+
+	for (int i = 0; i < list_size; i++) {
+		u64 start = le64_to_cpu(req->extent_list[i].dpa_start);
+
+		dev_dbg(dev, "Extent 0x%llx released by HOST\n", start);
+		xa_erase(&mdata->dc_accepted_exts, start);
+	}
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_memdev_state *mds,
 			      struct cxl_mbox_cmd *cmd)
 {
@@ -1399,6 +1655,18 @@ static int cxl_mock_mbox_send(struct cxl_memdev_state *mds,
 	case CXL_MBOX_OP_ACTIVATE_FW:
 		rc = mock_activate_fw(mdata, cmd);
 		break;
+	case CXL_MBOX_OP_GET_DC_CONFIG:
+		rc = mock_get_dc_config(dev, cmd);
+		break;
+	case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+		rc = mock_get_dc_extent_list(dev, cmd);
+		break;
+	case CXL_MBOX_OP_ADD_DC_RESPONSE:
+		rc = mock_add_dc_response(dev, cmd);
+		break;
+	case CXL_MBOX_OP_RELEASE_DC:
+		rc = mock_dc_release(dev, cmd);
+		break;
 	default:
 		break;
 	}
@@ -1467,6 +1735,10 @@ static int __cxl_mock_mem_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	dev_set_drvdata(dev, mdata);
 
+	rc = cxl_mock_dc_region_setup(dev);
+	if (rc)
+		return rc;
+
 	mdata->lsa = vmalloc(LSA_SIZE);
 	if (!mdata->lsa)
 		return -ENOMEM;
@@ -1515,6 +1787,10 @@ static int __cxl_mock_mem_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
+	rc = cxl_dev_dynamic_capacity_identify(mds);
+	if (rc)
+		return rc;
+
 	rc = cxl_mem_create_range_info(mds);
 	if (rc)
 		return rc;
@@ -1528,6 +1804,10 @@ static int __cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	rc = cxl_dev_get_dynamic_capacity_extents(mds);
+	if (rc)
+		return rc;
+
 	rc = cxl_memdev_setup_fw_upload(mds);
 	if (rc)
 		return rc;
@@ -1669,10 +1949,179 @@ static ssize_t fw_buf_checksum_show(struct device *dev,
 static DEVICE_ATTR_RO(fw_buf_checksum);
 
+/* Returns true if the proposed extent is valid */
+static bool new_extent_valid(struct device *dev, size_t new_start,
+			     size_t new_len)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_dc_extent_data *extent;
+	size_t new_end, i;
+
+	if (!new_len)
+		return false;
+
+	new_end = new_start + new_len;
+
+	dev_dbg(dev, "New extent %zx-%zx\n", new_start, new_end);
+
+	/* Overlap with other extent? */
+	xa_for_each(&mdata->dc_extents, i, extent) {
+		size_t ext_end = extent->dpa_start + extent->length;
+
+		if (extent->dpa_start <= new_start && new_start < ext_end) {
+			dev_err(dev, "Extent overlap: Start %llx ?<= %zx ?<= %zx\n",
+				extent->dpa_start, new_start, ext_end);
+			return false;
+		}
+		if (extent->dpa_start <= new_end && new_end < ext_end) {
+			dev_err(dev, "Extent overlap: End %llx ?<= %zx ?<= %zx\n",
+				extent->dpa_start, new_end, ext_end);
+			return false;
+		}
+	}
+
+	/* Ensure it is in a region and is valid for that region's block size */
+	for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) {
+		struct cxl_dc_region_config *dc_region = &mdata->dc_regions[i];
+		size_t reg_start, reg_end;
+
+		reg_start = le64_to_cpu(dc_region->region_base);
+		reg_end = le64_to_cpu(dc_region->region_length);
+		reg_end += reg_start;
+
+		dev_dbg(dev, "Region %d: %zx-%zx\n", i, reg_start, reg_end);
+
+		if (new_start >= reg_start && new_end < reg_end) {
+			u64 block_size = le64_to_cpu(dc_region->region_block_size);
+
+			if (new_start % block_size || new_len % block_size) {
+				dev_err(dev, "Extent not aligned to block size: start %zx; len %zx; block_size 0x%llx\n",
+					new_start, new_len, block_size);
+				return false;
+			}
+
+			dev_dbg(dev, "Extent in region %d\n", i);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * Format <start>:<length>:<tag>
+ *
+ * start and length must be a multiple of the configured region block size.
+ * Tag can be any string up to 16 bytes.
+ *
+ * Extents must be exclusive of other extents.
+ */
+static ssize_t dc_inject_extent_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL);
+	unsigned long long start, length;
+	char *len_str, *tag_str;
+	size_t buf_len = count;
+	int rc;
+
+	if (!start_str)
+		return -ENOMEM;
+
+	len_str = strnchr(start_str, buf_len, ':');
+	if (!len_str) {
+		dev_err(dev, "Extent failed to find len_str: %s\n", start_str);
+		return -EINVAL;
+	}
+
+	*len_str = '\0';
+	len_str += 1;
+	buf_len -= strlen(start_str);
+
+	tag_str = strnchr(len_str, buf_len, ':');
+	if (!tag_str) {
+		dev_err(dev, "Extent failed to find tag_str: %s\n", len_str);
+		return -EINVAL;
+	}
+	*tag_str = '\0';
+	tag_str += 1;
+
+	if (kstrtoull(start_str, 0, &start)) {
+		dev_err(dev, "Extent failed to parse start: %s\n", start_str);
+		return -EINVAL;
+	}
+	if (kstrtoull(len_str, 0, &length)) {
+		dev_err(dev, "Extent failed to parse length: %s\n", len_str);
+		return -EINVAL;
+	}
+
+	if (!new_extent_valid(dev, start, length))
+		return -EINVAL;
+
+	rc = devm_add_extent(dev, start, length, tag_str);
+	if (rc)
+		return rc;
+
+	return count;
+}
+static DEVICE_ATTR_WO(dc_inject_extent);
+
+static ssize_t dc_del_extent_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	unsigned long long start;
+	int rc;
+
+	if (kstrtoull(buf, 0, &start)) {
+		dev_err(dev, "Extent failed to parse start value\n");
+		return -EINVAL;
+	}
+
+	rc = dc_delete_extent(dev, start);
+	if (rc)
+		return rc;
+
+	return count;
+}
+static DEVICE_ATTR_WO(dc_del_extent);
+
+static ssize_t dc_force_del_extent_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	unsigned long long start;
+	void *ext;
+	int rc;
+
+	if (kstrtoull(buf, 0, &start)) {
+		dev_err(dev, "Extent failed to parse start value\n");
+		return -EINVAL;
+	}
+
+	ext = xa_erase(&mdata->dc_accepted_exts, start);
+	if (ext)
+		dev_dbg(dev, "Forcing remove of accepted extent: %llx\n",
+			start);
+
+	dev_dbg(dev, "Forcing delete of extent at %llx\n", start);
+	rc = dc_delete_extent(dev, start);
+	if (rc)
+		return rc;
+
+	return count;
+}
+static DEVICE_ATTR_WO(dc_force_del_extent);
+
 static struct attribute *cxl_mock_mem_attrs[] = {
 	&dev_attr_security_lock.attr,
 	&dev_attr_event_trigger.attr,
 	&dev_attr_fw_buf_checksum.attr,
+	&dev_attr_dc_inject_extent.attr,
+	&dev_attr_dc_del_extent.attr,
+	&dev_attr_dc_force_del_extent.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(cxl_mock_mem);
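[Editor's aside: the extent bookkeeping above keys everything by starting
DPA in two xarrays, dc_extents and dc_accepted_exts. A minimal sketch of
the same iteration pattern, as a hypothetical helper that is not part of
the patch and assumes struct cxl_mockmem_data from the hunks above:]

/* Hypothetical helper, not in the patch: count extents the host has
 * accepted, walking the xarray exactly as the mock handlers above do.
 */
static u32 count_accepted_extents(struct cxl_mockmem_data *mdata)
{
	unsigned long index;
	void *ext;
	u32 n = 0;

	xa_for_each(&mdata->dc_accepted_exts, index, ext)
		n++;
	return n;
}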
From patchwork Fri Aug 16 14:08:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766394
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:08:23 -0500
Subject: [PATCH RESEND v2 18/18] tools/testing/cxl: Add Dynamic Capacity events
X-Mailing-List: linux-cxl@vger.kernel.org
Message-Id: <20240816-dcd-type2-upstream-v2-18-b4044aadf2bd@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-b4044aadf2bd@intel.com>
To: Dan Williams
Cc: Navneet Singh, Fan Ni, Jonathan Cameron, Davidlohr Bueso,
 Dave Jiang, Alison Schofield, Vishal Verma, Ira Weiny,
 linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailer: b4 0.15-dev-37811

OS software needs to be alerted when new extents arrive on a Dynamic
Capacity Device (DCD). On test DCDs, extents are added through sysfs.
Add events on DCD extent injection. Directly call the event irq
callback to simulate irqs to process the test extents.

Signed-off-by: Ira Weiny
---
 tools/testing/cxl/test/mem.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index d6041a2145c5..20364fee9df9 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -2008,6 +2008,41 @@ static bool new_extent_valid(struct device *dev, size_t new_start,
 	return false;
 }
 
+static struct dcd_event_dyn_cap dcd_event_rec_template = {
+	.hdr = {
+		.id = UUID_INIT(0xca95afa7, 0xf183, 0x4018,
+				0x8c, 0x2f, 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+		.length = sizeof(struct dcd_event_dyn_cap),
+	},
+};
+
+static int send_dc_event(struct mock_event_store *mes, enum dc_event type,
+			 u64 start, u64 length, const char *tag_str)
+{
+	struct device *dev = mes->mds->cxlds.dev;
+	struct dcd_event_dyn_cap *dcd_event_rec;
+
+	dcd_event_rec = devm_kzalloc(dev, sizeof(*dcd_event_rec), GFP_KERNEL);
+	if (!dcd_event_rec)
+		return -ENOMEM;
+
+	memcpy(dcd_event_rec, &dcd_event_rec_template, sizeof(*dcd_event_rec));
+	dcd_event_rec->data.event_type = type;
+	dcd_event_rec->data.extent.start_dpa = cpu_to_le64(start);
+	dcd_event_rec->data.extent.length = cpu_to_le64(length);
+	memcpy(dcd_event_rec->data.extent.tag, tag_str,
+	       min(sizeof(dcd_event_rec->data.extent.tag),
+		   strlen(tag_str)));
+
+	mes_add_event(mes, CXL_EVENT_TYPE_DCD,
+		      (struct cxl_event_record_raw *)dcd_event_rec);
+
+	/* Fake the irq */
+	cxl_mem_get_event_records(mes->mds, CXLDEV_EVENT_STATUS_DCD);
+
+	return 0;
+}
+
 /*
  * Format <start>:<length>:<tag>
  *
@@ -2021,6 +2056,7 @@ static ssize_t dc_inject_extent_store(struct device *dev,
 				      const char *buf, size_t count)
 {
 	char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL);
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
 	unsigned long long start, length;
 	char *len_str, *tag_str;
 	size_t buf_len = count;
@@ -2063,6 +2099,13 @@ static ssize_t dc_inject_extent_store(struct device *dev,
 	if (rc)
 		return rc;
 
+	rc = send_dc_event(&mdata->mes, DCD_ADD_CAPACITY, start, length,
+			   tag_str);
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
 	return count;
 }
 static DEVICE_ATTR_WO(dc_inject_extent);
@@ -2071,6 +2114,7 @@ static ssize_t dc_del_extent_store(struct device *dev,
 				   struct device_attribute *attr,
 				   const char *buf, size_t count)
 {
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
 	unsigned long long start;
 	int rc;
 
@@ -2083,6 +2127,12 @@ static ssize_t dc_del_extent_store(struct device *dev,
 	if (rc)
 		return rc;
 
+	rc = send_dc_event(&mdata->mes, DCD_RELEASE_CAPACITY, start, 0, "");
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
 	return count;
 }
 static DEVICE_ATTR_WO(dc_del_extent);
@@ -2111,6 +2161,13 @@ static ssize_t dc_force_del_extent_store(struct device *dev,
 	if (rc)
 		return rc;
 
+	rc = send_dc_event(&mdata->mes, DCD_FORCED_CAPACITY_RELEASE,
+			   start, 0, "");
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
 	return count;
 }
 static DEVICE_ATTR_WO(dc_force_del_extent);
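[Editor's note: taken together with the previous patch, the three event
types injected above are what a DCD event consumer has to dispatch on. A
minimal sketch of such a dispatch, assuming the dc_event enum and the
dcd_event_dyn_cap record layout this series defines; the handler itself
is hypothetical, not code from the series:]

/* Hypothetical consumer-side dispatch, not part of the patch. */
static int handle_dcd_event(struct device *dev,
			    struct dcd_event_dyn_cap *rec)
{
	u64 start = le64_to_cpu(rec->data.extent.start_dpa);

	switch (rec->data.event_type) {
	case DCD_ADD_CAPACITY:
		/* surface the extent, then accept via ADD_DC_RESPONSE */
		dev_dbg(dev, "add capacity at %#llx\n", start);
		return 0;
	case DCD_RELEASE_CAPACITY:
	case DCD_FORCED_CAPACITY_RELEASE:
		/* tear down any users, then ack via RELEASE_DC */
		dev_dbg(dev, "release capacity at %#llx\n", start);
		return 0;
	default:
		dev_err(dev, "unknown DCD event %d\n", rec->data.event_type);
		return -EINVAL;
	}
}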