From patchwork Fri Aug 16 13:59:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766352 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1659C1BB69B; Fri, 16 Aug 2024 13:59:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816797; cv=none; b=rE3By0AnOgiCP4zet3dWIleL0fZYzc03S98R6iDtDX4KYRxsCNoUQokIMhf75rj1Kf4xyZC/PAwBc8VM7kUiVeXMSeg9w/BnS9+Ett3BYXbYWKe7W4nzYrtwtacZ3L6o7Pohqig6Z+xNx1JyZZnXqSYrZGPAcgI3M7ar4PCrFx8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816797; c=relaxed/simple; bh=iuwoKFBy90JgvWLAvMU7CdNYTV/UBM55Pi1kkx/Ytv0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=MYLyyWG6mAeqCvfqN38K/I+74jnxgZtOb2mx8i3uf6uw0tb3NrXPCbPCeL9O4axrP5oSsLlHR6nMgLydKnQJ7Kq+UooErM/byAGAtumCS2/+a6l+4oBzhmsldHO0yqms3CQGeIPNcVftkD34qsc6CbEf8z2KzXaskcPohElLhAE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=UVMSrejw; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="UVMSrejw" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816796; x=1755352796; 
h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=iuwoKFBy90JgvWLAvMU7CdNYTV/UBM55Pi1kkx/Ytv0=; b=UVMSrejwC2I58NYuemt3lKPtUvw1qg9CkWHi797PaGAgQ7UU6xQ1Y/wR EdcKBvdsUBAEqK2cmWgNXCeVL910SY8rU+IWT5aUAkpT39Gn6Nmq2AWvK aRwd/ZjUcw6thH8ayr2f3EdHE2MsV9FY6x5c8KANpsAuxISngmSW9LcIu 2BzBpY3XLg8zYNSWIooA/KkZSQNfB35ghRyw9gNoNOp2sJ7Ilwzm8oc2P uVc6pRDejagKxsSfMuEzX99PmhMdjI109f2SWU0AXzc7EvxZj0VZFvC4B siDY4d9erz4DRfK8INPsEebG1LvvgpXArSgKoZtf8BtLeEnNiCzMVO1YW Q==; X-CSE-ConnectionGUID: jdMVji4eRwWBqLnxspr+xQ== X-CSE-MsgGUID: 387r/qqURSGFDSfJvzLPkw== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272721" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272721" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:55 -0700 X-CSE-ConnectionGUID: oPiRCRZ0TUSN/kjfl8bs3A== X-CSE-MsgGUID: +UlXRaZlS4q/9LngM6RQLA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90410994" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:55 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 08:59:49 -0500 Subject: [PATCH v2 01/25] range: Add range_overlaps() Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-1-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Chris Mason , Josef Bacik , David Sterba , Johannes Thumshirn X-Mailer: b4 0.13-dev-2d940 
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=3425;
 i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id;
 bh=iuwoKFBy90JgvWLAvMU7CdNYTV/UBM55Pi1kkx/Ytv0=;
 b=t0DhbhgY8dfn3/XHD3DXg/WDDbVvXnUzbg+lWnTqRCGt1FLe7m5300xpd5mob3k89VSykSE4/
 vfU/NJ2SC4RBWn42euFHoRzzMs5JEKSqzNVds73QoFLtA5yxGaQkr0O
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519;
 pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

Code to support CXL Dynamic Capacity devices will have extent ranges
which need to be compared for intersection, not for containment as is
checked by range_contains().

range_overlaps() is defined in btrfs with a different meaning from what
is required in the standard range code.  Dan Williams pointed this out
in [1].  Adjust the btrfs call according to his suggestion there,
renaming it to btrfs_range_overlaps().  Then add a generic
range_overlaps().

Cc: Dan Williams
Cc: Chris Mason
Cc: Josef Bacik
Cc: David Sterba
Cc: linux-btrfs@vger.kernel.org
Acked-by: David Sterba
Reviewed-by: Davidlohr Bueso
Reviewed-by: Johannes Thumshirn
Reviewed-by: Fan Ni
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
Signed-off-by: Ira Weiny

[1] https://lore.kernel.org/all/65949f79ef908_8dc68294f2@dwillia2-xfh.jf.intel.com.notmuch/
---
 fs/btrfs/ordered-data.c | 10 +++++-----
 include/linux/range.h   |  7 +++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 82a68394a89c..37164cc44a25 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -111,8 +111,8 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 file_offset,
 	return NULL;
 }
 
-static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
-			  u64 len)
+static int btrfs_range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
+				u64 len)
 {
 	if (file_offset + len <= entry->file_offset ||
 	    entry->file_offset + entry->num_bytes <= file_offset)
@@ -985,7 +985,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 
 	while (1) {
 		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			break;
 
 		if (entry->file_offset >= file_offset + len) {
@@ -1114,12 +1114,12 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 	}
 	if (prev) {
 		entry = rb_entry(prev, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	if (next) {
 		entry = rb_entry(next, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	/* No ordered extent in the range */
diff --git a/include/linux/range.h b/include/linux/range.h
index 6ad0b73cb7ad..9a46f3212965 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -13,11 +13,18 @@ static inline u64 range_len(const struct range *range)
 	return range->end - range->start + 1;
 }
 
+/* True if r1 completely contains r2 */
 static inline bool range_contains(struct range *r1, struct range *r2)
 {
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
+/* True if any part of r1 overlaps r2 */
+static inline bool range_overlaps(struct range *r1, struct range *r2)
+{
+	return r1->start <= r2->end && r1->end >= r2->start;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 		u64 start, u64 end);
 

From patchwork Fri Aug 16 13:59:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766353
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29A521BCA0C;
 Fri, 16 Aug 2024 13:59:58 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.15
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org;
s=arc-20240116; t=1723816800; cv=none; b=fPKJfKAV7PU5ylAVb6DzN44eDAMmmt/KS/y+ZBiB23pgDZya+dLs93xMZ5TEaXhSXAKyGS33vNMLvubcAi0Xzewl/WsoDoInntR4Zdq27x2d3wviw9+FeFdRFiVKjCKZ9jBT9D7Yy82H2+OGTMeilMmkJMd7F+QDJpsbDE7sNgI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816800; c=relaxed/simple; bh=4PkH1xqJoyauYPLTeFqwpXSPhLz7QrsYzPhGNpKM7KQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=OB7hwxX1MwwRPREq7Nmu7Me4grWeJF33J+ksRsEDwWMa6a8/cnipv1Dpoa1yhkACIkqdfxXiur9znEAYkmlEyDVGKssIwOG6ke4K01NX3urzy2FyoZpKUu6U2ClNXhIkF0/wpgWnY5T7Kn8Io/FTqJvw/+qc6GtiGb9NtMKWJWk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=EFMgH9vB; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EFMgH9vB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816798; x=1755352798; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=4PkH1xqJoyauYPLTeFqwpXSPhLz7QrsYzPhGNpKM7KQ=; b=EFMgH9vBgUra+xfJub4zH5EunJ7VY+FDjeERBo3T7oW7hvTyDUQWgjlb DeXmUbA+5fb7gZpOskzu1uDYbTHL0+Hm/sE0mtiQWvR0wnT0D3fgQqT5T 2zSP0U8QfmNVwW3TSUTwM4UknulYewe9iD2nqBgZEnzH92/sa2bs55br/ u8/5BcrxtiJP+m+W3jm2EUan6iv4zbQ1kNRBl29JUJ1LlZ+3qAa8iLe7s NvNqyNT5Jwruw5KOkBWKvSz8yhHZLwzAPKm94TlVkOtZ9c3q3twGXLsYO TnFtaP5W7fLWZodpfUiMrNThQjTOHs4nmh5vjoxbiZbSWwrwwp0kXlXM+ Q==; X-CSE-ConnectionGUID: EZtpySuRTcaZsgrogD4mBA== X-CSE-MsgGUID: jYjSkizhQaSK+U0abjlFcg== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; 
a="22272733" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272733" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:57 -0700 X-CSE-ConnectionGUID: jW0ko+rETeeKZeMEcDL96w== X-CSE-MsgGUID: zFuzs5tqSDShBwmW/Bez+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411023" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:57 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 08:59:50 -0500 Subject: [PATCH v2 02/25] printk: Add print format (%par) for struct range Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-2-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Petr Mladek , Steven Rostedt , Jonathan Corbet , "open list:DOCUMENTATION" X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=4286; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=4PkH1xqJoyauYPLTeFqwpXSPhLz7QrsYzPhGNpKM7KQ=; b=t10lP1XtEXcLhFatOm25WwMYhWOifjIkp49NUsu7pjvvp2GsgxzzD0r48iSokf+1HgjbEThrV eQs5WVLY9hQDOLd6c4Pe2MmdUE4F0K9gfOQRh/wDs8v7yjwCFVnjZNX X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= The use of struct range in the CXL subsystem is growing. 
In particular, the addition of Dynamic Capacity devices uses struct
range in a number of places which are reported in debug and error
messages.  Printing the start/end fields separately in each message
became cumbersome.  Dan Williams mentions in [1] that it might be time
to have a print specifier for struct range, similar to the one for
struct resource.

A few alternatives were considered, including '%pn' for 'print raNge',
but '%par' follows from the fact that struct range is most often used to
store a range of physical addresses.  So use '%par' for 'print address
range'.

To: Petr Mladek (maintainer:VSPRINTF)
To: Steven Rostedt (maintainer:VSPRINTF)
To: Jonathan Corbet (maintainer:DOCUMENTATION)
Cc: linux-doc@vger.kernel.org (open list:DOCUMENTATION)
Cc: linux-kernel@vger.kernel.org (open list)
Link: https://lore.kernel.org/all/663922b475e50_d54d72945b@dwillia2-xfh.jf.intel.com.notmuch/ [1]
Suggested-by: "Dan Williams"
Signed-off-by: Ira Weiny
---
 Documentation/core-api/printk-formats.rst | 14 ++++++++++++
 lib/vsprintf.c                            | 37 +++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 4451ef501936..a02ef899b2a6 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -231,6 +231,20 @@ width of the CPU data path.
 
 Passed by reference.
 
+Struct Range
+------------
+
+::
+
+	%par	[range 0x60000000-0x6fffffff] or
+		[range 0x0000000060000000-0x000000006fffffff]
+
+For printing struct range.  A variation of printing a physical address is to
+print the value of struct range which are often used to hold a physical address
+range.
+
+Passed by reference.
+
 DMA address types dma_addr_t
 ----------------------------
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 2d71b1115916..c132178fac07 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1140,6 +1140,39 @@ char *resource_string(char *buf, char *end, struct resource *res,
 	return string_nocheck(buf, end, sym, spec);
 }
 
+static noinline_for_stack
+char *range_string(char *buf, char *end, const struct range *range,
+		   struct printf_spec spec, const char *fmt)
+{
+#define RANGE_PRINTK_SIZE	16
+#define RANGE_DECODED_BUF_SIZE	((2 * sizeof(struct range)) + 4)
+#define RANGE_PRINT_BUF_SIZE	sizeof("[range - ]")
+	char sym[RANGE_DECODED_BUF_SIZE + RANGE_PRINT_BUF_SIZE];
+	char *p = sym, *pend = sym + sizeof(sym);
+
+	static const struct printf_spec str_spec = {
+		.field_width = -1,
+		.precision = 10,
+		.flags = LEFT,
+	};
+	static const struct printf_spec range_spec = {
+		.base = 16,
+		.field_width = RANGE_PRINTK_SIZE,
+		.precision = -1,
+		.flags = SPECIAL | SMALL | ZEROPAD,
+	};
+
+	*p++ = '[';
+	p = string_nocheck(p, pend, "range ", str_spec);
+	p = number(p, pend, range->start, range_spec);
+	*p++ = '-';
+	p = number(p, pend, range->end, range_spec);
+	*p++ = ']';
+	*p = '\0';
+
+	return string_nocheck(buf, end, sym, spec);
+}
+
 static noinline_for_stack
 char *hex_string(char *buf, char *end, u8 *addr, struct printf_spec spec,
 		 const char *fmt)
@@ -1802,6 +1835,8 @@ char *address_val(char *buf, char *end, const void *addr,
 		return buf;
 
 	switch (fmt[1]) {
+	case 'r':
+		return range_string(buf, end, addr, spec, fmt);
 	case 'd':
 		num = *(const dma_addr_t *)addr;
 		size = sizeof(dma_addr_t);
@@ -2364,6 +2399,8 @@ char *rust_fmt_argument(char *buf, char *end, void *ptr);
  *		to use print_hex_dump() for the larger input.
  * - 'a[pd]' For address types [p] phys_addr_t, [d] dma_addr_t and derivatives
  *           (default assumed to be phys_addr_t, passed by reference)
+ * - 'ar' For decoded struct ranges (a variation of physical address which are
+ *        most often stored in struct ranges.
* - 'd[234]' For a dentry name (optionally 2-4 last components) * - 'D[234]' Same as 'd' but for a struct file * - 'g' For block_device name (gendisk + partition number) From patchwork Fri Aug 16 13:59:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766354 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 341211BD02A; Fri, 16 Aug 2024 14:00:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816802; cv=none; b=O3D4KKMGF2m10Teia87+vtu5iRtLhRCuVi2xVkjqql7h+NZKDzYo3TI+Y9GWWQ9vzyBgWUO0JJOuFQ2dXd2+jviLMq0cDJwEUoHdhjP/CVdOPDDstJPqm8g8Kj4Q/usy2fo5H8vESES4SWjvFuGwJ/8V8NNEAfwBn5WtuCaXSqc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816802; c=relaxed/simple; bh=5VkggrFXyoRX+xFMFtPbi/ymdAmy5/e3QrX+vM16NIA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=mcFlwdYctN02N+Kte/4DkW39QqUZqP7wIbjAppAlyeb/0OsmLuT7VY6GxGu70C2WIXO1Lm34incXnxZ+8ObZGe21Mh/5FTWKxNZS9m8ROtt5EzvAijY/jChrUL6btQEwcdRnY8mCm5gcOH/xgt1535SMyAZtwrmZkw7oNYWj8Oo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=djO90WMC; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b="djO90WMC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816800; x=1755352800; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=5VkggrFXyoRX+xFMFtPbi/ymdAmy5/e3QrX+vM16NIA=; b=djO90WMCwZDnFu63cg4oJhQroBAuNe7WxRzbhEpGRJCzKPZR5A8Vi5iO nQHFBheb97fNm8TaEmENJ4SjATKDfhJQQBa6/uPLMfYOgLwHp1C3FTtaM ZQoiJGWL947K+r9/E1WRf9WtDCTvxNqHJ+UMBooWUX6kD3TefzgBIbIPt aq0BTljnbOvB25nPmmBCqGzSoY5kxnPCEkw+6s0TXSQTIryjeajfWgRE0 8l8HQ04hxbEmusEbmQH+4Ot9rZ4wLBHLv7veIRWl6BPGcipUAC+Cc9eVr cR+07mheEa4XY4a/uuM6PH98vKGqzGbIT4s1WYsdlEkPGE+4sm9BSYSdI g==; X-CSE-ConnectionGUID: fDsUwqglRR+R0u+D+LJpNg== X-CSE-MsgGUID: E/mis/0sS0+Ud16B8cZCCA== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272743" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272743" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:59 -0700 X-CSE-ConnectionGUID: NBDIqoKNSIaWCBUqKWDG1A== X-CSE-MsgGUID: PP4BSPWuR0Wg/+QDvnDlUA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411049" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 06:59:59 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 08:59:51 -0500 Subject: [PATCH v2 03/25] dax: Document dax dev range tuple Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-3-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , 
 linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org,
 linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=1068;
 i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id;
 bh=5VkggrFXyoRX+xFMFtPbi/ymdAmy5/e3QrX+vM16NIA=;
 b=dkW/0VO1hCLBbRJkyg2McAXvf0uF3ddKfBVVLec1uHI5zCf8xQBbxAIhx+K2xf7JU8E6T0lqW
 9GVerPg5S4aBaLp1e0DaCFAfzxAesm+EfnGQ5nn57l66Xsns7fcKz3a
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519;
 pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

The device DAX structure is being enhanced to track additional DCD
information.

The current range tuple was not fully documented.  Document it prior to
adding information for DC.

Suggested-by: Jonathan Cameron
Signed-off-by: Ira Weiny

---
Changes:
[iweiny: move to start of series]
---
 drivers/dax/dax-private.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 446617b73aea..ccde98c3d4e2 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -58,7 +58,10 @@ struct dax_mapping {
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @nr_range: size of @ranges
- * @ranges: resource-span + pgoff tuples for the instance
+ * @ranges: range tuples of memory used
+ * @pgoff: page offset
+ * @range: resource-span
+ * @mapping: device to assist in interrogating the range layout
  */
 struct dev_dax {
 	struct dax_region *region;

From patchwork Fri Aug 16 13:59:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766355
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6F4F1BDA85;
 Fri, 16 Aug 2024 14:00:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816803; cv=none; b=A/3kumCgpJXCcAKBZmwbSPQP0YvisSrTdHLzh3Rl+Cj7VPFN8dscgjfNFmNAVDP4I7MwtwTVIZ2pPfqspRoQ3hODTa5NF4VW1FGY81k4cQ7h/vgDU/fxMjsa9WFPa/4QPweTQuoFlLbYcWCAbW9bdIGI9bsy2I26O0L5O8ks1Js= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816803; c=relaxed/simple; bh=Zo1dZh5pzHpeeydz3y04EF5edS3G2yB/CM8mLx26TMA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=h1qRzdviKdHgNd96heCOE4a4Uedi1PwssdJnt1GxbOr0hYMWCb9ebKoswUkiptZKcxO3vWpsnUUnZNW3XJdDGP6ZhFcU3o/sAfs3pRRL3cZlgNOrhoRS4pWnvWa41syzBBSasUKr7MB9NtZxvU2rmv6Y38CLtZRT5PhNkrioVjE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ntwaOafp; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ntwaOafp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816802; x=1755352802; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=Zo1dZh5pzHpeeydz3y04EF5edS3G2yB/CM8mLx26TMA=; b=ntwaOafp97QJiJD605xijmWVVeomguRIRXVjXm6J62J7OydsyP9HFWP2 tdopYHjAuJooWW6Z2JJaaHWpF/NIZQUorl4J8q5YgloSHGFWqa4j0wOP4 L9nn52sa97rtCkP75thjrs5w9nZcw+30HX1y2X12RywmZ1F0zpwXkMywE AWMPDC+CvGVU+pChkExwmW2Q661I4fpciFEoufcs1Gx0ktLBm/LVc/gWQ C08TgBkRBEZ/vv6A1TAW29OmdewgR9H0ibK4Ic74GQ+IWUY72yGWy0slw Lu/i4j8AJDkFpLUa470mcBzbZ+oPZxzHYemlE5pJ6WcOYk6+ioBHlCjol Q==; X-CSE-ConnectionGUID: ZJTwEX79RjCxKa7zgk+d/A== 
X-CSE-MsgGUID: yuyKeXoEQKiA6qnIZv9t9w== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272749" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272749" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:01 -0700 X-CSE-ConnectionGUID: xyrRTVkkSl+0LhBrzebYXw== X-CSE-MsgGUID: IIqi6/ZQQ7Sj/Zm+FOLXNw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411076" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:01 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 08:59:52 -0500 Subject: [PATCH v2 04/25] cxl/pci: Delay event buffer allocation Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-4-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=1342; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=Zo1dZh5pzHpeeydz3y04EF5edS3G2yB/CM8mLx26TMA=; b=m0c9vfIqai24h7En9QehZLv94Qt2R152vZHbIToQei9Y4kC8nA8VtHdJ34aJ3sStcS4QnYCnS yihWHrLKsLQAIGHMn8es4Fmk/P4mP7D0sL1nzTy0dRQmDl+s4oPeV1t X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= The event buffer does not need to be allocated if something has failed in setting up event irq's. 
In preparation for adjusting event configuration for DCD events, move
the buffer allocation to the end of the event configuration.

Reviewed-by: Davidlohr Bueso
Reviewed-by: Dave Jiang
Reviewed-by: Jonathan Cameron
Signed-off-by: Ira Weiny

---
Changes:
[iweiny: keep tags for early simple patch]
[Davidlohr, Jonathan, djiang: move to beginning of series]
	[Dave feel free to pick this up if you like]
---
 drivers/cxl/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc22202..3a60cd66263e 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -760,10 +760,6 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 		return 0;
 	}
 
-	rc = cxl_mem_alloc_event_buf(mds);
-	if (rc)
-		return rc;
-
 	rc = cxl_event_get_int_policy(mds, &policy);
 	if (rc)
 		return rc;
@@ -777,6 +773,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 		return -EBUSY;
 	}
 
+	rc = cxl_mem_alloc_event_buf(mds);
+	if (rc)
+		return rc;
+
 	rc = cxl_event_irqsetup(mds);
 	if (rc)
 		return rc;

From patchwork Fri Aug 16 13:59:53 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766356
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id C63EF1BE25F;
 Fri, 16 Aug 2024 14:00:04 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.15
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
 t=1723816806; cv=none;
 b=Asj6RjVsGlsdQY8AOUk5bqU8GqnQnUWMeklyIGrcE4hqzTqSWwOr4XudDfhzuS05bOwZpiaGq4AH4mrjIhpF5/FYuthyumxUUvsh9gV8Er7tQJ5h3U7bZL16Rjpct2rg7XpK109C6lDH10prBkyHcmt3n11HUDdVtqu2rZyLRDM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
 s=arc-20240116; t=1723816806; c=relaxed/simple;
bh=8wy1WjU9CNPvPdWcgOrXbyZ3TIOLulT87nDgCkeX2B0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=k1JUngBAmpgfIf+OESk3TbnVfywM5yI6fo8bdeK44w8O4yefKaiJj9WuplwSXz9EtBRer/B8O5aXJi2yRQgG3AzngXQNHXwiQWrs48vDNQav39ULf0ynFALI7LX6Rkmuq3yUr32zgE/xFbzsdSplLJanP+06D6mh3kU6jD/obL4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iYZLDg5D; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iYZLDg5D" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816804; x=1755352804; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=8wy1WjU9CNPvPdWcgOrXbyZ3TIOLulT87nDgCkeX2B0=; b=iYZLDg5D6PQtGXxBhVwJ5lZaCvFFkAQ4PGiSio/tTXwK3T/2squH5O6J QZxuVzFVTwYYPi77dXp4WxNUZs8PBSs5ONKt28OPq+Qipog8IN/PyVg3y /G71zFmWOHh1Rg+/Cf522NPlx5O5MXwdrEvHeGRb5HzHNEdDoDa5xqDst 96ohLTNaYNUJo8z47nE+CyqTS/vFRAHX5AH42dSAGhDCG0OjhZk0vXnU2 9dehlEbRQ+WafAUNop9Kk7Lkimc1YXI0U+wPgEUE9EkB0/J8cpLDa5t36 bhOj0a9hxuK6v2933J1DEf+Xs9SF5ZKolXqWbHuUqhLgfSXRKNCNoHf5O g==; X-CSE-ConnectionGUID: VuV8zn7UT2qzOWP3z6z+Ag== X-CSE-MsgGUID: aB6d/sqSTtSLPZBtb+DGeQ== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272756" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272756" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:04 -0700 X-CSE-ConnectionGUID: WmoM4q/mQcCXT4kia2Xn2Q== X-CSE-MsgGUID: lz4IIC3fRpmOMOYsb3b42g== 
X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411110" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:03 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:53 -0500 Subject: [PATCH v2 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-5-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=4080; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=LjgTxkFr7jnH3Md+nkl6CpfcBpONEhar25gf52LYHtk=; b=VpUhSZlhhgseA6r6n3LZdyMlmn5XCVXn5tcpn+/vpQotY8WMM4eFiiT1usxTPh3KPRevzylYl HrwHaKMlIxDAvyN1DaIe9bxmuJog5VVhYCqSKCHOC6Y0jj89IBA74Gr X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh Per the CXL 3.1 specification software must check the Command Effects Log (CEL) for dynamic capacity command support. 
Detect support for the DCD commands while reading the CEL, including:

	Get DC Config
	Get DC Extent List
	Add DC Response
	Release DC

Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Reviewed-by: Jonathan Cameron
Reviewed-by: Fan Ni
Reviewed-by: Dave Jiang
Reviewed-by: Davidlohr Bueso
Signed-off-by: Ira Weiny

---
Changes:
[iweiny: Keep tags for this early simple patch]
[Davidlohr: update commit message]
[djiang: Fix misalignment]
---
 drivers/cxl/core/mbox.c | 33 +++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 15 +++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeafdf76e..8eb196858abe 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -164,6 +164,34 @@ static void cxl_set_security_cmd_enabled(struct cxl_security_state *security,
 	}
 }
 
+static bool cxl_is_dcd_command(u16 opcode)
+{
+#define CXL_MBOX_OP_DCD_CMDS 0x48
+
+	return (opcode >> 8) == CXL_MBOX_OP_DCD_CMDS;
+}
+
+static void cxl_set_dcd_cmd_enabled(struct cxl_memdev_state *mds,
+				    u16 opcode)
+{
+	switch (opcode) {
+	case CXL_MBOX_OP_GET_DC_CONFIG:
+		set_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+		set_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_ADD_DC_RESPONSE:
+		set_bit(CXL_DCD_ENABLED_ADD_RESPONSE, mds->dcd_cmds);
+		break;
+	case CXL_MBOX_OP_RELEASE_DC:
+		set_bit(CXL_DCD_ENABLED_RELEASE, mds->dcd_cmds);
+		break;
+	default:
+		break;
+	}
+}
+
 static bool cxl_is_poison_command(u16 opcode)
 {
 #define CXL_MBOX_OP_POISON_CMDS 0x43
@@ -745,6 +773,11 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
 			enabled++;
 		}
 
+		if (cxl_is_dcd_command(opcode)) {
+			cxl_set_dcd_cmd_enabled(mds, opcode);
+			enabled++;
+		}
+
 		dev_dbg(dev, "Opcode 0x%04x %s\n", opcode,
 			enabled ? "enabled" : "unsupported by driver");
 	}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..f2f8b567e0e7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -238,6 +238,15 @@ struct cxl_event_state {
 	struct mutex log_lock;
 };
 
+/* Device enabled DCD commands */
+enum dcd_cmd_enabled_bits {
+	CXL_DCD_ENABLED_GET_CONFIG,
+	CXL_DCD_ENABLED_GET_EXTENT_LIST,
+	CXL_DCD_ENABLED_ADD_RESPONSE,
+	CXL_DCD_ENABLED_RELEASE,
+	CXL_DCD_ENABLED_MAX
+};
+
 /* Device enabled poison commands */
 enum poison_cmd_enabled_bits {
 	CXL_POISON_ENABLED_LIST,
@@ -454,6 +463,7 @@ struct cxl_dev_state {
  *                (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
  * @mbox_mutex: Mutex to synchronize mailbox access.
  * @firmware_version: Firmware version for the memory device.
+ * @dcd_cmds: List of DCD commands implemented by memory device
  * @enabled_cmds: Hardware commands found enabled in CEL.
  * @exclusive_cmds: Commands that are kernel-internal only
  * @total_bytes: sum of all possible capacities
@@ -482,6 +492,7 @@ struct cxl_memdev_state {
 	size_t lsa_size;
 	struct mutex mbox_mutex; /* Protects device mailbox and firmware */
 	char firmware_version[0x10];
+	DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX);
 	DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
 	DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
 	u64 total_bytes;
@@ -555,6 +566,10 @@ enum cxl_opcode {
 	CXL_MBOX_OP_UNLOCK		= 0x4503,
 	CXL_MBOX_OP_FREEZE_SECURITY	= 0x4504,
 	CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE	= 0x4505,
+	CXL_MBOX_OP_GET_DC_CONFIG	= 0x4800,
+	CXL_MBOX_OP_GET_DC_EXTENT_LIST	= 0x4801,
+	CXL_MBOX_OP_ADD_DC_RESPONSE	= 0x4802,
+	CXL_MBOX_OP_RELEASE_DC		= 0x4803,
 	CXL_MBOX_OP_MAX			= 0x10000
 };
 

From patchwork Fri Aug 16 13:59:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766357
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D50C11BF30A; Fri, 16 Aug 2024 14:00:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816808; cv=none; b=YIfke34blCZFegTL+TrmnYBwjmi2aoomuuEdeufojKd+UpPjZ/kA7bEMH3F4eFGeVe4+K4u/B4YCVg6rFL+HKhEE6b5msG50iA4y2VBzj81h/tnzLNP2YFBd3DbzdiJH8+Psiu9G+bDFSpbTo4btt2zHyKpdfmpjoIHWusJIj8A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816808; c=relaxed/simple; bh=egcD3lR4oZN+VTi/a9thY4PD5BV74GFGgjstT8QifYI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=RhoEnLHGImG30qgcv/5RFIV8Dld4ix8FLa6u+n/t65zzlpt7aB5pt5Brv1XTBmcbvahYdn65WL3acfz01nEbePUd2BbH2ZaxxI5+l04gL3MR8N4qvKMLt0xoAdrCu4crYcGtoAAZ2sziOcUl0XDuBUTTpF4FOiUap76FQc7UeWE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ahNBdMG0; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ahNBdMG0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816807; x=1755352807; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=egcD3lR4oZN+VTi/a9thY4PD5BV74GFGgjstT8QifYI=; b=ahNBdMG0EH6MrTdJHodm+uRtOMo92aOaGMhE5cATMGRbfjdh5fnyIsKQ +Hn9p+M3AmVvOxqNoa11Nj6rwF+mTl+GflG7Q3WeaVkOu+LyoP42GHUsi 8kPicxTioXe9FcSaklqViukhjAUTCIukvnKksQ/lysPa554cNn3Yzm8YK 
FA64lblsl7ZxVZIotKln2x9LnAcgP4G6DbM5RjzB0y78L79tFpbAcgeTJ 8x1iMCy/ihLS8WWJX3+PQnjIBNzI2yND7BGBaPLSgErFFS+PEwfm2yWjx B7F2vTVjQ8G09n3LGsDgOBOwXcgkt4enYJtu/0Ep0vIc1lTOlbxMNo9FB A==; X-CSE-ConnectionGUID: zSmqKtQSREKhu4UXh/JPeA== X-CSE-MsgGUID: RuwRvToFThyGCFfwcxd2tg== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272762" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272762" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:05 -0700 X-CSE-ConnectionGUID: Eypep2JzQrSikK9M2sN5LA== X-CSE-MsgGUID: jyjn0B36QsOL2uiMkkYfyQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411146" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:05 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:54 -0500 Subject: [PATCH v2 06/25] cxl/mem: Read dynamic capacity configuration from the device Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-6-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, "Li, Ming" X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=14051; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=dyT9gfHNrKowL18Q/g5tjcanjZuTftUOgqbPwuMhfYE=; b=j4AQuQH3I/hEBUNl8jkFjf2s5puBMP+IMCk9Q2qqei4FNdYG4Fl1pCfpWBpZBMzW2gBdMQgMF f9mAYnV+8tDDMdQfabJTk0LlaMP/v4qgUfPwfjl2dFaCL2i3c02lr7x X-Developer-Key: 
i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

From: Navneet Singh

Devices which optionally support Dynamic Capacity (DC) are configured via mailbox commands. CXL 3.1 requires the host to issue the Get DC Configuration command in order to properly configure DCDs. Without the Get DC Configuration command, DCD can't be supported.

Implement the DC mailbox commands as specified in CXL 3.1 section 8.2.9.9.9 (opcodes 48XXh) to read and store the DCD configuration information. Disable DCD if the device configuration cannot be read or fails validation. Use the Get DC Configuration command-supported bit to indicate whether DCD is supported.

Linux has no use for the trailing fields of the Get Dynamic Capacity Configuration Output Payload (total number of supported extents, number of available extents, total number of supported tags, and number of available tags). Leave those fields undefined so that the region configuration entries can be declared as a C flexible array member.

Cc: "Li, Ming"
Signed-off-by: Navneet Singh
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
---
Changes:
[Li, Ming: Fix bug in total_bytes calculation]
[iweiny: update commit message]
[Jonathan: fix formatting]
[Jonathan: Define block line size]
[Jonathan/Fan: use regions returned field instead of macro in get config]
[Jørgen: Rename memdev state range variables]
[Jonathan: adjust use of rc in cxl_dev_dynamic_capacity_identify()]
[Jonathan: white space cleanup]
[fan: make a comment about the trailing configuration output fields]
---
 drivers/cxl/core/mbox.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxlmem.h    |  64 +++++++++++++++++-
 drivers/cxl/pci.c       |   4 ++
 3 files changed, 237 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 8eb196858abe..68c26c4be91a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1157,7 +1157,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds)
 	if (rc < 0)
 		return rc;
 
-	mds->total_bytes =
+	mds->static_bytes =
le64_to_cpu(id.total_capacity) * CXL_CAPACITY_MULTIPLIER; mds->volatile_only_bytes = le64_to_cpu(id.volatile_capacity) * CXL_CAPACITY_MULTIPLIER; @@ -1264,6 +1264,159 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) return rc; } +static int cxl_dc_save_region_info(struct cxl_memdev_state *mds, u8 index, + struct cxl_dc_region_config *region_config) +{ + struct cxl_dc_region_info *dcr = &mds->dc_region[index]; + struct device *dev = mds->cxlds.dev; + + dcr->base = le64_to_cpu(region_config->region_base); + dcr->decode_len = le64_to_cpu(region_config->region_decode_length); + dcr->decode_len *= CXL_CAPACITY_MULTIPLIER; + dcr->len = le64_to_cpu(region_config->region_length); + dcr->blk_size = le64_to_cpu(region_config->region_block_size); + dcr->dsmad_handle = le32_to_cpu(region_config->region_dsmad_handle); + dcr->flags = region_config->flags; + snprintf(dcr->name, CXL_DC_REGION_STRLEN, "dc%d", index); + + /* Check regions are in increasing DPA order */ + if (index > 0) { + struct cxl_dc_region_info *prev_dcr = &mds->dc_region[index - 1]; + + if ((prev_dcr->base + prev_dcr->decode_len) > dcr->base) { + dev_err(dev, + "DPA ordering violation for DC region %d and %d\n", + index - 1, index); + return -EINVAL; + } + } + + if (!IS_ALIGNED(dcr->base, SZ_256M) || + !IS_ALIGNED(dcr->base, dcr->blk_size)) { + dev_err(dev, "DC region %d invalid base %#llx blk size %#llx\n", + index, dcr->base, dcr->blk_size); + return -EINVAL; + } + + if (dcr->decode_len == 0 || dcr->len == 0 || dcr->decode_len < dcr->len || + !IS_ALIGNED(dcr->len, dcr->blk_size)) { + dev_err(dev, "DC region %d invalid length; decode %#llx len %#llx blk size %#llx\n", + index, dcr->decode_len, dcr->len, dcr->blk_size); + return -EINVAL; + } + + if (dcr->blk_size == 0 || dcr->blk_size % CXL_DCD_BLOCK_LINE_SIZE || + !is_power_of_2(dcr->blk_size)) { + dev_err(dev, "DC region %d invalid block size; %#llx\n", + index, dcr->blk_size); + return -EINVAL; + } + + dev_dbg(dev, + "DC region %s base %#llx length 
%#llx block size %#llx\n", + dcr->name, dcr->base, dcr->decode_len, dcr->blk_size); + + return 0; +} + +/* Returns the number of regions in dc_resp or -ERRNO */ +static int cxl_get_dc_config(struct cxl_memdev_state *mds, u8 start_region, + struct cxl_mbox_get_dc_config_out *dc_resp, + size_t dc_resp_size) +{ + struct cxl_mbox_get_dc_config_in get_dc = (struct cxl_mbox_get_dc_config_in) { + .region_count = CXL_MAX_DC_REGION, + .start_region_index = start_region, + }; + struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = CXL_MBOX_OP_GET_DC_CONFIG, + .payload_in = &get_dc, + .size_in = sizeof(get_dc), + .size_out = dc_resp_size, + .payload_out = dc_resp, + .min_out = 1, + }; + struct device *dev = mds->cxlds.dev; + int rc; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc < 0) + return rc; + + dev_dbg(dev, "Read %d/%d DC regions\n", + dc_resp->regions_returned, dc_resp->avail_region_count); + return dc_resp->regions_returned; +} + +/** + * cxl_dev_dynamic_capacity_identify() - Reads the dynamic capacity + * information from the device. + * @mds: The memory device state + * + * Read Dynamic Capacity information from the device and populate the state + * structures for later use. + * + * Return: 0 if identify was executed successfully, -ERRNO on error. 
+ */ +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds) +{ + size_t dc_resp_size = mds->payload_size; + struct device *dev = mds->cxlds.dev; + u8 start_region, i; + + for (i = 0; i < CXL_MAX_DC_REGION; i++) + snprintf(mds->dc_region[i].name, CXL_DC_REGION_STRLEN, ""); + + if (!cxl_dcd_supported(mds)) { + dev_dbg(dev, "DCD not supported\n"); + return 0; + } + + struct cxl_mbox_get_dc_config_out *dc_resp __free(kfree) = + kvmalloc(dc_resp_size, GFP_KERNEL); + if (!dc_resp) + return -ENOMEM; + + start_region = 0; + do { + int rc, j; + + rc = cxl_get_dc_config(mds, start_region, dc_resp, dc_resp_size); + if (rc < 0) { + dev_dbg(dev, "Failed to get DC config: %d\n", rc); + return rc; + } + + mds->nr_dc_region += rc; + + if (mds->nr_dc_region < 1 || mds->nr_dc_region > CXL_MAX_DC_REGION) { + dev_err(dev, "Invalid num of dynamic capacity regions %d\n", + mds->nr_dc_region); + return -EINVAL; + } + + for (i = start_region, j = 0; i < mds->nr_dc_region; i++, j++) { + rc = cxl_dc_save_region_info(mds, i, &dc_resp->region[j]); + if (rc) { + dev_dbg(dev, "Failed to save region info: %d\n", rc); + return rc; + } + } + + start_region = mds->nr_dc_region; + + } while (mds->nr_dc_region < dc_resp->avail_region_count); + + mds->dynamic_bytes = + mds->dc_region[mds->nr_dc_region - 1].base + + mds->dc_region[mds->nr_dc_region - 1].decode_len - + mds->dc_region[0].base; + dev_dbg(dev, "Total dynamic range: %#llx\n", mds->dynamic_bytes); + + return 0; +} +EXPORT_SYMBOL_NS_GPL(cxl_dev_dynamic_capacity_identify, CXL); + static int add_dpa_res(struct device *dev, struct resource *parent, struct resource *res, resource_size_t start, resource_size_t size, const char *type) @@ -1294,8 +1447,15 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) { struct cxl_dev_state *cxlds = &mds->cxlds; struct device *dev = cxlds->dev; + size_t untenanted_mem; int rc; + mds->total_bytes = mds->static_bytes; + if (mds->nr_dc_region) { + untenanted_mem = mds->dc_region[0].base - 
mds->static_bytes; + mds->total_bytes += untenanted_mem + mds->dynamic_bytes; + } + if (!cxlds->media_ready) { cxlds->dpa_res = DEFINE_RES_MEM(0, 0); cxlds->ram_res = DEFINE_RES_MEM(0, 0); @@ -1305,6 +1465,15 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); + for (int i = 0; i < mds->nr_dc_region; i++) { + struct cxl_dc_region_info *dcr = &mds->dc_region[i]; + + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->dc_res[i], + dcr->base, dcr->decode_len, dcr->name); + if (rc) + return rc; + } + if (mds->partition_align_bytes == 0) { rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0, mds->volatile_only_bytes, "ram"); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index f2f8b567e0e7..b4eb8164d05d 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -402,6 +402,7 @@ enum cxl_devtype { CXL_DEVTYPE_CLASSMEM, }; +#define CXL_MAX_DC_REGION 8 /** * struct cxl_dpa_perf - DPA performance property entry * @dpa_range: range for DPA address @@ -431,6 +432,8 @@ struct cxl_dpa_perf { * @dpa_res: Overall DPA resource tree for the device * @pmem_res: Active Persistent memory capacity configuration * @ram_res: Active Volatile memory capacity configuration + * @dc_res: Active Dynamic Capacity memory configuration for each possible + * region * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device */ @@ -445,10 +448,22 @@ struct cxl_dev_state { struct resource dpa_res; struct resource pmem_res; struct resource ram_res; + struct resource dc_res[CXL_MAX_DC_REGION]; u64 serial; enum cxl_devtype type; }; +#define CXL_DC_REGION_STRLEN 8 +struct cxl_dc_region_info { + u64 base; + u64 decode_len; + u64 len; + u64 blk_size; + u32 dsmad_handle; + u8 flags; + u8 name[CXL_DC_REGION_STRLEN]; +}; + /** * struct cxl_memdev_state - Generic Type-3 Memory Device Class driver data * @@ -466,7 +481,9 @@ struct cxl_dev_state { * @dcd_cmds: List of DCD commands 
implemented by memory device * @enabled_cmds: Hardware commands found enabled in CEL. * @exclusive_cmds: Commands that are kernel-internal only - * @total_bytes: sum of all possible capacities + * @total_bytes: length of all possible capacities + * @static_bytes: length of possible static RAM and PMEM partitions + * @dynamic_bytes: length of possible DC partitions (DC Regions) * @volatile_only_bytes: hard volatile capacity * @persistent_only_bytes: hard persistent capacity * @partition_align_bytes: alignment size for partition-able capacity @@ -476,6 +493,8 @@ struct cxl_dev_state { * @next_persistent_bytes: persistent capacity change pending device reset * @ram_perf: performance data entry matched to RAM partition * @pmem_perf: performance data entry matched to PMEM partition + * @nr_dc_region: number of DC regions implemented in the memory device + * @dc_region: array containing info about the DC regions * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -496,6 +515,8 @@ struct cxl_memdev_state { DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX); DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX); u64 total_bytes; + u64 static_bytes; + u64 dynamic_bytes; u64 volatile_only_bytes; u64 persistent_only_bytes; u64 partition_align_bytes; @@ -507,6 +528,9 @@ struct cxl_memdev_state { struct cxl_dpa_perf ram_perf; struct cxl_dpa_perf pmem_perf; + u8 nr_dc_region; + struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION]; + struct cxl_event_state event; struct cxl_poison_state poison; struct cxl_security_state security; @@ -709,6 +733,32 @@ struct cxl_mbox_set_partition_info { #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0) +/* See CXL 3.1 Table 8-163 get dynamic capacity config Input Payload */ +struct cxl_mbox_get_dc_config_in { + u8 region_count; + u8 start_region_index; +} __packed; + +/* See CXL 3.1 Table 8-164 get dynamic capacity config Output Payload */ +struct cxl_mbox_get_dc_config_out { + u8 
avail_region_count; + u8 regions_returned; + u8 rsvd[6]; + /* See CXL 3.1 Table 8-165 */ + struct cxl_dc_region_config { + __le64 region_base; + __le64 region_decode_length; + __le64 region_length; + __le64 region_block_size; + __le32 region_dsmad_handle; + u8 flags; + u8 rsvd[3]; + } __packed region[]; + /* Trailing fields unused */ +} __packed; +#define CXL_DYNAMIC_CAPACITY_SANITIZE_ON_RELEASE_FLAG BIT(0) +#define CXL_DCD_BLOCK_LINE_SIZE 0x40 + /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */ struct cxl_mbox_set_timestamp_in { __le64 timestamp; @@ -832,6 +882,7 @@ enum { int cxl_internal_send_cmd(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd); int cxl_dev_state_identify(struct cxl_memdev_state *mds); +int cxl_dev_dynamic_capacity_identify(struct cxl_memdev_state *mds); int cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); int cxl_mem_create_range_info(struct cxl_memdev_state *mds); @@ -845,6 +896,17 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd, enum cxl_event_log_type type, enum cxl_event_type event_type, const uuid_t *uuid, union cxl_event *evt); + +static inline bool cxl_dcd_supported(struct cxl_memdev_state *mds) +{ + return test_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds); +} + +static inline void cxl_disable_dcd(struct cxl_memdev_state *mds) +{ + clear_bit(CXL_DCD_ENABLED_GET_CONFIG, mds->dcd_cmds); +} + int cxl_set_timestamp(struct cxl_memdev_state *mds); int cxl_poison_state_init(struct cxl_memdev_state *mds); int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len, diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 3a60cd66263e..f7f03599bc83 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -874,6 +874,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; + rc = cxl_dev_dynamic_capacity_identify(mds); + if (rc) + cxl_disable_dcd(mds); + rc = cxl_mem_create_range_info(mds); if (rc) return rc; From 
patchwork Fri Aug 16 13:59:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766358 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8522A1BC09F; Fri, 16 Aug 2024 14:00:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816810; cv=none; b=BOXZnai23NVDfYICimNGBWCzKyAO7qqAOma2FP1o5ffmhD/LkwxLmUldWn0xOqlmdHa697v+QrqhjuaFhf+FAue8afBQ+tdtdaXmHHdKoQ+E6BsIve8LKs71CM7Au8texNoVTQPMD9TuhE1Lrp25UkM3RLjzqDCckbAsLt+z7TE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816810; c=relaxed/simple; bh=N4lF7H6SxQ+yssoa1SQxppiwbjyD+JzY4faHAfQkGCA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=P/yxCG0GITlfsYQSKlS6KFK4HAr1QO+GoIjyFOrFfbuzOShvueLBc37vbJsblXbKNnG9ZrIiblv1B0rZgMEBVr7ATrcyN87+ch0Tv7WudoCoj1oNnPeDy57L6TWaz3r7TFjN5FWasEwTiD4q+8nc6rk2jnvN2bMkbX1hmeJUAZU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=lTtJFmTz; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lTtJFmTz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816808; x=1755352808; 
h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=N4lF7H6SxQ+yssoa1SQxppiwbjyD+JzY4faHAfQkGCA=; b=lTtJFmTzOU2qCkY5XnZws75Erin1nAT4VHiAOI6LL/B67cOVVAPuBZeV uxCGJcccnLWKFa3gptcnxPSnXKuq8Tl3u/PMwnv7tWm8/a9YJG2/wtf3e dXo4vkz3QefadWt5AHd5jdXrXWA+CHxlyvUwDXe+RXXI0pW7SvS4hcAOi nBqGBtOxRP97ypopBW3AQ0hq51IdKUVpmygtIkgaPsRhNCvUtHij7F/ud qVqi4PNfoX96Nz9SB4pX2foJwOdwM7ArK3VSfnTSqpge3HZtL0o2dq2oL YHcGT4gqdpStj0A+AeHvCIqdT9Yu7hKhNjOrreGv4JgRznoYBgX9JjRjA g==; X-CSE-ConnectionGUID: AdtkRAUTQw+44bUC48krUA== X-CSE-MsgGUID: 8c+Kr31dTE6X1e/luE/tjA== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272770" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272770" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:08 -0700 X-CSE-ConnectionGUID: Bqal0j15SxCqc9LYKvmelQ== X-CSE-MsgGUID: 1tOF0NDFSyeeI4SX6ecXug== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411184" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:07 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:55 -0500 Subject: [PATCH v2 07/25] cxl/core: Separate region mode from decoder mode Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-7-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Jonathan Cameron X-Mailer: b4 0.13-dev-2d940 
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=9613; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=WLipwvq9440B92e192SQh8N4RKAgn5laswj+ww4UXFU=; b=drei73NyZ0C3dQPRKzl26DOZ+6BMPbOjZ1+MG4VwaITa13efRf/I/mfUICKxy0Cc/rSKJACGM FN+ij/46iw/DJIZu3vyMHK9ZJ6K470J1JfHe4LmNYW4vRXgzXDQ2Ujx
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

From: Navneet Singh

Until now, region modes and decoder modes were equivalent in that both were either PMEM or RAM. Dynamic Capacity adds up to 8 DC partitions per device, so the region mode is no longer equivalent to the endpoint decoder mode. In other words, the endpoint decoders may have modes of DC0-DC7 while the region mode is simply DC.

Define a new region mode enumeration which applies to regions, separate from the decoder mode. Adjust the code to process these modes independently.

Region modes have no equivalent of the decoder mode 'dead'. Avoid constructing regions with decoders which have been flagged as dead.
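
For readers following the series, the split between the two mode types can be sketched outside the kernel roughly as below. This is a minimal standalone illustration, not the patch's code: every sketch_* / SK_* name is a hypothetical stand-in for the definitions in drivers/cxl/cxl.h, and, unlike the patch (which checks for a dead decoder before translating), the sketch folds that check into the translation helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum cxl_decoder_mode / enum cxl_region_mode. */
enum sketch_decoder_mode {
	SK_DECODER_NONE,
	SK_DECODER_RAM,
	SK_DECODER_PMEM,
	SK_DECODER_MIXED,
	SK_DECODER_DEAD,
};

enum sketch_region_mode {
	SK_REGION_NONE,
	SK_REGION_RAM,
	SK_REGION_PMEM,
	SK_REGION_MIXED,
};

/*
 * Translate a decoder mode into a region mode.
 * Returns false when no region mode exists for the decoder mode
 * (a dead decoder, or a value outside the enum).
 */
static bool sketch_decoder_to_region_mode(enum sketch_decoder_mode dmode,
					  enum sketch_region_mode *rmode)
{
	assert(rmode != NULL);

	switch (dmode) {
	case SK_DECODER_NONE:
		*rmode = SK_REGION_NONE;
		return true;
	case SK_DECODER_RAM:
		*rmode = SK_REGION_RAM;
		return true;
	case SK_DECODER_PMEM:
		*rmode = SK_REGION_PMEM;
		return true;
	case SK_DECODER_MIXED:
		*rmode = SK_REGION_MIXED;
		return true;
	case SK_DECODER_DEAD:
	default:
		return false;
	}
}

/* A decoder may only be attached to a region of the matching kind. */
static bool sketch_modes_compatible(enum sketch_region_mode rmode,
				    enum sketch_decoder_mode dmode)
{
	if (rmode == SK_REGION_RAM && dmode == SK_DECODER_RAM)
		return true;
	if (rmode == SK_REGION_PMEM && dmode == SK_DECODER_PMEM)
		return true;

	return false;
}
```

Keeping the compatibility test as a separate predicate, rather than comparing the two enums for equality, is what leaves room for a single region mode to accept several decoder modes later in the series.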
Suggested-by: Jonathan Cameron Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: rebase] [Jonathan: remove dead code] [Jonathan: clarify commit message] --- drivers/cxl/core/region.c | 75 ++++++++++++++++++++++++++++++++++------------- drivers/cxl/cxl.h | 26 ++++++++++++++-- 2 files changed, 79 insertions(+), 22 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 971a314b6b0e..796e5a791e44 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -144,7 +144,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, rc = down_read_interruptible(&cxl_region_rwsem); if (rc) return rc; - if (cxlr->mode != CXL_DECODER_PMEM) + if (cxlr->mode != CXL_REGION_PMEM) rc = sysfs_emit(buf, "\n"); else rc = sysfs_emit(buf, "%pUb\n", &p->uuid); @@ -457,7 +457,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a, * Support tooling that expects to find a 'uuid' attribute for all * regions regardless of mode. 
*/ - if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM) + if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_REGION_PMEM) return 0444; return a->mode; } @@ -620,7 +620,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, { struct cxl_region *cxlr = to_cxl_region(dev); - return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode)); + return sysfs_emit(buf, "%s\n", cxl_region_mode_name(cxlr->mode)); } static DEVICE_ATTR_RO(mode); @@ -646,7 +646,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) /* ways, granularity and uuid (if PMEM) need to be set before HPA */ if (!p->interleave_ways || !p->interleave_granularity || - (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid))) + (cxlr->mode == CXL_REGION_PMEM && uuid_is_null(&p->uuid))) return -ENXIO; div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder); @@ -1863,6 +1863,17 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr) return rc; } +static bool cxl_modes_compatible(enum cxl_region_mode rmode, + enum cxl_decoder_mode dmode) +{ + if (rmode == CXL_REGION_RAM && dmode == CXL_DECODER_RAM) + return true; + if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM) + return true; + + return false; +} + static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { @@ -1882,9 +1893,11 @@ static int cxl_region_attach(struct cxl_region *cxlr, return rc; } - if (cxled->mode != cxlr->mode) { - dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", - dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); + if (!cxl_modes_compatible(cxlr->mode, cxled->mode)) { + dev_dbg(&cxlr->dev, "%s region mode: %s mismatch decoder: %s\n", + dev_name(&cxled->cxld.dev), + cxl_region_mode_name(cxlr->mode), + cxl_decoder_mode_name(cxled->mode)); return -EINVAL; } @@ -2447,7 +2460,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb, * devm_cxl_add_region - Adds a region to a decoder * 
@cxlrd: root decoder * @id: memregion id to create, or memregion_free() on failure - * @mode: mode for the endpoint decoders of this region + * @mode: mode of this region * @type: select whether this is an expander or accelerator (type-2 or type-3) * * This is the second step of region initialization. Regions exist within an @@ -2458,7 +2471,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb, */ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, int id, - enum cxl_decoder_mode mode, + enum cxl_region_mode mode, enum cxl_decoder_type type) { struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); @@ -2512,16 +2525,17 @@ static ssize_t create_ram_region_show(struct device *dev, } static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, - enum cxl_decoder_mode mode, int id) + enum cxl_region_mode mode, int id) { int rc; switch (mode) { - case CXL_DECODER_RAM: - case CXL_DECODER_PMEM: + case CXL_REGION_RAM: + case CXL_REGION_PMEM: break; default: - dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); + dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n", + cxl_region_mode_name(mode)); return ERR_PTR(-EINVAL); } @@ -2549,7 +2563,7 @@ static ssize_t create_pmem_region_store(struct device *dev, if (rc != 1) return -EINVAL; - cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id); + cxlr = __create_region(cxlrd, CXL_REGION_PMEM, id); if (IS_ERR(cxlr)) return PTR_ERR(cxlr); @@ -2569,7 +2583,7 @@ static ssize_t create_ram_region_store(struct device *dev, if (rc != 1) return -EINVAL; - cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id); + cxlr = __create_region(cxlrd, CXL_REGION_RAM, id); if (IS_ERR(cxlr)) return PTR_ERR(cxlr); @@ -3215,6 +3229,22 @@ static int match_region_by_range(struct device *dev, void *data) return rc; } +static enum cxl_region_mode +cxl_decoder_to_region_mode(enum cxl_decoder_mode mode) +{ + switch (mode) { + case CXL_DECODER_NONE: + return CXL_REGION_NONE; + case 
CXL_DECODER_RAM: + return CXL_REGION_RAM; + case CXL_DECODER_PMEM: + return CXL_REGION_PMEM; + case CXL_DECODER_MIXED: + default: + return CXL_REGION_MIXED; + } +} + /* Establish an empty region covering the given HPA range */ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled) @@ -3223,12 +3253,17 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, struct cxl_port *port = cxlrd_to_port(cxlrd); struct range *hpa = &cxled->cxld.hpa_range; struct cxl_region_params *p; + enum cxl_region_mode mode; struct cxl_region *cxlr; struct resource *res; int rc; + if (cxled->mode == CXL_DECODER_DEAD) + return ERR_PTR(-EINVAL); + + mode = cxl_decoder_to_region_mode(cxled->mode); do { - cxlr = __create_region(cxlrd, cxled->mode, + cxlr = __create_region(cxlrd, mode, atomic_read(&cxlrd->region_id)); } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY); @@ -3431,9 +3466,9 @@ static int cxl_region_probe(struct device *dev) return rc; switch (cxlr->mode) { - case CXL_DECODER_PMEM: + case CXL_REGION_PMEM: return devm_cxl_add_pmem_region(cxlr); - case CXL_DECODER_RAM: + case CXL_REGION_RAM: /* * The region can not be manged by CXL if any portion of * it is already online as 'System RAM' @@ -3445,8 +3480,8 @@ static int cxl_region_probe(struct device *dev) return 0; return devm_cxl_add_dax_region(cxlr); default: - dev_dbg(&cxlr->dev, "unsupported region mode: %d\n", - cxlr->mode); + dev_dbg(&cxlr->dev, "unsupported region mode: %s\n", + cxl_region_mode_name(cxlr->mode)); return -ENXIO; } } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 9afb407d438f..f766b2a8bf53 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -388,6 +388,27 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) return "mixed"; } +enum cxl_region_mode { + CXL_REGION_NONE, + CXL_REGION_RAM, + CXL_REGION_PMEM, + CXL_REGION_MIXED, +}; + +static inline const char *cxl_region_mode_name(enum 
cxl_region_mode mode) +{ + static const char * const names[] = { + [CXL_REGION_NONE] = "none", + [CXL_REGION_RAM] = "ram", + [CXL_REGION_PMEM] = "pmem", + [CXL_REGION_MIXED] = "mixed", + }; + + if (mode >= CXL_REGION_NONE && mode <= CXL_REGION_MIXED) + return names[mode]; + return "mixed"; +} + /* * Track whether this decoder is reserved for region autodiscovery, or * free for userspace provisioning. @@ -515,7 +536,8 @@ struct cxl_region_params { * struct cxl_region - CXL region * @dev: This region's device * @id: This region's id. Id is globally unique across all regions - * @mode: Endpoint decoder allocation / access mode + * @mode: Region mode which defines which endpoint decoder modes the region is + * compatible with * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge @@ -528,7 +550,7 @@ struct cxl_region_params { struct cxl_region { struct device dev; int id; - enum cxl_decoder_mode mode; + enum cxl_region_mode mode; enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; From patchwork Fri Aug 16 13:59:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766359 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2CE91BF33C; Fri, 16 Aug 2024 14:00:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816813; cv=none; b=UhSWEqulCULeouxbtJpyJC7n1oVJxkfjJ/JIhLiR/KAOhaG1cguVL5goCAUjvkmHQ8tQDTe1KESQcMkUXjRRnqTZRkWmfPSJp8fQYvb99giGPCpSu9GZGOFXXWNFUBTwXkjSSWrc1orp3gyJM0O7D2QFWhwmqvzLEYvwoOvQ0r8= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816813; c=relaxed/simple; bh=SXcZK2p0dJOyPPTEdqGxENz2u5vBcK7fakaFc4Em/+c=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=LtpjYpaQzGH4FiOoVhXrOnDbApqricJKor6LRZpCFKpsGBrlI0coIHXodUAt+bVEANL8za1YTsG9y+4EFcY7x8i0+U+/E2UpOTvH6+UKasB5qO7N7xLHsQwzr6rkZlZgimQ1lDPOkULHcbqKU29ZGV2UW1wwuvrI5f8GDtbG2K4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=fNV4gbj9; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fNV4gbj9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816810; x=1755352810; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=SXcZK2p0dJOyPPTEdqGxENz2u5vBcK7fakaFc4Em/+c=; b=fNV4gbj90DWor7MGSl6GowWS9YrpxZKzMWxcYQiBPvZv+TSqzWbuS2tv TWXW/qGuaDyJp14Ys1owubzlQiUl8aD8c+6sGZ57yvBiGEQjV4QKfULa1 Nax+Q59ov4/wg+9dE8qUm9v897Prw7NUuYLqvxoQxh79SPxXqifrtmdzB ts4GvlrlBKbjrRrmIwWMynsnr0PllucIORiF8NOV0y6SM9SLen5+GPLz9 4iWF6eN2R73nMTHbpOAu7lcaG1BLdWQ5WyJTY8DGNkyHLXc2R4OS/Pq5i PnO+26houPvo9ClnLNZlwZYgCle5rukjeteEGt5dzZFef04/rn89MAiaS w==; X-CSE-ConnectionGUID: jkz1tOtWRPa1MGyoCwlwJg== X-CSE-MsgGUID: mdGUGQmHSce3ct9dZ54AEw== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272777" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272777" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 
2024 07:00:10 -0700 X-CSE-ConnectionGUID: CO8u84Y2ShakfMcLBZiZIw== X-CSE-MsgGUID: DailO3teSyyFxBkuIu2e4w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411214" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:09 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:56 -0500 Subject: [PATCH v2 08/25] cxl/region: Add dynamic capacity decoder and region modes Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-8-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=3348; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=gyPkNovHd6BmfRDNMkhSNaMujG9yjRUBp332DRpzPi4=; b=eujWv4jK+IbEOEIutl6Vi6/5PNpu3f4NzOmejmT3DzVgNi52/SHIKEE5ii2U7BeARQr6xONUs 375r8EhvXNAD/LgH2ZFtK81qHF2qRJh3ftiVzyCzKz0hnnmhwkat/oj X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh One or more decoders each pointing to a Dynamic Capacity (DC) partition form a CXL software region. The region mode reflects composition of that entire software region. Decoder mode reflects a specific DC partition. DC partitions are also known as DC regions per CXL specification r3.1. Define the new modes and helper functions required to make the association between these new modes. 
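
As a rough illustration of the association described above, the following is a minimal userspace sketch, not the kernel code itself. The enum values and helper names follow this patch; the `CXL_DECODER_NONE` to `CXL_REGION_NONE` mapping and the RAM check in `cxl_modes_compatible()` are assumptions inferred from the surrounding pattern, and the GNU case-range syntax used by the patch is replaced with a call to `cxl_decoder_mode_is_dc()` so the sketch builds as plain C99.

```c
#include <stdbool.h>

/* Decoder modes: one value per DPA partition a decoder can point at. */
enum cxl_decoder_mode {
	CXL_DECODER_NONE,
	CXL_DECODER_RAM,
	CXL_DECODER_PMEM,
	CXL_DECODER_DC0,
	CXL_DECODER_DC1,
	CXL_DECODER_DC2,
	CXL_DECODER_DC3,
	CXL_DECODER_DC4,
	CXL_DECODER_DC5,
	CXL_DECODER_DC6,
	CXL_DECODER_DC7,
	CXL_DECODER_MIXED,
	CXL_DECODER_DEAD,
};

/* Region modes: one value per class of decoder a region accepts. */
enum cxl_region_mode {
	CXL_REGION_NONE,
	CXL_REGION_RAM,
	CXL_REGION_PMEM,
	CXL_REGION_DC,
	CXL_REGION_MIXED,
};

/* True for any of the eight DC partition modes. */
static inline bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode)
{
	return mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7;
}

/* Collapse a per-partition decoder mode into the coarser region mode. */
static inline enum cxl_region_mode
cxl_decoder_to_region_mode(enum cxl_decoder_mode mode)
{
	if (cxl_decoder_mode_is_dc(mode))
		return CXL_REGION_DC;

	switch (mode) {
	case CXL_DECODER_NONE:	/* assumed mapping, see lead-in */
		return CXL_REGION_NONE;
	case CXL_DECODER_RAM:
		return CXL_REGION_RAM;
	case CXL_DECODER_PMEM:
		return CXL_REGION_PMEM;
	default:		/* MIXED, DEAD, anything unknown */
		return CXL_REGION_MIXED;
	}
}

/* A region accepts a decoder only if the decoder's mode fits its class. */
static inline bool cxl_modes_compatible(enum cxl_region_mode rmode,
					enum cxl_decoder_mode dmode)
{
	if (rmode == CXL_REGION_RAM && dmode == CXL_DECODER_RAM)
		return true;	/* assumed, mirrors the PMEM check */
	if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM)
		return true;
	if (rmode == CXL_REGION_DC && cxl_decoder_mode_is_dc(dmode))
		return true;
	return false;
}
```

With these definitions, every decoder mode from `CXL_DECODER_DC0` through `CXL_DECODER_DC7` maps to the single region mode `CXL_REGION_DC`, so a DC region can be built from decoders that point at any DC partition, while a RAM or PMEM decoder is rejected by a DC region.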
Reviewed-by: Jonathan Cameron Reviewed-by: Fan Ni Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: keep tags on simple patch] [Fan: s/partitions/partition/] [djiang: New wording for the commit message] [iweiny: reword commit message more] --- drivers/cxl/core/region.c | 4 ++++ drivers/cxl/cxl.h | 23 +++++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 796e5a791e44..650fe33f2ed4 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1870,6 +1870,8 @@ static bool cxl_modes_compatible(enum cxl_region_mode rmode, return true; if (rmode == CXL_REGION_PMEM && dmode == CXL_DECODER_PMEM) return true; + if (rmode == CXL_REGION_DC && cxl_decoder_mode_is_dc(dmode)) + return true; return false; } @@ -3239,6 +3241,8 @@ cxl_decoder_to_region_mode(enum cxl_decoder_mode mode) return CXL_REGION_RAM; case CXL_DECODER_PMEM: return CXL_REGION_PMEM; + case CXL_DECODER_DC0 ... 
CXL_DECODER_DC7: + return CXL_REGION_DC; case CXL_DECODER_MIXED: default: return CXL_REGION_MIXED; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index f766b2a8bf53..d2674ab46f35 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -370,6 +370,14 @@ enum cxl_decoder_mode { CXL_DECODER_NONE, CXL_DECODER_RAM, CXL_DECODER_PMEM, + CXL_DECODER_DC0, + CXL_DECODER_DC1, + CXL_DECODER_DC2, + CXL_DECODER_DC3, + CXL_DECODER_DC4, + CXL_DECODER_DC5, + CXL_DECODER_DC6, + CXL_DECODER_DC7, CXL_DECODER_MIXED, CXL_DECODER_DEAD, }; @@ -380,6 +388,14 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) [CXL_DECODER_NONE] = "none", [CXL_DECODER_RAM] = "ram", [CXL_DECODER_PMEM] = "pmem", + [CXL_DECODER_DC0] = "dc0", + [CXL_DECODER_DC1] = "dc1", + [CXL_DECODER_DC2] = "dc2", + [CXL_DECODER_DC3] = "dc3", + [CXL_DECODER_DC4] = "dc4", + [CXL_DECODER_DC5] = "dc5", + [CXL_DECODER_DC6] = "dc6", + [CXL_DECODER_DC7] = "dc7", [CXL_DECODER_MIXED] = "mixed", }; @@ -388,10 +404,16 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) return "mixed"; } +static inline bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode) +{ + return (mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7); +} + enum cxl_region_mode { CXL_REGION_NONE, CXL_REGION_RAM, CXL_REGION_PMEM, + CXL_REGION_DC, CXL_REGION_MIXED, }; @@ -401,6 +423,7 @@ static inline const char *cxl_region_mode_name(enum cxl_region_mode mode) [CXL_REGION_NONE] = "none", [CXL_REGION_RAM] = "ram", [CXL_REGION_PMEM] = "pmem", + [CXL_REGION_DC] = "dc", [CXL_REGION_MIXED] = "mixed", }; From patchwork Fri Aug 16 13:59:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766360 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
D8A0D1C0DDE; Fri, 16 Aug 2024 14:00:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816815; cv=none; b=Z2t100HC9jo3X36QIaHIGP2RV75m/UY1t5jBtxmuicewH5oHPWj9s51p9qTy9csF0aM+aI8al5sChav6xxna+t1cYSX/7uOMqo6UPyI5ExDGbbFCN6HWw3FKrsWYBMesbR6m99SRk+TgL7jjNWC9A5B+swb3+r/wUjKvtJIMU98= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816815; c=relaxed/simple; bh=UlIUTikIqTYbOMobHHgRquFnNLyp8ufSJTSWoQiLhqQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=TbeIgGH2FgadYjknzAce5fqhY5Zf0YYygzFSr5ISUd49s5Wjv3scLr4sIag+AfPWfnq3vf4b6/RXXWfMeSDcDJqZfZwn9jfyaSsZUJVE78PjQYm8eW6XrfzRdUUfjBU+s5v4sJTHyC3Crz/LSqur1Ur+2q9qFYHYvX9ntb1ljsw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kg8q70y4; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kg8q70y4" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816813; x=1755352813; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=UlIUTikIqTYbOMobHHgRquFnNLyp8ufSJTSWoQiLhqQ=; b=kg8q70y4KneqKurT9e77iuyPu8fWLgBMyh9V27HiF0Z8vOF6J/1CSZid Xq7DkHuhi6Vf/hfm6/G3Sn7Fp+aQxwxU6exl2HimmQ/vuCydJSUoa6n8y Tt1bROqF+lOm5M0sOnmZipizCZlfAuv2TNLgCZz6z7MB7ZIJwUdbDK+rk OXBuYK2jqkwM7UCpKZyT7m/3JgsrFx2yJdSVuTg6O9h4zWqWLoFgeonki vZFLsdrV27AdOHOkOmLDCKxRMZxirUqPXiEOs73f6bahKdA8sS+MzqqMf 
BkgchXikxnWbC8I33SpF4lcKzCI5zogpANfanWOrUXofR0LszkxOGW7wr Q==; X-CSE-ConnectionGUID: bRlWMAVvQFi7XxZ91Nh5kA== X-CSE-MsgGUID: f1tmnNKbSVO2EDj98mR2sQ== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272783" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272783" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:12 -0700 X-CSE-ConnectionGUID: 3UyRGFwjT6GfXi0eX0Dx9Q== X-CSE-MsgGUID: f2VFfSN3TrWtLnbg9WEvVQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411239" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:11 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:57 -0500 Subject: [PATCH v2 09/25] cxl/hdm: Add dynamic capacity size support to endpoint decoders Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-9-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=13466; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=cHwHzbrTO+tbp6sLkEiH+DzM+3UXKPCN2KIVRTu7fpo=; b=i8GBQmmcimOd9nCDWcPsRQHeX0gjKzxQfSfBAME7Kpf0jHg0+2+6uMG5+bhTQ65FnCNb0AFZT FPIbOnQjIvgA2ySai+d+RioJy44vi9260x32998hULR0b/ftMxolrnm X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh To support Dynamic Capacity 
Devices (DCD), endpoint decoders will need to map DC partitions
(regions). In addition to assigning the size of the DC partition, the
decoder must assign any skip value from the previous decoder. This must
be done within a contiguous DPA space.

Two complications arise with Dynamic Capacity regions which did not
exist with RAM and PMEM partitions. First, gaps in the DPA space can
exist between and around the DC partitions. Second, the Linux resource
tree does not allow a resource to be marked across existing nodes within
a tree.

For clarity, below is an example of a 60GB device with 10GB of RAM,
10GB of PMEM and 10GB for each of 2 DC partitions. The desired CXL
mapping is 5GB of RAM, 5GB of PMEM, and 5GB of DC1.

     DPA RANGE
     (dpa_res)
0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|

RAM         PMEM                  DC0                   DC1
 (ram_res)  (pmem_res)            (dc_res[0])           (dc_res[1])
|----------|----------|          |----------|          |----------|

 RAM        PMEM                                        DC1
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
0GB   5GB  10GB  15GB 20GB       30GB       40GB       50GB       60GB

The previous skip resource between RAM and PMEM was always a child of
the RAM resource and fit nicely [see (S) below]. Because of this
simplicity this skip resource reference was not stored in any CXL state.
On release the skip range could be calculated based on the endpoint
decoder's stored values.

Now when DC1 is being mapped 4 skip resources must be created as
children. One for the PMEM resource (A), two of the parent DPA resource
(B,D), and one more child of the DC0 resource (C).

0GB        10GB       20GB       30GB       40GB       50GB       60GB
|----------|----------|----------|----------|----------|----------|
                           |                     |
|----------|----------|    |     |----------|    |     |----------|
        |          |       |          |          |
       (S)        (A)     (B)        (C)        (D)
        v          v       v          v          v
|XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
       skip       skip    skip       skip       skip

Expand the calculation of DPA free space and enhance the logic to
support this more complex skipping.
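
The splitting of one skipped span into per-parent pieces can be sketched in isolation. The snippet below is a simplified userspace illustration, not the code in this patch: `struct span` and `split_skip()` are hypothetical names, addresses are in GB to match the example, and the partition table is passed in as a sorted array rather than read from `ram_res`, `pmem_res` and `dc_res[]`. The kernel code additionally inserts each piece into the resource tree with `__request_region()` and records it in the decoder's xarray.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, in the style of struct resource. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Split the skipped range [skip_base, base) at every partition boundary so
 * that no piece crosses from one partition (or gap) into the next.
 *
 * parts[] must be sorted by start, non-overlapping, and contain only
 * non-empty partitions. Returns the number of pieces written to out[],
 * or -1 if out[] is too small.
 */
static int split_skip(const struct span *parts, size_t nparts,
		      uint64_t skip_base, uint64_t base,
		      struct span *out, size_t max_out)
{
	uint64_t cur = skip_base;
	size_t n = 0;

	while (cur < base) {
		uint64_t next = base;	/* default: run to the end of the skip */

		for (size_t i = 0; i < nparts; i++) {
			if (cur < parts[i].start) {
				/* In a gap before partition i: stop at its start. */
				if (parts[i].start < next)
					next = parts[i].start;
				break;
			}
			if (cur <= parts[i].end) {
				/* Inside partition i: stop just past its end. */
				if (parts[i].end + 1 < next)
					next = parts[i].end + 1;
				break;
			}
		}

		if (n == max_out)
			return -1;
		out[n].start = cur;
		out[n].end = next - 1;
		n++;
		cur = next;
	}

	return (int)n;
}
```

For the layout in the example (RAM 0-9, PMEM 10-19, DC0 30-39, DC1 50-59), calling `split_skip()` with `skip_base = 15` and `base = 50` yields four pieces, 15-19, 20-29, 30-39 and 40-49, which correspond to (A), (B), (C) and (D). Calling it with `skip_base = 5` and `base = 10` yields the single piece 5-9, which is the legacy RAM-to-PMEM skip (S).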
To track the potential of multiple skip resources an xarray is attached to the endpoint decoder. The existing algorithm between RAM and PMEM is consolidated within the new one to streamline the code even though the result is the storage of a single skip resource in the xarray. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [Jonathan: Use an example only mapping 1/2 of DC1] [iweiny: Update cover letter] [iweiny: Fix 0day bugs https://lore.kernel.org/all/202408090138.RB41yBE8-lkp@intel.com/ [djbw/Jonathan: allow more than 1 region per DC partition] --- drivers/cxl/core/hdm.c | 196 ++++++++++++++++++++++++++++++++++++++++++++---- drivers/cxl/core/port.c | 2 + drivers/cxl/cxl.h | 2 + 3 files changed, 184 insertions(+), 16 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3df10517a327..b4a517c6d283 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -223,6 +223,25 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) } EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL); +static void cxl_skip_release(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds; + struct cxl_port *port = cxled_to_port(cxled); + struct device *dev = &port->dev; + unsigned long index; + void *entry; + + xa_for_each(&cxled->skip_res, index, entry) { + struct resource *res = entry; + + dev_dbg(dev, "decoder%d.%d: releasing skipped space; %pr\n", + port->id, cxled->cxld.id, res); + __release_region(&cxlds->dpa_res, res->start, + resource_size(res)); + xa_erase(&cxled->skip_res, index); + } +} + /* * Must be called in a context that synchronizes against this decoder's * port ->remove() callback (like an endpoint decoder sysfs attribute) @@ -233,15 +252,11 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct resource *res = cxled->dpa_res; - 
resource_size_t skip_start; lockdep_assert_held_write(&cxl_dpa_rwsem); - /* save @skip_start, before @res is released */ - skip_start = res->start - cxled->skip; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); - if (cxled->skip) - __release_region(&cxlds->dpa_res, skip_start, cxled->skip); + cxl_skip_release(cxled); cxled->skip = 0; cxled->dpa_res = NULL; put_device(&cxled->cxld.dev); @@ -268,6 +283,105 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) __cxl_dpa_release(cxled); } +static int dc_mode_to_region_index(enum cxl_decoder_mode mode) +{ + return mode - CXL_DECODER_DC0; +} + +static int cxl_request_skip(struct cxl_endpoint_decoder *cxled, + resource_size_t skip_base, resource_size_t skip_len) +{ + struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds; + const char *name = dev_name(&cxled->cxld.dev); + struct cxl_port *port = cxled_to_port(cxled); + struct resource *dpa_res = &cxlds->dpa_res; + struct device *dev = &port->dev; + struct resource *res; + int rc; + + res = __request_region(dpa_res, skip_base, skip_len, name, 0); + if (!res) + return -EBUSY; + + rc = xa_insert(&cxled->skip_res, skip_base, res, GFP_KERNEL); + if (rc) { + __release_region(dpa_res, skip_base, skip_len); + return rc; + } + + dev_dbg(dev, "decoder%d.%d: skipped space; %pr\n", + port->id, cxled->cxld.id, res); + return 0; +} + +static int cxl_reserve_dpa_skip(struct cxl_endpoint_decoder *cxled, + resource_size_t base, resource_size_t skipped) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *port = cxled_to_port(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + resource_size_t skip_base = base - skipped; + struct device *dev = &port->dev; + resource_size_t skip_len = 0; + int rc, index; + + if (resource_size(&cxlds->ram_res) && skip_base <= cxlds->ram_res.end) { + skip_len = cxlds->ram_res.end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += 
skip_len; + } + + if (skip_base == base) { + dev_dbg(dev, "skip done ram!\n"); + return 0; + } + + if (resource_size(&cxlds->pmem_res) && + skip_base <= cxlds->pmem_res.end) { + skip_len = cxlds->pmem_res.end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + + index = dc_mode_to_region_index(cxled->mode); + for (int i = 0; i <= index; i++) { + struct resource *dcr = &cxlds->dc_res[i]; + + if (skip_base < dcr->start) { + skip_len = dcr->start - skip_base; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + + if (skip_base == base) { + dev_dbg(dev, "skip done DC region %d!\n", i); + break; + } + + if (resource_size(dcr) && skip_base <= dcr->end) { + if (skip_base > base) { + dev_err(dev, "Skip error DC region %d; skip_base %pa; base %pa\n", + i, &skip_base, &base); + return -ENXIO; + } + + skip_len = dcr->end - skip_base + 1; + rc = cxl_request_skip(cxled, skip_base, skip_len); + if (rc) + return rc; + skip_base += skip_len; + } + } + + return 0; +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) @@ -305,13 +419,12 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, } if (skipped) { - res = __request_region(&cxlds->dpa_res, base - skipped, skipped, - dev_name(&cxled->cxld.dev), 0); - if (!res) { - dev_dbg(dev, - "decoder%d.%d: failed to reserve skipped space\n", - port->id, cxled->cxld.id); - return -EBUSY; + int rc = cxl_reserve_dpa_skip(cxled, base, skipped); + + if (rc) { + dev_dbg(dev, "decoder%d.%d: failed to reserve skipped space; %pa - %pa\n", + port->id, cxled->cxld.id, &base, &skipped); + return rc; } } res = __request_region(&cxlds->dpa_res, base, len, @@ -319,14 +432,20 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, if (!res) { dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", port->id, 
cxled->cxld.id); - if (skipped) - __release_region(&cxlds->dpa_res, base - skipped, - skipped); + cxl_skip_release(cxled); return -EBUSY; } cxled->dpa_res = res; cxled->skip = skipped; + for (int mode = CXL_DECODER_DC0; mode <= CXL_DECODER_DC7; mode++) { + int index = dc_mode_to_region_index(mode); + + if (resource_contains(&cxlds->dc_res[index], res)) { + cxled->mode = mode; + goto success; + } + } if (resource_contains(&cxlds->pmem_res, res)) cxled->mode = CXL_DECODER_PMEM; else if (resource_contains(&cxlds->ram_res, res)) @@ -337,6 +456,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->mode = CXL_DECODER_MIXED; } +success: + dev_dbg(dev, "decoder%d.%d: %pr mode: %d\n", port->id, cxled->cxld.id, + cxled->dpa_res, cxled->mode); port->hdm_end++; get_device(&cxled->cxld.dev); return 0; @@ -466,8 +588,8 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) { - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); resource_size_t free_ram_start, free_pmem_start; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; @@ -524,12 +646,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) else skip_end = start - 1; skip = skip_end - skip_start + 1; + } else if (cxl_decoder_mode_is_dc(cxled->mode)) { + int dc_index = dc_mode_to_region_index(cxled->mode); + + for (p = cxlds->dc_res[dc_index].child, last = NULL; p; p = p->sibling) + last = p; + + if (last) { + /* + * Some capacity in this DC partition is already allocated, + * that allocation already handled the skip. 
+ */ + start = last->end + 1; + skip = 0; + } else { + /* Calculate skip */ + resource_size_t skip_start, skip_end; + + start = cxlds->dc_res[dc_index].start; + + if ((resource_size(&cxlds->pmem_res) == 0) || !cxlds->pmem_res.child) + skip_start = free_ram_start; + else + skip_start = free_pmem_start; + /* + * If any dc region is already mapped, then that allocation + * already handled the RAM and PMEM skip. Check for DC region + * skip. + */ + for (int i = dc_index - 1; i >= 0 ; i--) { + if (cxlds->dc_res[i].child) { + skip_start = cxlds->dc_res[i].child->end + 1; + break; + } + } + + skip_end = start - 1; + skip = skip_end - skip_start + 1; + } + avail = cxlds->dc_res[dc_index].end - start + 1; } else { dev_dbg(dev, "mode not set\n"); rc = -EINVAL; goto out; } + dev_dbg(dev, "DPA Allocation start: %pa len: %#llx Skip: %pa\n", + &start, size, &skip); + if (size > avail) { dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, cxl_decoder_mode_name(cxled->mode), &avail); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 1d5007e3795a..8054cbaac9f6 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -419,6 +419,7 @@ static void cxl_endpoint_decoder_release(struct device *dev) struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); __cxl_decoder_release(&cxled->cxld); + xa_destroy(&cxled->skip_res); kfree(cxled); } @@ -1899,6 +1900,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) return ERR_PTR(-ENOMEM); cxled->pos = -1; + xa_init(&cxled->skip_res); cxld = &cxled->cxld; rc = cxl_decoder_init(port, cxld); if (rc) { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index d2674ab46f35..53b666ef4097 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -446,6 +446,7 @@ enum cxl_decoder_state { * @cxld: base cxl_decoder_object * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps + * @skip_res: array of skipped resources 
from the previous decoder end * @mode: which memory type / access-mode-partition this decoder targets * @state: autodiscovery state * @pos: interleave position in @cxld.region @@ -454,6 +455,7 @@ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; + struct xarray skip_res; enum cxl_decoder_mode mode; enum cxl_decoder_state state; int pos; From patchwork Fri Aug 16 13:59:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766361 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 606451C2301; Fri, 16 Aug 2024 14:00:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816816; cv=none; b=jl5AD29mFfVjGRDaBrmVFQjlqdqEb4Bla556qIQi62QCZf79v+pyPgng9OdJ10iDDX9MjYAZuLA6ugI6/w6VZbYxaHhigqaH+wA1q/2no2u3sbPkKZrnKf3WXg83BSiPsCdpi4YNdfdFNH9msNjljRbpShyIo0Vd6OT8mTWdbLo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816816; c=relaxed/simple; bh=vS8COK5ONXmBd+v9Pw/7t83kDsPGQ3h+k3bU/+IWHss=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=TWnnO9PqnY6d5Em2fiJBaYJ3u/eTiLxUlLCfGHI3ZR3EHgtqezUlPQ/scaJIl8wDgdi3694wyAuy5d5sDS2pcdMVLzYdusa/rlTbeNxIsvZ++/ygUpgaBufHC5FjY5vnP0T8txWAeyv3Xanb/YUoaky/6uifXtIq3GT71MNjHr8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=CROXyWvX; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="CROXyWvX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816814; x=1755352814; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=vS8COK5ONXmBd+v9Pw/7t83kDsPGQ3h+k3bU/+IWHss=; b=CROXyWvXvYM1bxV75fIUEX2RnxtrPQnk9S+9w4iKKth1qnhZsSPqOoym d3hxYxMuMafuvKAr3LTKe3rC45a0bT/bMpjVOH01JjO8I5nzJgb9t5sb3 DkKNI/fGGhoYKyZGc3z49nC703CqAHklez+HJXWBS1TbPetXVBt70f2yz r6kNhSZ4bFhN/LYl4HTcdcycNl3FWU4yIL/fSJuzUELixrWuGpa5rJKyE 6EoaMhKig/0s+OUzV5ZdXSz1g+J/dyv6ZO1Cq4+bnMlMFg2Z7trTERR1S OruuVHIPMzGpBO5rZqKCJl2Z41m9McBY9r/Wpb9pqc1Ut20PQBGgfaJsb Q==; X-CSE-ConnectionGUID: d5iepILuQuyYXrQoOqfeUQ== X-CSE-MsgGUID: 2y2sRz8jS6Sppkpx3+mJSA== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272792" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272792" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:13 -0700 X-CSE-ConnectionGUID: aX7X86weQkidX6zq2mEEng== X-CSE-MsgGUID: fq7/v0buQR6GDmtcIq+mLg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411267" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:13 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 08:59:58 -0500 Subject: [PATCH v2 10/25] cxl/port: Add endpoint decoder DC mode support to sysfs Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-10-20189a10ad7d@intel.com> References: 
<20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=6095; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=cMje7vWnYC+ErlwgLy791KZR8UUSe5SVqngdUwWVktE=; b=AfZckVztyTR+L9c31WpxqE/zGjQ7Ur7sO1B51EE2unzCyrF3LgR6wxMkJGXSUfQiUvYSUI4aI Z34dK5wmdywCm2eF0VvWKHBPn8GON+WMKweNOwosQUE1c2W2SThSyue X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh Endpoint decoder mode is used to represent the partition the decoder points to such as ram or pmem. Expand the mode to allow a decoder to point to a specific DC partition (Region). Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [Fan: change mode range logic] [Fan: use !resource_size()] [djiang: use the static mode name string array in mode_store()] [Jonathan: remove rc check from mode to region index] [Jonathan: clarify decoder mode 'mixed'] [djbw: drop cleanup patch and just follow the convention in cxl_dpa_set_mode()] [fan: make dcd resource size check similar to other partitions] [djbw, jonathan, fan: remove mode range check from dc_mode_to_region_index] [iweiny: push sysfs versions to 6.12] --- Documentation/ABI/testing/sysfs-bus-cxl | 21 ++++++++++---------- drivers/cxl/core/hdm.c | 10 ++++++++++ drivers/cxl/core/port.c | 10 +++++----- drivers/cxl/cxl.h | 35 ++++++++++++++++++--------------- 4 files changed, 45 insertions(+), 31 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 3f5627a1210a..957717264709 100644 --- 
a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -316,23 +316,24 @@ Description: What: /sys/bus/cxl/devices/decoderX.Y/mode -Date: May, 2022 -KernelVersion: v6.0 +Date: May, 2022, October 2024 +KernelVersion: v6.0, v6.12 (dcY) Contact: linux-cxl@vger.kernel.org Description: (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it translates from a host physical address range, to a device local address range. Device-local address ranges are further split - into a 'ram' (volatile memory) range and 'pmem' (persistent - memory) range. The 'mode' attribute emits one of 'ram', 'pmem', - 'mixed', or 'none'. The 'mixed' indication is for error cases - when a decoder straddles the volatile/persistent partition - boundary, and 'none' indicates the decoder is not actively - decoding, or no DPA allocation policy has been set. + into a 'ram' (volatile memory) range, 'pmem' (persistent + memory) range, or Dynamic Capacity (DC) range. The 'mode' + attribute emits one of 'ram', 'pmem', 'dcY', 'mixed', or + 'none'. The 'mixed' indication is for error cases when a + decoder straddles partition boundaries, and 'none' indicates + the decoder is not actively decoding, or no DPA allocation + policy has been set. 'mode' can be written, when the decoder is in the 'disabled' - state, with either 'ram' or 'pmem' to set the boundaries for the - next allocation. + state, with 'ram', 'pmem', or 'dcY' to set the boundaries for + the next allocation. What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index b4a517c6d283..ceca0b3d3e5c 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -551,6 +551,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, switch (mode) { case CXL_DECODER_RAM: case CXL_DECODER_PMEM: + case CXL_DECODER_DC0 ... 
CXL_DECODER_DC7: break; default: dev_dbg(dev, "unsupported mode: %d\n", mode); @@ -578,6 +579,15 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, goto out; } + if (mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7) { + rc = dc_mode_to_region_index(mode); + if (!resource_size(&cxlds->dc_res[rc])) { + dev_dbg(dev, "no available dynamic capacity\n"); + rc = -ENXIO; + goto out; + } + } + cxled->mode = mode; rc = 0; out: diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 8054cbaac9f6..222aa0aeeef7 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -205,11 +205,11 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr, enum cxl_decoder_mode mode; ssize_t rc; - if (sysfs_streq(buf, "pmem")) - mode = CXL_DECODER_PMEM; - else if (sysfs_streq(buf, "ram")) - mode = CXL_DECODER_RAM; - else + for (mode = CXL_DECODER_RAM; mode < CXL_DECODER_MIXED; mode++) + if (sysfs_streq(buf, cxl_decoder_mode_names[mode])) + break; + + if (mode >= CXL_DECODER_MIXED) return -EINVAL; rc = cxl_dpa_set_mode(cxled, mode); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 53b666ef4097..cda7e40b9a48 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -365,6 +365,9 @@ struct cxl_decoder { /* * CXL_DECODER_DEAD prevents endpoints from being reattached to regions * while cxld_unregister() is running + * + * NOTE: CXL_DECODER_RAM must be second and CXL_DECODER_MIXED must be last. 
+ * See mode_store() */ enum cxl_decoder_mode { CXL_DECODER_NONE, @@ -382,25 +385,25 @@ enum cxl_decoder_mode { CXL_DECODER_DEAD, }; +static const char * const cxl_decoder_mode_names[] = { + [CXL_DECODER_NONE] = "none", + [CXL_DECODER_RAM] = "ram", + [CXL_DECODER_PMEM] = "pmem", + [CXL_DECODER_DC0] = "dc0", + [CXL_DECODER_DC1] = "dc1", + [CXL_DECODER_DC2] = "dc2", + [CXL_DECODER_DC3] = "dc3", + [CXL_DECODER_DC4] = "dc4", + [CXL_DECODER_DC5] = "dc5", + [CXL_DECODER_DC6] = "dc6", + [CXL_DECODER_DC7] = "dc7", + [CXL_DECODER_MIXED] = "mixed", +}; + static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) { - static const char * const names[] = { - [CXL_DECODER_NONE] = "none", - [CXL_DECODER_RAM] = "ram", - [CXL_DECODER_PMEM] = "pmem", - [CXL_DECODER_DC0] = "dc0", - [CXL_DECODER_DC1] = "dc1", - [CXL_DECODER_DC2] = "dc2", - [CXL_DECODER_DC3] = "dc3", - [CXL_DECODER_DC4] = "dc4", - [CXL_DECODER_DC5] = "dc5", - [CXL_DECODER_DC6] = "dc6", - [CXL_DECODER_DC7] = "dc7", - [CXL_DECODER_MIXED] = "mixed", - }; - if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED) - return names[mode]; + return cxl_decoder_mode_names[mode]; return "mixed"; } From patchwork Fri Aug 16 13:59:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766362 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3848F1C2331; Fri, 16 Aug 2024 14:00:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816817; cv=none; b=g4yZNnz3u+vx7ZEEMneiNYMv1bSzcizJ6ss3ILA3h8iK8sZ8dt0v7PJV7Wj9TNPw4t05eDE4HeYjBsbleQX8u39K8joY1n/6dkcy8evWXnmdN+EcGp6rs55kBgSCFZ+k1wl61zSngj1kGXiPNFc9zs+sZhlXr8ZTEP36tPo6SjM= 
From: ira.weiny@intel.com
Date: Fri, 16 Aug 2024 08:59:59 -0500
Subject: [PATCH v2 11/25] cxl/mem: Expose DCD partition capabilities in sysfs
X-Mailing-List: linux-btrfs@vger.kernel.org
Message-Id: <20240816-dcd-type2-upstream-v2-11-20189a10ad7d@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh
Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=5476; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=YPz6lE5LA4LvIZp/Ltai16uHx+vNTDp+xGMjO2PW7AM=; b=jzk6f5q18CCWZ2Hm8ayx0PLvdS9yrtwsAV7QuTAzZLemFZEWE8aJ2oxFor+0GUxdRWUiLkYSM dRLV/BtL0ehCeNoM89+O+RQ4aFPHwp0lb+Ne8f7Q3zNS8yMgBTlqIHV
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

From: Navneet Singh

To properly configure CXL regions on Dynamic Capacity Devices (DCD), user space will need to know the details of the DC partitions available. Expose dynamic capacity capabilities through sysfs.
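For illustration only, and not part of the patch: the visibility rule this change applies to the new "dc" directory can be modeled in plain userspace C. This is a minimal sketch under stated assumptions. The names dc_model, dc_group_visible(), dc_attr_visible() and the slot numbering are hypothetical stand-ins for the driver state and the sysfs callbacks; only nr_dc_region and the attribute names come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Slots 0..7 stand for region0_size..region7_size; slot 8 for region_count. */
#define DC_MAX_REGIONS       8
#define DC_REGION_COUNT_SLOT DC_MAX_REGIONS

/* Hypothetical stand-in for the driver state; only the region count matters. */
struct dc_model {
	int nr_dc_region;
};

/* The whole "dc" directory is hidden without state or without DC regions. */
static inline bool dc_group_visible(const struct dc_model *m)
{
	return m != NULL && m->nr_dc_region > 0;
}

/* region_count is always shown; regionN_size only when N < nr_dc_region. */
static inline bool dc_attr_visible(const struct dc_model *m, int slot)
{
	if (m == NULL)
		return false;
	if (slot == DC_REGION_COUNT_SLOT)
		return true;
	return slot < m->nr_dc_region;
}
```

In this model a device reporting two DC regions shows region0_size, region1_size and region_count, and hides region2_size through region7_size. The slot comparison only works because the regionN_size entries come first and in order, which is the same ordering constraint the patch documents for its attribute array.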
Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: remove review tags] [Davidlohr/Fan/Jonathan: omit 'dc' attribute directory if device is not DC] [Jonathan: update documentation for dc visibility] [Jonathan: Add a comment to DC region X attributes to ensure visibility checks work] [iweiny: push sysfs version to 6.12] --- Documentation/ABI/testing/sysfs-bus-cxl | 12 ++++ drivers/cxl/core/memdev.c | 97 +++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 957717264709..6227ae0ab3fc 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -54,6 +54,18 @@ Description: identically named field in the Identify Memory Device Output Payload in the CXL-2.0 specification. +What: /sys/bus/cxl/devices/memX/dc/region_count + /sys/bus/cxl/devices/memX/dc/regionY_size +Date: August, 2024 +KernelVersion: v6.12 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) Dynamic Capacity (DC) region information. The dc + directory is only visible on devices which support Dynamic + Capacity. + The region_count is the number of Dynamic Capacity (DC) + partitions (regions) supported on the device. + regionY_size is the size of each of those partitions. 
What: /sys/bus/cxl/devices/memX/pmem/qos_class Date: May, 2023 diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 0277726afd04..7da1f0f5711a 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -101,6 +101,18 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr, static struct device_attribute dev_attr_pmem_size = __ATTR(size, 0444, pmem_size_show, NULL); +static ssize_t region_count_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + return sysfs_emit(buf, "%d\n", mds->nr_dc_region); +} + +static struct device_attribute dev_attr_region_count = + __ATTR(region_count, 0444, region_count_show, NULL); + static ssize_t serial_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -448,6 +460,90 @@ static struct attribute *cxl_memdev_security_attributes[] = { NULL, }; +static ssize_t show_size_regionN(struct cxl_memdev *cxlmd, char *buf, int pos) +{ + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + return sysfs_emit(buf, "%#llx\n", mds->dc_region[pos].decode_len); +} + +#define REGION_SIZE_ATTR_RO(n) \ +static ssize_t region##n##_size_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return show_size_regionN(to_cxl_memdev(dev), buf, (n)); \ +} \ +static DEVICE_ATTR_RO(region##n##_size) +REGION_SIZE_ATTR_RO(0); +REGION_SIZE_ATTR_RO(1); +REGION_SIZE_ATTR_RO(2); +REGION_SIZE_ATTR_RO(3); +REGION_SIZE_ATTR_RO(4); +REGION_SIZE_ATTR_RO(5); +REGION_SIZE_ATTR_RO(6); +REGION_SIZE_ATTR_RO(7); + +/* + * RegionX attributes must be listed in order and first in this array to + * support the visibility checks.
+ */ +static struct attribute *cxl_memdev_dc_attributes[] = { + &dev_attr_region0_size.attr, + &dev_attr_region1_size.attr, + &dev_attr_region2_size.attr, + &dev_attr_region3_size.attr, + &dev_attr_region4_size.attr, + &dev_attr_region5_size.attr, + &dev_attr_region6_size.attr, + &dev_attr_region7_size.attr, + &dev_attr_region_count.attr, + NULL, +}; + +static umode_t cxl_memdev_dc_attr_visible(struct kobject *kobj, struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + /* Not a memory device */ + if (!mds) + return 0; + + if (a == &dev_attr_region_count.attr) + return a->mode; + + /* + * Show only the regions supported, regionX attributes are first in the + * list + */ + if (n < mds->nr_dc_region) + return a->mode; + + return 0; +} + +static bool cxl_memdev_dc_group_visible(struct kobject *kobj) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds); + + /* No DC regions */ + if (!mds || mds->nr_dc_region == 0) + return false; + return true; +} + +DEFINE_SYSFS_GROUP_VISIBLE(cxl_memdev_dc); + +static struct attribute_group cxl_memdev_dc_group = { + .name = "dc", + .attrs = cxl_memdev_dc_attributes, + .is_visible = SYSFS_GROUP_VISIBLE(cxl_memdev_dc), +}; + static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a, int n) { @@ -528,6 +624,7 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = { &cxl_memdev_ram_attribute_group, &cxl_memdev_pmem_attribute_group, &cxl_memdev_security_attribute_group, + &cxl_memdev_dc_group, NULL, }; From patchwork Fri Aug 16 14:00:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766363 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) 
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCAC11BB6B7; Fri, 16 Aug 2024 14:00:17 +0000 (UTC)
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:00:00 -0500
Subject: [PATCH v2 12/25] cxl/region: Refactor common create region code
X-Mailing-List: linux-btrfs@vger.kernel.org
Message-Id: <20240816-dcd-type2-upstream-v2-12-20189a10ad7d@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh
Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=2376; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=CHaiWuhlK2oilB02S0vAQGpbVxQJYSALMGWJkD3RrYQ=; b=cL9fGj/DZSo6UW/kPXrOdJ6X7WcGZBSw70jnsvIREtJB6UdKuilQCiGvqZDznfh0FyiYdokuK 1XK2oAQxPhHCf5sMJxCn6A2lOKKfWYR5asYkZt9Z54CT37G/Kt8z93v
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= create_pmem_region_store() and create_ram_region_store() are identical with the exception of the region mode. With the addition of DC region mode this would end up being 3 copies of the same code. Refactor create_pmem_region_store() and create_ram_region_store() to use a single common function to be used in subsequent DC code. Suggested-by: Fan Ni Signed-off-by: Ira Weiny --- drivers/cxl/core/region.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 650fe33f2ed4..f85b26b39b2f 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2553,9 +2553,8 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM); } -static ssize_t create_pmem_region_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t len) +static ssize_t create_region_store(struct device *dev, const char *buf, + size_t len, enum cxl_region_mode mode) { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); struct cxl_region *cxlr; @@ -2565,31 +2564,26 @@ static ssize_t create_pmem_region_store(struct device *dev, if (rc != 1) return -EINVAL; - cxlr = __create_region(cxlrd, CXL_REGION_PMEM, id); + cxlr = __create_region(cxlrd, mode, id); if (IS_ERR(cxlr)) return PTR_ERR(cxlr); return len; } + +static ssize_t create_pmem_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + return create_region_store(dev, buf, len, CXL_REGION_PMEM); +} DEVICE_ATTR_RW(create_pmem_region); static ssize_t create_ram_region_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); - struct cxl_region *cxlr; - int rc, id; - - rc = sscanf(buf, 
"region%d\n", &id); - if (rc != 1) - return -EINVAL; - - cxlr = __create_region(cxlrd, CXL_REGION_RAM, id); - if (IS_ERR(cxlr)) - return PTR_ERR(cxlr); - - return len; + return create_region_store(dev, buf, len, CXL_REGION_RAM); } DEVICE_ATTR_RW(create_ram_region); From patchwork Fri Aug 16 14:00:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766364 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 86E451C3F1C; Fri, 16 Aug 2024 14:00:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816822; cv=none; b=S/AIEVEQXvnZKQRiAIC0OD4djiogzkOi/YIXyofag+cWOpPvVTRa1Fw48XVdDMqdmVher8gIq3k8xAmvm0WqW5JGQJl0y/6D0vKqW3g//zo2MbeCKz1yA3qO3a0Yny603ghVNrEyuW4gD+SgoTN6/FdMVJaozmTeTkE3pok/QeE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816822; c=relaxed/simple; bh=6N49xetpP3/MkWvYeDtswoh6ylYnbcgKRvAxEyZXYGE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=g8moHcy3zVLY7DWcerD/R+JoHg9amsr5znruRPlPLw0EGx4YcKWOQaJdjnbiL8h7kjnboZ3dMXeUlOEQ/57f2p+BMdd9KwlL5EXDvNdyNayak/vTfd+8GYjeF4AnXIeTrfKN3VJyqYNMReaY/kjOjwqdCnVfxXFQcZ7wzp8w1u0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OxexcDnf; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OxexcDnf"
From: ira.weiny@intel.com
Date: Fri, 16 Aug 2024 09:00:01 -0500
Subject: [PATCH v2 13/25] cxl/region: Add sparse DAX region support
X-Mailing-List: linux-btrfs@vger.kernel.org
Message-Id: <20240816-dcd-type2-upstream-v2-13-20189a10ad7d@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet
Singh
Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=11684; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=69CgosvRYCa+mLLifzOYjKXjqY/V8IHvXbDGZnOn2OE=; b=XZ9vAgelSwrjsGv5CAtFahRoSgkHliPWVYoSX98ws21gPTG7/q9LDhv+uDngvD4BlHhrAF6RB OJH5m3h8FtZBJ6o9BPbNXTFwMAErCdBqH6o0tgIzj1nby+WSfvGKLgk
X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE=

From: Navneet Singh

Dynamic Capacity CXL regions must allow memory to be added or removed dynamically. In addition to the quantity of memory available, the location of the memory within a DC partition is dynamic based on the extents offered by a device. CXL DAX regions must accommodate the sparseness of this memory in the management of DAX regions and devices.

Introduce the concept of a sparse DAX region. Add a create_dc_region sysfs entry to create such regions. Special case DC capable regions to create a 0 sized seed DAX device. This maintains compatibility with existing code, which requires a default DAX device to hold a region reference. Indicate 0 bytes of available capacity until capacity is added.

Sparse regions complicate the range mapping of dax devices. There is no known use case for range mapping on sparse regions. Avoid the complication by preventing range mapping of dax devices on sparse regions.

Interleaving is deferred for now. Add checks that reject DC regions with more than one interleave way.
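For illustration only, and not part of the patch: the sparse region behavior described above can be sketched as a small userspace model. The struct and the model_* names are hypothetical, and the flag macros only mirror the bit positions of the dax bus flags touched by this change. The sketch assumes allocated never exceeds range_len, and it keys the seed size on the sparse flag, whereas the patch keys it on the region mode that also sets that flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror the dax bus resource flags used in this patch. */
#define MODEL_DAX_STATIC     (1UL << 0)
#define MODEL_DAX_KMEM       (1UL << 1)
#define MODEL_DAX_SPARSE_CAP (1UL << 2)

/* Hypothetical stand-in for a dax region; fields are for the sketch only. */
struct model_dax_region {
	unsigned long flags;
	uint64_t range_len;	/* size of the host physical address range */
	uint64_t allocated;	/* bytes already handed to DAX devices */
};

static inline bool model_is_sparse(const struct model_dax_region *r)
{
	return (r->flags & MODEL_DAX_SPARSE_CAP) != 0;
}

/* Sparse regions report no available capacity until extents are added. */
static inline uint64_t model_avail_size(const struct model_dax_region *r)
{
	if (model_is_sparse(r))
		return 0;
	return r->range_len - r->allocated;
}

/* The seed DAX device is empty for sparse regions, full-sized otherwise. */
static inline uint64_t model_seed_size(const struct model_dax_region *r)
{
	return model_is_sparse(r) ? 0 : r->range_len;
}

/* Range mapping is hidden for static regions and for sparse regions. */
static inline bool model_mapping_visible(const struct model_dax_region *r)
{
	return !(r->flags & MODEL_DAX_STATIC) && !model_is_sparse(r);
}
```

The model covers only the two mapping conditions visible in this diff; any other visibility checks in the real dev_dax_visible() are out of scope here.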
Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [Fan: use single function for dc region store] [djiang: avoid setting dev_size twice] [djbw: Check DCD support and interleave restriction on region creation] [iweiny: squash patch : dax/region: Prevent range mapping allocation on sparse regions] [iweiny: remove reviews] [iweiny: rebase to master] [iweiny: push sysfs version to 6.12] [iweiny: make cxled_to_mds inline] --- Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++-------- drivers/cxl/core/core.h | 12 +++++++++ drivers/cxl/core/port.c | 1 + drivers/cxl/core/region.c | 46 +++++++++++++++++++++++++++++++-- drivers/dax/bus.c | 10 +++++++ drivers/dax/bus.h | 1 + drivers/dax/cxl.c | 16 ++++++++++-- 7 files changed, 93 insertions(+), 15 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 6227ae0ab3fc..3a5ee88e551b 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -406,20 +406,20 @@ Description: interleave_granularity). -What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region -Date: May, 2022, January, 2023 -KernelVersion: v6.0 (pmem), v6.3 (ram) +What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dc}_region +Date: May, 2022, January, 2023, August 2024 +KernelVersion: v6.0 (pmem), v6.3 (ram), v6.12 (dc) Contact: linux-cxl@vger.kernel.org Description: (RW) Write a string in the form 'regionZ' to start the process - of defining a new persistent, or volatile memory region - (interleave-set) within the decode range bounded by root decoder - 'decoderX.Y'. The value written must match the current value - returned from reading this attribute. An atomic compare exchange - operation is done on write to assign the requested id to a - region and allocate the region-id for the next creation attempt. - EBUSY is returned if the region name written does not match the - current cached value.
+ of defining a new persistent, volatile, or Dynamic Capacity + (DC) memory region (interleave-set) within the decode range + bounded by root decoder 'decoderX.Y'. The value written must + match the current value returned from reading this attribute. + An atomic compare exchange operation is done on write to assign + the requested id to a region and allocate the region-id for the + next creation attempt. EBUSY is returned if the region name + written does not match the current cached value. What: /sys/bus/cxl/devices/decoderX.Y/delete_region diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 72a506c9dbd0..15b6cf1c19ef 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -4,15 +4,27 @@ #ifndef __CXL_CORE_H__ #define __CXL_CORE_H__ +#include + extern const struct device_type cxl_nvdimm_bridge_type; extern const struct device_type cxl_nvdimm_type; extern const struct device_type cxl_pmu_type; extern struct attribute_group cxl_base_attribute_group; +static inline struct cxl_memdev_state * +cxled_to_mds(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + + return container_of(cxlds, struct cxl_memdev_state, cxlds); +} + #ifdef CONFIG_CXL_REGION extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; +extern struct device_attribute dev_attr_create_dc_region; extern struct device_attribute dev_attr_delete_region; extern struct device_attribute dev_attr_region; extern const struct device_type cxl_pmem_region_type; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 222aa0aeeef7..44e1e203173d 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -320,6 +320,7 @@ static struct attribute *cxl_decoder_root_attrs[] = { &dev_attr_qos_class.attr, SET_CXL_REGION_ATTR(create_pmem_region) SET_CXL_REGION_ATTR(create_ram_region) + SET_CXL_REGION_ATTR(create_dc_region) 
SET_CXL_REGION_ATTR(delete_region) NULL, }; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index f85b26b39b2f..35c4a1f4f9bd 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -496,6 +496,11 @@ static ssize_t interleave_ways_store(struct device *dev, if (rc) return rc; + if (cxlr->mode == CXL_REGION_DC && val != 1) { + dev_err(dev, "Interleaving and DCD not supported\n"); + return -EINVAL; + } + rc = ways_to_eiw(val, &iw); if (rc) return rc; @@ -2174,6 +2179,7 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos, if (sysfs_streq(buf, "\n")) rc = detach_target(cxlr, pos); else { + struct cxl_endpoint_decoder *cxled; struct device *dev; dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf); @@ -2185,8 +2191,13 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos, goto out; } - rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos, - TASK_INTERRUPTIBLE); + cxled = to_cxl_endpoint_decoder(dev); + if (cxlr->mode == CXL_REGION_DC && + !cxl_dcd_supported(cxled_to_mds(cxled))) { + dev_dbg(dev, "DCD unsupported\n"); + return -EINVAL; + } + rc = attach_target(cxlr, cxled, pos, TASK_INTERRUPTIBLE); out: put_device(dev); } @@ -2534,6 +2545,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, switch (mode) { case CXL_REGION_RAM: case CXL_REGION_PMEM: + case CXL_REGION_DC: break; default: dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n", @@ -2587,6 +2599,20 @@ static ssize_t create_ram_region_store(struct device *dev, } DEVICE_ATTR_RW(create_ram_region); +static ssize_t create_dc_region_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return __create_region_show(to_cxl_root_decoder(dev), buf); +} + +static ssize_t create_dc_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + return create_region_store(dev, buf, len, CXL_REGION_DC); +} +DEVICE_ATTR_RW(create_dc_region); 
+ static ssize_t region_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -3168,6 +3194,11 @@ static int devm_cxl_add_dax_region(struct cxl_region *cxlr) struct device *dev; int rc; + if (cxlr->mode == CXL_REGION_DC && cxlr->params.interleave_ways != 1) { + dev_err(&cxlr->dev, "Interleaving DC not supported\n"); + return -EINVAL; + } + cxlr_dax = cxl_dax_region_alloc(cxlr); if (IS_ERR(cxlr_dax)) return PTR_ERR(cxlr_dax); @@ -3260,6 +3291,16 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, return ERR_PTR(-EINVAL); mode = cxl_decoder_to_region_mode(cxled->mode); + if (mode == CXL_REGION_DC) { + if (!cxl_dcd_supported(cxled_to_mds(cxled))) { + dev_err(&cxled->cxld.dev, "DCD unsupported\n"); + return ERR_PTR(-EINVAL); + } + if (cxled->cxld.interleave_ways != 1) { + dev_err(&cxled->cxld.dev, "Interleaving and DCD not supported\n"); + return ERR_PTR(-EINVAL); + } + } do { cxlr = __create_region(cxlrd, mode, atomic_read(&cxlrd->region_id)); @@ -3467,6 +3508,7 @@ static int cxl_region_probe(struct device *dev) case CXL_REGION_PMEM: return devm_cxl_add_pmem_region(cxlr); case CXL_REGION_RAM: + case CXL_REGION_DC: /* * The region can not be manged by CXL if any portion of * it is already online as 'System RAM' diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index fde29e0ad68b..d8cb5195a227 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -178,6 +178,11 @@ static bool is_static(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0; } +static bool is_sparse(struct dax_region *dax_region) +{ + return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0; +} + bool static_dev_dax(struct dev_dax *dev_dax) { return is_static(dev_dax->region); @@ -301,6 +306,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region) lockdep_assert_held(&dax_region_rwsem); + if (is_sparse(dax_region)) + return 0; + for_each_dax_region_resource(dax_region, res) size -= 
resource_size(res); return size; @@ -1373,6 +1381,8 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n) return 0; if (a == &dev_attr_mapping.attr && is_static(dax_region)) return 0; + if (a == &dev_attr_mapping.attr && is_sparse(dax_region)) + return 0; if ((a == &dev_attr_align.attr || a == &dev_attr_size.attr) && is_static(dax_region)) return 0444; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index cbbf64443098..783bfeef42cc 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -13,6 +13,7 @@ struct dax_region; /* dax bus specific ioresource flags */ #define IORESOURCE_DAX_STATIC BIT(0) #define IORESOURCE_DAX_KMEM BIT(1) +#define IORESOURCE_DAX_SPARSE_CAP BIT(2) struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 9b29e732b39a..367e86b1c22a 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -13,19 +13,31 @@ static int cxl_dax_region_probe(struct device *dev) struct cxl_region *cxlr = cxlr_dax->cxlr; struct dax_region *dax_region; struct dev_dax_data data; + resource_size_t dev_size; + unsigned long flags; if (nid == NUMA_NO_NODE) nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start); + flags = IORESOURCE_DAX_KMEM; + if (cxlr->mode == CXL_REGION_DC) + flags |= IORESOURCE_DAX_SPARSE_CAP; + dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, - PMD_SIZE, IORESOURCE_DAX_KMEM); + PMD_SIZE, flags); if (!dax_region) return -ENOMEM; + if (cxlr->mode == CXL_REGION_DC) + /* Add empty seed dax device */ + dev_size = 0; + else + dev_size = range_len(&cxlr_dax->hpa_range); + data = (struct dev_dax_data) { .dax_region = dax_region, .id = -1, - .size = range_len(&cxlr_dax->hpa_range), + .size = dev_size, .memmap_on_memory = true, }; From patchwork Fri Aug 16 14:00:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13766365
From: Ira Weiny
Date: Fri, 16 Aug 2024 09:00:02 -0500
Subject: [PATCH v2 14/25] cxl/events: Split event msgnum configuration from irq setup
X-Mailing-List: linux-btrfs@vger.kernel.org
Message-Id: <20240816-dcd-type2-upstream-v2-14-20189a10ad7d@intel.com>
References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com>
To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh
Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
X-Mailer: b4 0.13-dev-2d940
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=2746; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=1C0H7UERFiFZZ/FOaknhjU0DGvDsdbFkKV5LYyGmeyU=;
b=aHOePV1Xoc/fsKfdOT5Xf1woYSfx+Lq9vclxX+U8PD0AsRQ3lURv0kQMJ8iBHeX4GRcOPI9s9 ZhNsQdhg541ChVDnMdxN2jYMvOhr4e2FKyLJ6xGx/g+vqy+k3Ff/wBc X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Dynamic Capacity Devices (DCD) require event interrupts to process memory addition or removal. BIOS may have control over non-DCD event processing. DCD interrupt configuration needs to be separate from memory event interrupt configuration. Split cxl_event_config_msgnums() from irq setup in preparation for separate DCD interrupts configuration. Signed-off-by: Ira Weiny --- drivers/cxl/pci.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index f7f03599bc83..17bea49bbf4d 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -698,35 +698,31 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds, return cxl_event_get_int_policy(mds, policy); } -static int cxl_event_irqsetup(struct cxl_memdev_state *mds) +static int cxl_event_irqsetup(struct cxl_memdev_state *mds, + struct cxl_event_interrupt_policy *policy) { struct cxl_dev_state *cxlds = &mds->cxlds; - struct cxl_event_interrupt_policy policy; int rc; - rc = cxl_event_config_msgnums(mds, &policy); - if (rc) - return rc; - - rc = cxl_event_req_irq(cxlds, policy.info_settings); + rc = cxl_event_req_irq(cxlds, policy->info_settings); if (rc) { dev_err(cxlds->dev, "Failed to get interrupt for event Info log\n"); return rc; } - rc = cxl_event_req_irq(cxlds, policy.warn_settings); + rc = cxl_event_req_irq(cxlds, policy->warn_settings); if (rc) { dev_err(cxlds->dev, "Failed to get interrupt for event Warn log\n"); return rc; } - rc = cxl_event_req_irq(cxlds, policy.failure_settings); + rc = cxl_event_req_irq(cxlds, policy->failure_settings); if (rc) { dev_err(cxlds->dev, "Failed to get interrupt for event Failure log\n"); return rc; } - rc = cxl_event_req_irq(cxlds, policy.fatal_settings); + rc = 
cxl_event_req_irq(cxlds, policy->fatal_settings); if (rc) { dev_err(cxlds->dev, "Failed to get interrupt for event Fatal log\n"); return rc; @@ -745,7 +741,7 @@ static bool cxl_event_int_is_fw(u8 setting) static int cxl_event_config(struct pci_host_bridge *host_bridge, struct cxl_memdev_state *mds, bool irq_avail) { - struct cxl_event_interrupt_policy policy; + struct cxl_event_interrupt_policy policy = { 0 }; int rc; /* @@ -773,11 +769,15 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, return -EBUSY; } + rc = cxl_event_config_msgnums(mds, &policy); + if (rc) + return rc; + rc = cxl_mem_alloc_event_buf(mds); if (rc) return rc; - rc = cxl_event_irqsetup(mds); + rc = cxl_event_irqsetup(mds, &policy); if (rc) return rc; From patchwork Fri Aug 16 14:00:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766366 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 49EA01C461F; Fri, 16 Aug 2024 14:00:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816826; cv=none; b=onJKN2VWf2ValZwJAn7TSC2qt9nF2mxR9BTx5ZC2AeOertltPE+zewGgeFe5NXSjF2UJmrmgZK5KQU7kU+aMYJCByDCjDFKvC3e6/BRs35j2U87zu90M+LWi9V/hJuPR759hudtT2vZcBbd/taiKmvvKJESiNBwquDIMCr495Yw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816826; c=relaxed/simple; bh=SWMZ4XkcT3C6EuI3kX3FJrjCNYC9aTWVw08nTrQrMOo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=EimuGLmI1ZQ4ZXfmRqoWyUvZBDm6csZ+diZNNZk7MIxxqtMsNe5iFC5OuD8nV6PSVkpcG8YjspBVNnJHeAKItmSR85iUtHcrr1151IdVNKqyoQnTjc9J4/8KiKzQZNMXjzBtjKV2NntW7luN/j1d6BiYE6qXxXLOyW5jJWsvU+o= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XxSfHOOC; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XxSfHOOC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816825; x=1755352825; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=SWMZ4XkcT3C6EuI3kX3FJrjCNYC9aTWVw08nTrQrMOo=; b=XxSfHOOCD2oypT1JVme4nOG4bwGbo97RvYdKPtFUj4EtBjHP6zUZQvPx JNPyfv5jxjfP0r7wbTgX6kR0ga0GBrQoJvxFv2d7HzCO5uSJ2RWGS2u2t Nk/p3zuaRDUAAsxoKD491HwZMzDl/WCZmhXusnFmXNTkGJHWfZPPJ11Gy LEp2Jb/+tOfkY34di99vOUE1IskdNOobwewFlglSXJ1qnrv7X9Mw57KM2 jSoc9NrJtWHNtUYjbNxqqj8WM8sO1v6PiIGn4QF30lBoIUooUMlkhgu++ nAWH2dPOs97g8l9CSos7cmN1Rl/q7OVEC3KgMZrNLD5L7rtQiUsToRL+u A==; X-CSE-ConnectionGUID: 0uXnd/zhT9SFPi8KrYrCqw== X-CSE-MsgGUID: z8orii1UTu6uPyuM9GEDPQ== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272829" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272829" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:24 -0700 X-CSE-ConnectionGUID: CTGrDG75QR+cnBREnAjHRg== X-CSE-MsgGUID: xndzYq6qTPGDEDxRmSzSLQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411433" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:23 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 
09:00:03 -0500 Subject: [PATCH v2 15/25] cxl/pci: Factor out interrupt policy check Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-15-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=2095; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=SWMZ4XkcT3C6EuI3kX3FJrjCNYC9aTWVw08nTrQrMOo=; b=Kfin4DeWXtl8gIorYmbqtqpLlqJ10XLdn9MKQb/mel/Q1r0quJ2PbIK818m9Oc8+y4spEAuhh +GfYwz6cP/zC5BndOVsHqp53JgzSWyPK9V3QZLzCrpKGJARZ62f5F3D X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= Dynamic Capacity Devices (DCD) require event interrupts to process memory addition or removal. BIOS may have control over non-DCD event processing. DCD interrupt configuration needs to be separate from memory event interrupt configuration. Factor out event interrupt setting validation. 
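For anyone who wants to exercise the check outside the kernel, here is a minimal userspace sketch of the factored-out validation. It is an illustration, not the driver code: the names event_int_policy, event_int_is_fw, validate_mem_policy and the EVENT_INT_* constants are hypothetical stand-ins. It also assumes the interrupt mode occupies the low two bits of each setting byte, with the value 2 meaning firmware control; check that against the driver's CXLDEV_EVENT_INT_MODE_MASK and CXL_INT_FW definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the interrupt mode lives in bits 1:0 of each setting byte. */
#define EVENT_INT_MODE_MASK 0x03u
#define EVENT_INT_NONE      0x00u
#define EVENT_INT_MSI_MSIX  0x01u
#define EVENT_INT_FW        0x02u

/* Hypothetical stand-in for the four memory event log settings. */
struct event_int_policy {
	uint8_t info_settings;
	uint8_t warn_settings;
	uint8_t failure_settings;
	uint8_t fatal_settings;
};

static_assert(sizeof(struct event_int_policy) == 4,
	      "one setting byte per memory event log");

/* True when the mode field of a setting byte selects firmware control. */
static bool event_int_is_fw(uint8_t setting)
{
	return (setting & EVENT_INT_MODE_MASK) == EVENT_INT_FW;
}

/*
 * Return false if firmware still owns any of the four memory event logs,
 * true otherwise. The caller decides which error code to report.
 */
static bool validate_mem_policy(const struct event_int_policy *policy)
{
	if (event_int_is_fw(policy->info_settings) ||
	    event_int_is_fw(policy->warn_settings) ||
	    event_int_is_fw(policy->failure_settings) ||
	    event_int_is_fw(policy->fatal_settings)) {
		fprintf(stderr,
			"FW still in control of Event Logs despite _OSC settings\n");
		return false;
	}

	return true;
}
```

A policy with all four logs set to MSI/MSI-X passes. Switching any single log to firmware mode fails the check, which is the case where cxl_event_config() returns -EBUSY.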
Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Signed-off-by: Ira Weiny --- Changes: [iweiny: reword commit message] [iweiny: keep review tags on simple patch] --- drivers/cxl/pci.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 17bea49bbf4d..370c74eae323 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -738,6 +738,21 @@ static bool cxl_event_int_is_fw(u8 setting) return mode == CXL_INT_FW; } +static bool cxl_event_validate_mem_policy(struct cxl_memdev_state *mds, + struct cxl_event_interrupt_policy *policy) +{ + if (cxl_event_int_is_fw(policy->info_settings) || + cxl_event_int_is_fw(policy->warn_settings) || + cxl_event_int_is_fw(policy->failure_settings) || + cxl_event_int_is_fw(policy->fatal_settings)) { + dev_err(mds->cxlds.dev, + "FW still in control of Event Logs despite _OSC settings\n"); + return false; + } + + return true; +} + static int cxl_event_config(struct pci_host_bridge *host_bridge, struct cxl_memdev_state *mds, bool irq_avail) { @@ -760,14 +775,8 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, if (rc) return rc; - if (cxl_event_int_is_fw(policy.info_settings) || - cxl_event_int_is_fw(policy.warn_settings) || - cxl_event_int_is_fw(policy.failure_settings) || - cxl_event_int_is_fw(policy.fatal_settings)) { - dev_err(mds->cxlds.dev, - "FW still in control of Event Logs despite _OSC settings\n"); + if (!cxl_event_validate_mem_policy(mds, &policy)) return -EBUSY; - } rc = cxl_event_config_msgnums(mds, &policy); if (rc) From patchwork Fri Aug 16 14:00:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766367 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
0113C1C5785; Fri, 16 Aug 2024 14:00:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816828; cv=none; b=ou5BzSUdJenICOPwFhbACY0esPoTifM0Ite0FZnAzhCYrxn5HLUkFQMinJAF8Nr+FJG2ITshVywq5aAYRCtT7t8AxS1eUCHvtPgdkGd4FgTsPDE1ZR2c824vnHYVqRJWs0bf++ykpIPPynjv3j2KdmQpclhI8wgOOMVN8npL/2s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816828; c=relaxed/simple; bh=xgtDDFnbYG1PgDcDzyakm0YnyHWhi9MOgea2ciekshc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Hx9SR5j52aQ2/31jOEn6PYTxQoMM7ZEad3dsNbmARaiByWwWwJizzZ5KfJ/GCC22xigHSCOiIAj8nzRvQz9d03c8C0dcKeIjtJ3BD7bLB7xgVZeqXDOYjLJzy+oqU5XL5c0N8srACXtdYFaAlu5pCzi+VE5eBxvvRq5AFZZkQiA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=jsm1rU9Q; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jsm1rU9Q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816827; x=1755352827; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=xgtDDFnbYG1PgDcDzyakm0YnyHWhi9MOgea2ciekshc=; b=jsm1rU9QeYwJGIUI8CvvRZ6E/QMvM09Lt6Y/E3sN6MLQLNiCyQo3knt2 5sfW+Qs9OMr8T9NDgcy3CAW2XM9XhYeviOs3ALJ4HeHPNGyy8JajOAWIS /PbeWkmQjtrHaaNpP/2fnCo+H1wIRSEuG/KaKZUeINF56ym2roUSz5wP0 thePn54JRZNIz9QfSuuZ6tUQQRNgQQSr66J5hZvcQA+YVEb7qCD+adA4C 7ByzSBQp7V2IWWIXNX1d2DOcIJNpNzcqLzqDGFJxB/eg39tmth6F5j44N 
ype42nbjc9mJAX9QA7bSVNegMQMPeETCuwO2h72cy/95Edj33rRNbCLWD g==; X-CSE-ConnectionGUID: hOaapxAwT8mQDLN9iUmmkQ== X-CSE-MsgGUID: IzdK6Fa1QGCh/G17BREMjg== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272836" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272836" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:26 -0700 X-CSE-ConnectionGUID: ExJqxgD4TgWes4kyaPvyEQ== X-CSE-MsgGUID: URUfNFFZR3evTSTJkI6xtg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411469" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:26 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 09:00:04 -0500 Subject: [PATCH v2 16/25] cxl/mem: Configure dynamic capacity interrupts Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-16-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=5482; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=nG+6QeATGPf/ZeewSqjsGsZHFXYrJSQpwae8PFSMtvA=; b=Emmz4Ntv7UaB6/fZibhiCoQl71hGLd68H4V2CGDhGmm8yHhRB+n3p4dp5ZwjR1uEO5CznIDlI DVnGofD8vIqDxuyYdG2Ulck6RffZ8rjR1Oxn+MfPRx/s1w4p4P/yvWy X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh Dynamic Capacity Devices (DCD) support 
extent change notifications through the event log mechanism. The interrupt mailbox commands were extended in CXL 3.1 to support these notifications. Firmware can't configure DCD events to be FW controlled but can retain control of memory events. Configure DCD event log interrupts on devices supporting dynamic capacity. Disable DCD if interrupts are not supported. Care is taken to preserve the interrupt policy set by the FW if FW first has been selected by the BIOS. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: update commit message] [iweiny: rebase to upstream irq code] [iweiny: disable DCD if irqs not supported] [Jonathan: formatting fix] [Fan: add text to debug print] [djiang: make dcd helpers inline] --- drivers/cxl/cxlmem.h | 2 ++ drivers/cxl/pci.c | 72 +++++++++++++++++++++++++++++++++++++++++++--------- 2 files changed, 62 insertions(+), 12 deletions(-) diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index b4eb8164d05d..d41bec5433db 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -225,7 +225,9 @@ struct cxl_event_interrupt_policy { u8 warn_settings; u8 failure_settings; u8 fatal_settings; + u8 dcd_settings; } __packed; +#define CXL_EVENT_INT_POLICY_BASE_SIZE 4 /* info, warn, failure, fatal */ /** * struct cxl_event_state - Event log driver state diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 370c74eae323..e5430c4e3a3b 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -669,22 +669,33 @@ static int cxl_event_get_int_policy(struct cxl_memdev_state *mds, } static int cxl_event_config_msgnums(struct cxl_memdev_state *mds, - struct cxl_event_interrupt_policy *policy) + struct cxl_event_interrupt_policy *policy, + bool native_cxl) { + size_t size_in = CXL_EVENT_INT_POLICY_BASE_SIZE; struct cxl_mbox_cmd mbox_cmd; int rc; - *policy = (struct cxl_event_interrupt_policy) { - .info_settings = CXL_INT_MSI_MSIX, - .warn_settings = CXL_INT_MSI_MSIX, - .failure_settings = 
CXL_INT_MSI_MSIX, - .fatal_settings = CXL_INT_MSI_MSIX, - }; + /* memory event policy is left if FW has control */ + if (native_cxl) { + *policy = (struct cxl_event_interrupt_policy) { + .info_settings = CXL_INT_MSI_MSIX, + .warn_settings = CXL_INT_MSI_MSIX, + .failure_settings = CXL_INT_MSI_MSIX, + .fatal_settings = CXL_INT_MSI_MSIX, + .dcd_settings = 0, + }; + } + + if (cxl_dcd_supported(mds)) { + policy->dcd_settings = CXL_INT_MSI_MSIX; + size_in += sizeof(policy->dcd_settings); + } mbox_cmd = (struct cxl_mbox_cmd) { .opcode = CXL_MBOX_OP_SET_EVT_INT_POLICY, .payload_in = policy, - .size_in = sizeof(*policy), + .size_in = size_in, }; rc = cxl_internal_send_cmd(mds, &mbox_cmd); @@ -731,6 +742,31 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds, return 0; } +static int cxl_irqsetup(struct cxl_memdev_state *mds, + struct cxl_event_interrupt_policy *policy, + bool native_cxl) +{ + struct cxl_dev_state *cxlds = &mds->cxlds; + int rc; + + if (native_cxl) { + rc = cxl_event_irqsetup(mds, policy); + if (rc) + return rc; + } + + if (cxl_dcd_supported(mds)) { + rc = cxl_event_req_irq(cxlds, policy->dcd_settings); + if (rc) { + dev_err(cxlds->dev, "Failed to get interrupt for DCD event log\n"); + cxl_disable_dcd(mds); + return rc; + } + } + + return 0; +} + static bool cxl_event_int_is_fw(u8 setting) { u8 mode = FIELD_GET(CXLDEV_EVENT_INT_MODE_MASK, setting); @@ -757,17 +793,25 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, struct cxl_memdev_state *mds, bool irq_avail) { struct cxl_event_interrupt_policy policy = { 0 }; + bool native_cxl = host_bridge->native_cxl_error; int rc; /* * When BIOS maintains CXL error reporting control, it will process * event records. Only one agent can do so. + * + * If BIOS has control of events and DCD is not supported skip event + * configuration. 
*/ - if (!host_bridge->native_cxl_error) + if (!native_cxl && !cxl_dcd_supported(mds)) return 0; if (!irq_avail) { dev_info(mds->cxlds.dev, "No interrupt support, disable event processing.\n"); + if (cxl_dcd_supported(mds)) { + dev_info(mds->cxlds.dev, "DCD requires interrupts, disable DCD\n"); + cxl_disable_dcd(mds); + } return 0; } @@ -775,10 +819,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, if (rc) return rc; - if (!cxl_event_validate_mem_policy(mds, &policy)) + if (native_cxl && !cxl_event_validate_mem_policy(mds, &policy)) return -EBUSY; - rc = cxl_event_config_msgnums(mds, &policy); + rc = cxl_event_config_msgnums(mds, &policy, native_cxl); if (rc) return rc; @@ -786,12 +830,16 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge, if (rc) return rc; - rc = cxl_event_irqsetup(mds, &policy); + rc = cxl_irqsetup(mds, &policy, native_cxl); if (rc) return rc; cxl_mem_get_event_records(mds, CXLDEV_EVENT_STATUS_ALL); + dev_dbg(mds->cxlds.dev, "Event config : %s DCD %s\n", + native_cxl ? "OS" : "BIOS", + cxl_dcd_supported(mds) ? 
"supported" : "not supported"); + return 0; } From patchwork Fri Aug 16 14:00:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766368 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3C221C57AD; Fri, 16 Aug 2024 14:00:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816830; cv=none; b=M1vX04NEBuS7/2nDvPyiXpzAJ/5nBobeAAMO1CzuEhcfy4+8dW2G3i8L+ScM39pnH53GuC1dI3KRnhxYszZjDF4EbeUMruU8orEtIPz4PcnaVJogKBsBGlo8+0ka709KkL5Z8AiD9/2ZY1MVMF7TcNmMuH5ew5C8GGFw1GVzlLI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816830; c=relaxed/simple; bh=pqRNyjGqOVpA4g/Gj6bYGTLfCk09oM9Oq8+UwkaTqOc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=HL2uSC3jzZ9NDLCWxkDixqsQNIOsVfS3onwei9NTfRDdqrNQyHxQWcaUh4gPhNRBJAcMYbmmnOzzBGlh4v6IMW/vpR6ESQwWoPzsyJAF3DAynfrL35W5X7IOkhGYFDE8NPG2TbDgDFc1z/0e1nUO+UiaWbijY5yqkTpMu8zoFpE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=fY89KNgH; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fY89KNgH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816828; 
x=1755352828; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=pqRNyjGqOVpA4g/Gj6bYGTLfCk09oM9Oq8+UwkaTqOc=; b=fY89KNgHKokwN4kwBKitp1KA8CmldjdIeCZfvUY5xlhKJfV/gEYBYVhO raYfbR4CIUiMT0pJToSY8hgY3CfzHbhT7yVbJiaXH33jz7R7VjMpJL8Qv RP0iZa1+PrvUaHFvWQIF5Zvl+2nqnABgNBT5m5UgICJC9ERqkx7t+zvkK psTrDobrHfVZjlrlOV3DZsYLJ4i4mNYzUw50jCrIUlz/zjBGqrH6UjnRC YmgxkwPoBVaSrk4D/Qzq3B/ofUDFQxbnBejaISg6RnjIreNgdlK5MuK7x VCIRqmHUGQUcY91kkOyO5S8DtXlFgjg32vXdESiMO1uQQPGUVSjEWbQiv Q==; X-CSE-ConnectionGUID: RIcCYUyDSCqzNqAv3Nm8gA== X-CSE-MsgGUID: 0o+v7z8QSjq6EHeY4liPfQ== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272846" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272846" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:28 -0700 X-CSE-ConnectionGUID: C1gU//79Q2+pHmssQrxcDQ== X-CSE-MsgGUID: I+xYOfkuSvu1F2Fz9NWmSQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411500" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:28 -0700 From: Ira Weiny Date: Fri, 16 Aug 2024 09:00:05 -0500 Subject: [PATCH v2 17/25] cxl/core: Return endpoint decoder information from region search Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-17-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 
X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=4183; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=pqRNyjGqOVpA4g/Gj6bYGTLfCk09oM9Oq8+UwkaTqOc=; b=5MduwxG5h+xCFT7Kz1niQDIzqRUK2ebRDWJcd1aUkUTq500L4J6exSyAHWBj1pHPNmr+6T2/5 pvvDjYy5mhNAUtCiNsjl0ZJ+nK52c5hOLo16ttD8LzLRKgiwuJKtmQ6 X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= cxl_dpa_to_region() finds the region from a <DPA, device> tuple. The search involves finding the device endpoint decoder as well. Dynamic capacity extent processing uses the endpoint decoder HPA information to calculate the HPA offset. In addition, well-behaved extents should be contained within an endpoint decoder. Return the endpoint decoder found to be used in subsequent DCD code. Signed-off-by: Ira Weiny --- drivers/cxl/core/core.h | 6 ++++-- drivers/cxl/core/mbox.c | 2 +- drivers/cxl/core/memdev.c | 4 ++-- drivers/cxl/core/region.c | 8 +++++++- 4 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 15b6cf1c19ef..76c4153a9b2c 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -39,7 +39,8 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled); int cxl_region_init(void); void cxl_region_exit(void); int cxl_get_poison_by_endpoint(struct cxl_port *port); -struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa); +struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa, + struct cxl_endpoint_decoder **cxled); u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa); @@ -50,7 +51,8 @@ static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, return ULLONG_MAX; } static inline -struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa) +struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa, + struct cxl_endpoint_decoder **cxled) { return NULL; } diff --git
a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 68c26c4be91a..01a447aaa1b1 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -909,7 +909,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd, guard(rwsem_read)(&cxl_dpa_rwsem); dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK; - cxlr = cxl_dpa_to_region(cxlmd, dpa); + cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL); if (cxlr) hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa); diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index 7da1f0f5711a..12fb07fb89a6 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -323,7 +323,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) if (rc) goto out; - cxlr = cxl_dpa_to_region(cxlmd, dpa); + cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL); if (cxlr) dev_warn_once(mds->cxlds.dev, "poison inject dpa:%#llx region: %s\n", dpa, @@ -387,7 +387,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) if (rc) goto out; - cxlr = cxl_dpa_to_region(cxlmd, dpa); + cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL); if (cxlr) dev_warn_once(mds->cxlds.dev, "poison clear dpa:%#llx region: %s\n", dpa, diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 35c4a1f4f9bd..8e0884b52f84 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2828,6 +2828,7 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port) struct cxl_dpa_to_region_context { struct cxl_region *cxlr; u64 dpa; + struct cxl_endpoint_decoder *cxled; }; static int __cxl_dpa_to_region(struct device *dev, void *arg) @@ -2861,11 +2862,13 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg) dev_name(dev)); ctx->cxlr = cxlr; + ctx->cxled = cxled; return 1; } -struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa) +struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa, + struct cxl_endpoint_decoder **cxled) { struct cxl_dpa_to_region_context ctx; struct cxl_port *port; @@ 
-2877,6 +2880,9 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa) if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port)) device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region); + if (cxled) + *cxled = ctx.cxled; + return ctx.cxlr; } From patchwork Fri Aug 16 14:00:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766369 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 398211C68A9; Fri, 16 Aug 2024 14:00:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816833; cv=none; b=Ul7tCZPT9+BQmEE55/fVnSdKh2YC8hA24gGxJ/KG1JwXXQNIw6/NYmFcX5rUJgFhJlfe0zPrMG5jnfqAFtMAx2RZCQE0KGIUsAcRZ366qenCCFqnNG5AKWbM33/qVkH9VDd+ddBeaUcg0qx3FNIsWZMCXDKmm1njGtAKJwMGTnY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723816833; c=relaxed/simple; bh=d0THao7ptuk93WHCP3YusIjqN5/9hcxEIJnQljskkxM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=kdLv+ULD/9vQrgUCjFDf0d/4J4CpxNFwFTUCj6Ub5BNGw7rtOCG/X1+p01CrUYwIyv7yBRcM5C5cgA+wQ71sVerWXKhJXz2QWh3dECerANlfF3lMbPJAL1Z/ITyrPaSrq8qhrWCj/5ObkKYFI/8ygGeYNN4AJNOGoGNJfIsgl+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Qpiwaa81; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Qpiwaa81" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723816831; x=1755352831; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=d0THao7ptuk93WHCP3YusIjqN5/9hcxEIJnQljskkxM=; b=Qpiwaa81S4y0BcuiZ/KeOaPwi7JME9V8CTCCvvMSL2RiA9GqseWOgY5m u8yicFYxx4hjZ15gXqD6NLVEEvNk3gTQrINb/1zfVP69LYlTYlQmU9o+D WSxDZ0lNC1UalnzyJRcQsOfG9ARyOSxSwjWUsiCCaK9QT6eWUEngbQsRp 7NbRCqehthGKeZY6VUumxAhmm1eLJ1patGxUEy2QCx0gtcB5tnK6IAn6q uRuiL1Tw5hThD0zRq2LAbgqwh8jWYfb8zwbZiuR/Of6aoigkHh+pQelbK KtNH2ZprO4vXMfcfl+L+ZYyuSXp0N1qKdYfeK/WGtbKVCzkpIpemEoEWo w==; X-CSE-ConnectionGUID: lt0/05eZS5yRCYrnjzmATQ== X-CSE-MsgGUID: huOZSt0iROyFxcGjW2kc0Q== X-IronPort-AV: E=McAfee;i="6700,10204,11166"; a="22272855" X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="22272855" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:30 -0700 X-CSE-ConnectionGUID: I5JzZ7JmSfu3tSBmHwaIRw== X-CSE-MsgGUID: ayHvDkTAS8CeWtHBTNtfVA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,151,1719903600"; d="scan'208";a="90411536" Received: from iweiny-mobl.amr.corp.intel.com (HELO localhost) ([10.125.111.52]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Aug 2024 07:00:30 -0700 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 09:00:06 -0500 Subject: [PATCH v2 18/25] cxl/extent: Process DCD events and realize region extents Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240816-dcd-type2-upstream-v2-18-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang 
, Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.13-dev-2d940 X-Developer-Signature: v=1; a=ed25519-sha256; t=1723816790; l=33945; i=ira.weiny@intel.com; s=20221211; h=from:subject:message-id; bh=cJMxK2jgeXSr72RjpNqT0ZDXAQZOYoXDl26Ujv9cDp4=; b=jS3wAZ+2wIuZ4bwRoVHog10FEfUaK8oaDpW2ewo/Oyqiohm/ZBRU4sZiWC5wpDdCsNybevuTV k3HkwzE2WkJCtCJL6BoGVW0WFxzeWsU4ggZ4AVzJUR+/NBNWDIq+hWR X-Developer-Key: i=ira.weiny@intel.com; a=ed25519; pk=noldbkG+Wp1qXRrrkfY1QJpDf7QsOEthbOT7vm0PqsE= From: Navneet Singh A dynamic capacity device (DCD) sends events to signal the host for changes in the availability of Dynamic Capacity (DC) memory. These events contain extents describing a DPA range and meta data for memory to be added or removed. Events may be sent from the device at any time. Three types of events can be signaled, Add, Release, and Force Release. On add, the host may accept or reject the memory being offered. If no region exists, or the extent is invalid, the extent should be rejected. Add extent events may be grouped by a 'more' bit which indicates those extents should be processed as a group. On remove, the host can delay the response until the host is safely not using the memory. If no region exists the release can be sent immediately. The host may also release extents (or partial extents) at any time. Thus the 'more' bit grouping of release events is of less value and can be ignored in favor of sending multiple release capacity responses for groups of release events. Force removal is intended as a mechanism between the FM and the device and intended only when the host is unresponsive, out of sync, or otherwise broken. Purposely ignore force removal events. Regions are made up of one or more devices which may be surfacing memory to the host. 
Once all devices in a region have surfaced an extent, the region can expose a corresponding extent for the user to consume. Without interleaving, a device extent forms a 1:1 relationship with the region extent. Immediately surface a region extent upon getting a device extent. Per the specification the device is allowed to offer or remove extents at any time. However, anticipated use cases can expect extents to be offered, accepted, and removed in well-defined chunks. Simplify extent tracking with the following restrictions. 1) Flag for removal any extent which overlaps a requested release range. 2) Refuse the offer of extents which overlap already accepted memory ranges. 3) Accept again a range which has already been accepted by the host. (It is likely the device has an error because it should already know that this range was accepted. But from the host point of view it is safe to acknowledge that acceptance again.) Management of the region extent devices must be synchronized with potential uses of the memory within the DAX layer. Create region extent devices as children of the cxl_dax_region device such that the DAX region driver can co-drive them and synchronize with the DAX layer. Synchronization and management are handled in a subsequent patch. Process DCD events and create region devices. Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: combine this with the extent surface patches to better show the lifetime of extent objects in review] [iweiny: clean up commit message.] [iweiny: move extent verification of the 'read extents on region creation' to this patch] [iweiny: Provide for a common path for extent realization between an add event and adding existing extents.] [iweiny: Persist a check that an extent is within an endpoint decoder] [iweiny: reduce exported and non-static calls] [iweiny: use %par] [Jonathan: implement the more bit with a simple algorithm which accepts all extents it can.
Also include the response more bit to prevent payload overflow] [Fan: Do not error if a contained extent is added.] [Jonathan: allocate ida after kzalloc] [iweiny: fix ida resource leak] [fan/djiang: remove unneeded memset] [djiang: fix indentation] [Jonathan: Fix indentation] [Jonathan/djbw: make tag a uuid] [djbw: create helper calc_hpa_range() straight away] [djbw: Allow for multiple cxled_extents per region_extent] [djbw: s/cxl_ed/cxled] [djbw: s/cxl_release_ed_extent/cxled_release_extent/] [djbw: s/reg_ext/region_extent/] [djbw: s/dc_extent/extent/] [Gregory/djbw: reject shared extents] [iweiny: predicate extent.c compile on CONFIG_CXL_REGION] --- drivers/cxl/core/Makefile | 2 +- drivers/cxl/core/core.h | 13 ++ drivers/cxl/core/extent.c | 345 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/cxl/core/mbox.c | 268 ++++++++++++++++++++++++++++++++++- drivers/cxl/core/region.c | 6 + drivers/cxl/cxl.h | 52 ++++++- drivers/cxl/cxlmem.h | 26 ++++ include/linux/cxl-event.h | 32 +++++ tools/testing/cxl/Kbuild | 3 +- 9 files changed, 743 insertions(+), 4 deletions(-) diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 9259bcc6773c..3b812515e725 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -15,4 +15,4 @@ cxl_core-y += hdm.o cxl_core-y += pmu.o cxl_core-y += cdat.o cxl_core-$(CONFIG_TRACING) += trace.o -cxl_core-$(CONFIG_CXL_REGION) += region.o +cxl_core-$(CONFIG_CXL_REGION) += region.o extent.o diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 76c4153a9b2c..8dfc97b2e0a4 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -44,12 +44,24 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa, u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa); +int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent); +int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent); #else static inline u64 
cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa) { return ULLONG_MAX; } +static inline int cxl_add_extent(struct cxl_memdev_state *mds, + struct cxl_extent *extent) +{ + return 0; +} +static inline int cxl_rm_extent(struct cxl_memdev_state *mds, + struct cxl_extent *extent) +{ + return 0; +} static inline struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa, struct cxl_endpoint_decoder **cxled) @@ -121,5 +133,6 @@ long cxl_pci_get_latency(struct pci_dev *pdev); int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, enum access_coordinate_class access); bool cxl_need_node_perf_attrs_update(int nid); +void memdev_release_extent(struct cxl_memdev_state *mds, struct range *range); #endif /* __CXL_CORE_H__ */ diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c new file mode 100644 index 000000000000..34456594cdc3 --- /dev/null +++ b/drivers/cxl/core/extent.c @@ -0,0 +1,345 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2024 Intel Corporation. All rights reserved. 
*/ + +#include +#include + +#include "core.h" + +static void cxled_release_extent(struct cxl_endpoint_decoder *cxled, + struct cxled_extent *ed_extent) +{ + struct cxl_memdev_state *mds = cxled_to_mds(cxled); + struct device *dev = &cxled->cxld.dev; + + dev_dbg(dev, "Remove extent %par (%*phC)\n", &ed_extent->dpa_range, + CXL_EXTENT_TAG_LEN, ed_extent->tag); + memdev_release_extent(mds, &ed_extent->dpa_range); + kfree(ed_extent); +} + +static void free_region_extent(struct region_extent *region_extent) +{ + struct cxled_extent *ed_extent; + unsigned long index; + + /* + * Remove from each endpoint decoder the extent which backs this region + * extent + */ + xa_for_each(&region_extent->decoder_extents, index, ed_extent) + cxled_release_extent(ed_extent->cxled, ed_extent); + xa_destroy(&region_extent->decoder_extents); + ida_free(&region_extent->cxlr_dax->extent_ida, region_extent->dev.id); + kfree(region_extent); +} + +static void region_extent_release(struct device *dev) +{ + struct region_extent *region_extent = to_region_extent(dev); + + free_region_extent(region_extent); +} + +static const struct device_type region_extent_type = { + .name = "extent", + .release = region_extent_release, +}; + +bool is_region_extent(struct device *dev) +{ + return dev->type == &region_extent_type; +} +EXPORT_SYMBOL_NS_GPL(is_region_extent, CXL); + +static void region_extent_unregister(void *ext) +{ + struct region_extent *region_extent = ext; + + dev_dbg(&region_extent->dev, "DAX region rm extent HPA %par\n", + &region_extent->hpa_range); + device_unregister(&region_extent->dev); +} + +static void region_rm_extent(struct region_extent *region_extent) +{ + struct device *region_dev = region_extent->dev.parent; + + devm_release_action(region_dev, region_extent_unregister, region_extent); +} + +static struct region_extent * +alloc_region_extent(struct cxl_dax_region *cxlr_dax, struct range *hpa_range, u8 *tag) +{ + int id; + + struct region_extent *region_extent __free(kfree) = +
kzalloc(sizeof(*region_extent), GFP_KERNEL); + if (!region_extent) + return ERR_PTR(-ENOMEM); + + id = ida_alloc(&cxlr_dax->extent_ida, GFP_KERNEL); + if (id < 0) + return ERR_PTR(-ENOMEM); + + region_extent->hpa_range = *hpa_range; + region_extent->cxlr_dax = cxlr_dax; + import_uuid(&region_extent->tag, tag); + region_extent->dev.id = id; + xa_init(&region_extent->decoder_extents); + return no_free_ptr(region_extent); +} + +static int online_region_extent(struct region_extent *region_extent) +{ + struct cxl_dax_region *cxlr_dax = region_extent->cxlr_dax; + struct device *dev; + int rc; + + dev = &region_extent->dev; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->parent = &cxlr_dax->dev; + dev->type = &region_extent_type; + rc = dev_set_name(dev, "extent%d.%d", cxlr_dax->cxlr->id, dev->id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + dev_dbg(dev, "region extent HPA %par\n", &region_extent->hpa_range); + return devm_add_action_or_reset(&cxlr_dax->dev, region_extent_unregister, + region_extent); + +err: + dev_err(&cxlr_dax->dev, "Failed to initialize region extent HPA %par\n", + &region_extent->hpa_range); + + put_device(dev); + return rc; +} + +struct match_data { + struct cxl_endpoint_decoder *cxled; + struct range *new_range; +}; + +static int match_contains(struct device *dev, void *data) +{ + struct region_extent *region_extent = to_region_extent(dev); + struct match_data *md = data; + struct cxled_extent *entry; + unsigned long index; + + if (!region_extent) + return 0; + + xa_for_each(&region_extent->decoder_extents, index, entry) { + if (md->cxled == entry->cxled && + range_contains(&entry->dpa_range, md->new_range)) + return true; + } + return false; +} + +static bool extents_contain(struct cxl_dax_region *cxlr_dax, + struct cxl_endpoint_decoder *cxled, + struct range *new_range) +{ + struct device *extent_device; + struct match_data md = { + .cxled = cxled, + .new_range = new_range, + }; + + extent_device =
device_find_child(&cxlr_dax->dev, &md, match_contains); + if (!extent_device) + return false; + + put_device(extent_device); + return true; +} + +static int match_overlaps(struct device *dev, void *data) +{ + struct region_extent *region_extent = to_region_extent(dev); + struct match_data *md = data; + struct cxled_extent *entry; + unsigned long index; + + if (!region_extent) + return 0; + + xa_for_each(&region_extent->decoder_extents, index, entry) { + if (md->cxled == entry->cxled && + range_overlaps(&entry->dpa_range, md->new_range)) + return true; + } + + return false; +} + +static bool extents_overlap(struct cxl_dax_region *cxlr_dax, + struct cxl_endpoint_decoder *cxled, + struct range *new_range) +{ + struct device *extent_device; + struct match_data md = { + .cxled = cxled, + .new_range = new_range, + }; + + extent_device = device_find_child(&cxlr_dax->dev, &md, match_overlaps); + if (!extent_device) + return false; + + put_device(extent_device); + return true; +} + +static void calc_hpa_range(struct cxl_endpoint_decoder *cxled, + struct cxl_dax_region *cxlr_dax, + struct range *dpa_range, + struct range *hpa_range) +{ + resource_size_t dpa_offset, hpa; + + dpa_offset = dpa_range->start - cxled->dpa_res->start; + hpa = cxled->cxld.hpa_range.start + dpa_offset; + + hpa_range->start = hpa - cxlr_dax->hpa_range.start; + hpa_range->end = hpa_range->start + range_len(dpa_range) - 1; +} + +static int cxlr_rm_extent(struct device *dev, void *data) +{ + struct region_extent *region_extent = to_region_extent(dev); + struct range *region_hpa_range = data; + + if (!region_extent) + return 0; + + /* + * Any extent which 'touches' the released range is removed.
+ */ + if (range_overlaps(region_hpa_range, &region_extent->hpa_range)) { + dev_dbg(dev, "Remove region extent HPA %par\n", + &region_extent->hpa_range); + region_rm_extent(region_extent); + } + return 0; +} + +int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent) +{ + u64 start_dpa = le64_to_cpu(extent->start_dpa); + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; + struct cxl_endpoint_decoder *cxled; + struct range hpa_range, dpa_range; + struct cxl_region *cxlr; + + dpa_range = (struct range) { + .start = start_dpa, + .end = start_dpa + le64_to_cpu(extent->length) - 1, + }; + + guard(rwsem_read)(&cxl_region_rwsem); + cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled); + if (!cxlr) { + memdev_release_extent(mds, &dpa_range); + return -ENXIO; + } + + calc_hpa_range(cxled, cxlr->cxlr_dax, &dpa_range, &hpa_range); + + /* Remove region extents which overlap */ + return device_for_each_child(&cxlr->cxlr_dax->dev, &hpa_range, + cxlr_rm_extent); +} + +static int cxlr_add_extent(struct cxl_dax_region *cxlr_dax, + struct cxl_endpoint_decoder *cxled, + struct cxled_extent *ed_extent) +{ + struct region_extent *region_extent; + struct range hpa_range; + int rc; + + calc_hpa_range(cxled, cxlr_dax, &ed_extent->dpa_range, &hpa_range); + + region_extent = alloc_region_extent(cxlr_dax, &hpa_range, ed_extent->tag); + if (IS_ERR(region_extent)) + return PTR_ERR(region_extent); + + rc = xa_insert(&region_extent->decoder_extents, (unsigned long)ed_extent, ed_extent, + GFP_KERNEL); + if (rc) { + free_region_extent(region_extent); + return rc; + } + + /* device model handles freeing region_extent */ + return online_region_extent(region_extent); +} + +/* Callers are expected to ensure cxled has been attached to a region */ +int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent) +{ + u64 start_dpa = le64_to_cpu(extent->start_dpa); + struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; + struct cxl_endpoint_decoder *cxled; + struct range ed_range, ext_range; +
struct cxl_dax_region *cxlr_dax; + struct cxled_extent *ed_extent; + struct cxl_region *cxlr; + struct device *dev; + + ext_range = (struct range) { + .start = start_dpa, + .end = start_dpa + le64_to_cpu(extent->length) - 1, + }; + + guard(rwsem_read)(&cxl_region_rwsem); + cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled); + if (!cxlr) + return -ENXIO; + + cxlr_dax = cxled->cxld.region->cxlr_dax; + dev = &cxled->cxld.dev; + ed_range = (struct range) { + .start = cxled->dpa_res->start, + .end = cxled->dpa_res->end, + }; + + dev_dbg(&cxled->cxld.dev, "Checking ED (%pr) for extent %par\n", + cxled->dpa_res, &ext_range); + + if (!range_contains(&ed_range, &ext_range)) { + dev_err_ratelimited(dev, + "DC extent DPA %par (%*phC) is not fully in ED %par\n", + &ext_range.start, CXL_EXTENT_TAG_LEN, + extent->tag, &ed_range); + return -ENXIO; + } + + if (extents_contain(cxlr_dax, cxled, &ext_range)) + return 0; + + if (extents_overlap(cxlr_dax, cxled, &ext_range)) + return -ENXIO; + + ed_extent = kzalloc(sizeof(*ed_extent), GFP_KERNEL); + if (!ed_extent) + return -ENOMEM; + + ed_extent->cxled = cxled; + ed_extent->dpa_range = ext_range; + memcpy(ed_extent->tag, extent->tag, CXL_EXTENT_TAG_LEN); + + dev_dbg(dev, "Add extent %par (%*phC)\n", &ed_extent->dpa_range, + CXL_EXTENT_TAG_LEN, ed_extent->tag); + + return cxlr_add_extent(cxlr_dax, cxled, ed_extent); +} diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 01a447aaa1b1..55edc70971c3 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -882,6 +882,48 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); +static int cxl_validate_extent(struct cxl_memdev_state *mds, + struct cxl_extent *extent) +{ + u64 start = le64_to_cpu(extent->start_dpa); + u64 length = le64_to_cpu(extent->length); + struct device *dev = mds->cxlds.dev; + + struct range ext_range = (struct range){ + .start = start, + .end = start + length - 1, + }; + + if 
(le16_to_cpu(extent->shared_extn_seq) != 0) { + dev_err_ratelimited(dev, + "DC extent DPA %par (%*phC) can not be shared\n", + &ext_range.start, CXL_EXTENT_TAG_LEN, + extent->tag); + return -ENXIO; + } + + /* Extents must not cross DC region boundaries */ + for (int i = 0; i < mds->nr_dc_region; i++) { + struct cxl_dc_region_info *dcr = &mds->dc_region[i]; + struct range region_range = (struct range) { + .start = dcr->base, + .end = dcr->base + dcr->decode_len - 1, + }; + + if (range_contains(&region_range, &ext_range)) { + dev_dbg(dev, "DC extent DPA %par (DCR:%d:%#llx)(%*phC)\n", + &ext_range, i, start - dcr->base, + CXL_EXTENT_TAG_LEN, extent->tag); + return 0; + } + } + + dev_err_ratelimited(dev, + "DC extent DPA %par (%*phC) is not in any DC region\n", + &ext_range, CXL_EXTENT_TAG_LEN, extent->tag); + return -ENXIO; +} + void cxl_event_trace_record(const struct cxl_memdev *cxlmd, enum cxl_event_log_type type, enum cxl_event_type event_type, @@ -1009,6 +1051,207 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds, return rc; } +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode, + struct xarray *extent_array, int cnt) +{ + struct cxl_mbox_dc_response *p; + struct cxl_mbox_cmd mbox_cmd; + struct cxl_extent *extent; + unsigned long index; + u32 pl_index; + int rc = 0; + + size_t pl_size = struct_size(p, extent_list, cnt); + u32 max_extents = cnt; + + /* May have to use more bit on response.
*/ + if (pl_size > mds->payload_size) { + max_extents = (mds->payload_size - sizeof(*p)) / + sizeof(struct updated_extent_list); + pl_size = struct_size(p, extent_list, max_extents); + } + + struct cxl_mbox_dc_response *response __free(kfree) = + kzalloc(pl_size, GFP_KERNEL); + if (!response) + return -ENOMEM; + + pl_index = 0; + xa_for_each(extent_array, index, extent) { + + response->extent_list[pl_index].dpa_start = extent->start_dpa; + response->extent_list[pl_index].length = extent->length; + pl_index++; + response->extent_list_size = cpu_to_le32(pl_index); + + if (pl_index == max_extents) { + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = opcode, + .size_in = struct_size(response, extent_list, + pl_index), + .payload_in = response, + }; + + response->flags = 0; + if (pl_index < cnt) + response->flags |= CXL_DCD_EVENT_MORE; + + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + if (rc) + return rc; + pl_index = 0; + } + } + + if (pl_index) { + mbox_cmd = (struct cxl_mbox_cmd) { + .opcode = opcode, + .size_in = struct_size(response, extent_list, + pl_index), + .payload_in = response, + }; + + response->flags = 0; + rc = cxl_internal_send_cmd(mds, &mbox_cmd); + } + + return rc; +} + +void memdev_release_extent(struct cxl_memdev_state *mds, struct range *range) +{ + struct device *dev = mds->cxlds.dev; + struct xarray extent_list; + + struct cxl_extent extent = { + .start_dpa = cpu_to_le64(range->start), + .length = cpu_to_le64(range_len(range)), + }; + + dev_dbg(dev, "Release response dpa %par\n", range); + + xa_init(&extent_list); + if (xa_insert(&extent_list, 0, &extent, GFP_KERNEL)) { + dev_dbg(dev, "Failed to release %par\n", range); + goto destroy; + } + + if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1)) + dev_dbg(dev, "Failed to release %par\n", range); + +destroy: + xa_destroy(&extent_list); +} + +static int validate_add_extent(struct cxl_memdev_state *mds, + struct cxl_extent *extent) +{ + int rc; + + rc = cxl_validate_extent(mds,
extent); + if (rc) + return rc; + + return cxl_add_extent(mds, extent); +} + +static int cxl_add_pending(struct cxl_memdev_state *mds) +{ + struct device *dev = mds->cxlds.dev; + struct cxl_extent *extent; + unsigned long index; + unsigned long cnt = 0; + int rc; + + xa_for_each(&mds->pending_extents, index, extent) { + if (validate_add_extent(mds, extent)) { + dev_dbg(dev, "unconsumed DC extent DPA:%#llx LEN:%#llx\n", + le64_to_cpu(extent->start_dpa), + le64_to_cpu(extent->length)); + xa_erase(&mds->pending_extents, index); + kfree(extent); + continue; + } + cnt++; + } + rc = cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE, + &mds->pending_extents, cnt); + xa_for_each(&mds->pending_extents, index, extent) { + xa_erase(&mds->pending_extents, index); + kfree(extent); + } + return rc; +} + +static int handle_add_event(struct cxl_memdev_state *mds, + struct cxl_event_dcd *event) +{ + struct device *dev = mds->cxlds.dev; + + struct cxl_extent *tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + memcpy(tmp, &event->extent, sizeof(*tmp)); + if (xa_insert(&mds->pending_extents, (unsigned long)tmp, tmp, + GFP_KERNEL)) { + kfree(tmp); + return -ENOMEM; + } + + if (event->flags & CXL_DCD_EVENT_MORE) { + dev_dbg(dev, "more bit set; delay the surfacing of extent\n"); + return 0; + } + + /* extents are removed and free'ed in cxl_add_pending() */ + return cxl_add_pending(mds); +} + +static char *cxl_dcd_evt_type_str(u8 type) +{ + switch (type) { + case DCD_ADD_CAPACITY: + return "add"; + case DCD_RELEASE_CAPACITY: + return "release"; + case DCD_FORCED_CAPACITY_RELEASE: + return "force release"; + default: + break; + } + + return ""; +} + +static int cxl_handle_dcd_event_records(struct cxl_memdev_state *mds, + struct cxl_event_record_raw *raw_rec) +{ + struct cxl_event_dcd *event = &raw_rec->event.dcd; + struct cxl_extent *extent = &event->extent; + struct device *dev = mds->cxlds.dev; + uuid_t *id = &raw_rec->id; + + if (!uuid_equal(id, 
&CXL_EVENT_DC_EVENT_UUID)) + return -EINVAL; + + dev_dbg(dev, "DCD event %s : DPA:%#llx LEN:%#llx\n", + cxl_dcd_evt_type_str(event->event_type), + le64_to_cpu(extent->start_dpa), le64_to_cpu(extent->length)); + + switch (event->event_type) { + case DCD_ADD_CAPACITY: + return handle_add_event(mds, event); + case DCD_RELEASE_CAPACITY: + return cxl_rm_extent(mds, &event->extent); + case DCD_FORCED_CAPACITY_RELEASE: + dev_err_ratelimited(dev, "Forced release event ignored.\n"); + return 0; + default: + return -EINVAL; + } +} + static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, enum cxl_event_log_type type) { @@ -1044,9 +1287,17 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds, if (!nr_rec) break; - for (i = 0; i < nr_rec; i++) + for (i = 0; i < nr_rec; i++) { __cxl_event_trace_record(cxlmd, type, &payload->records[i]); + if (type == CXL_EVENT_TYPE_DCD) { + rc = cxl_handle_dcd_event_records(mds, + &payload->records[i]); + if (rc) + dev_err_ratelimited(dev, "dcd event failed: %d\n", + rc); + } + } if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW) trace_cxl_overflow(cxlmd, type, payload); @@ -1078,6 +1329,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status) { dev_dbg(mds->cxlds.dev, "Reading event logs: %x\n", status); + if (cxl_dcd_supported(mds) && (status & CXLDEV_EVENT_STATUS_DCD)) + cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD); if (status & CXLDEV_EVENT_STATUS_FATAL) cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_FATAL); if (status & CXLDEV_EVENT_STATUS_FAIL) @@ -1610,6 +1863,17 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds) } EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL); +static void clear_pending_extents(void *_mds) +{ + struct cxl_memdev_state *mds = _mds; + struct cxl_extent *extent; + unsigned long index; + + xa_for_each(&mds->pending_extents, index, extent) + kfree(extent); + xa_destroy(&mds->pending_extents); +} + struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) { 
struct cxl_memdev_state *mds; @@ -1628,6 +1892,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID; mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID; + xa_init(&mds->pending_extents); + devm_add_action_or_reset(dev, clear_pending_extents, mds); return mds; } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 8e0884b52f84..8c9171f914fb 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3037,6 +3037,7 @@ static void cxl_dax_region_release(struct device *dev) { struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); + ida_destroy(&cxlr_dax->extent_ida); kfree(cxlr_dax); } @@ -3090,6 +3091,8 @@ static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr) dev = &cxlr_dax->dev; cxlr_dax->cxlr = cxlr; + cxlr->cxlr_dax = cxlr_dax; + ida_init(&cxlr_dax->extent_ida); device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_dax_region_key); device_set_pm_not_required(dev); @@ -3190,7 +3193,10 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr) static void cxlr_dax_unregister(void *_cxlr_dax) { struct cxl_dax_region *cxlr_dax = _cxlr_dax; + struct cxl_region *cxlr = cxlr_dax->cxlr; + cxlr->cxlr_dax = NULL; + cxlr_dax->cxlr = NULL; device_unregister(&cxlr_dax->dev); } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index cda7e40b9a48..30bfd1570c63 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -11,6 +11,7 @@ #include #include #include +#include extern const struct nvdimm_security_ops *cxl_security_ops; @@ -169,11 +170,13 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw) #define CXLDEV_EVENT_STATUS_WARN BIT(1) #define CXLDEV_EVENT_STATUS_FAIL BIT(2) #define CXLDEV_EVENT_STATUS_FATAL BIT(3) +#define CXLDEV_EVENT_STATUS_DCD BIT(4) #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO | \ CXLDEV_EVENT_STATUS_WARN | \ CXLDEV_EVENT_STATUS_FAIL | \ - 
CXLDEV_EVENT_STATUS_FATAL) + CXLDEV_EVENT_STATUS_FATAL | \ + CXLDEV_EVENT_STATUS_DCD) /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */ #define CXLDEV_EVENT_INT_MODE_MASK GENMASK(1, 0) @@ -444,6 +447,18 @@ enum cxl_decoder_state { CXL_DECODER_STATE_AUTO, }; +/** + * struct cxled_extent - Extent within an endpoint decoder + * @cxled: Reference to the endpoint decoder + * @dpa_range: DPA range this extent covers within the decoder + * @tag: Tag from device for this extent + */ +struct cxled_extent { + struct cxl_endpoint_decoder *cxled; + struct range dpa_range; + u8 tag[CXL_EXTENT_TAG_LEN]; +}; + /** * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder * @cxld: base cxl_decoder_object @@ -569,6 +584,7 @@ struct cxl_region_params { * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge + * @cxlr_dax: (for DC regions) cached copy of CXL DAX bridge * @flags: Region state flags * @params: active + config params for the region * @coord: QoS access coordinates for the region @@ -582,6 +598,7 @@ struct cxl_region { enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_pmem_region *cxlr_pmem; + struct cxl_dax_region *cxlr_dax; unsigned long flags; struct cxl_region_params params; struct access_coordinate coord[ACCESS_COORDINATE_MAX]; @@ -622,12 +639,45 @@ struct cxl_pmem_region { struct cxl_pmem_region_mapping mapping[]; }; +/* See CXL 3.0 8.2.9.2.1.5 */ +enum dc_event { + DCD_ADD_CAPACITY, + DCD_RELEASE_CAPACITY, + DCD_FORCED_CAPACITY_RELEASE, + DCD_REGION_CONFIGURATION_UPDATED, +}; + struct cxl_dax_region { struct device dev; struct cxl_region *cxlr; struct range hpa_range; + struct ida extent_ida; }; +/** + * struct region_extent - CXL DAX region extent + * @dev: device representing this extent + * @cxlr_dax: back reference to parent region device + * @hpa_range: HPA range of this extent + * @tag: tag of the extent + * 
@decoder_extents: Endpoint decoder extents which make up this region extent + */ +struct region_extent { + struct device dev; + struct cxl_dax_region *cxlr_dax; + struct range hpa_range; + uuid_t tag; + struct xarray decoder_extents; +}; + +bool is_region_extent(struct device *dev); +static inline struct region_extent *to_region_extent(struct device *dev) +{ + if (!is_region_extent(dev)) + return NULL; + return container_of(dev, struct region_extent, dev); +} + /** * struct cxl_port - logical collection of upstream port devices and * downstream port devices to construct a CXL memory diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index d41bec5433db..3a40fe1f0be7 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -497,6 +497,7 @@ struct cxl_dc_region_info { * @pmem_perf: performance data entry matched to PMEM partition * @nr_dc_region: number of DC regions implemented in the memory device * @dc_region: array containing info about the DC regions + * @pending_extents: array of extents pending during more bit processing * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -532,6 +533,7 @@ struct cxl_memdev_state { u8 nr_dc_region; struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION]; + struct xarray pending_extents; struct cxl_event_state event; struct cxl_poison_state poison; @@ -607,6 +609,21 @@ enum cxl_opcode { UUID_INIT(0x5e1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 0xd6, 0x07, 0x19, \ 0x40, 0x3d, 0x86) +/* + * Add Dynamic Capacity Response + * CXL rev 3.1 section 8.2.9.9.9.3; Table 8-168 & Table 8-169 + */ +struct cxl_mbox_dc_response { + __le32 extent_list_size; + u8 flags; + u8 reserved[3]; + struct updated_extent_list { + __le64 dpa_start; + __le64 length; + u8 reserved[8]; + } __packed extent_list[]; +} __packed; + struct cxl_mbox_get_supported_logs { __le16 entries; u8 rsvd[6]; @@ -669,6 +686,14 @@ struct cxl_mbox_identify { UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 
0xba, 0xb1, \ 0x13, 0xb7, 0x74) +/* + * Dynamic Capacity Event Record + * CXL rev 3.1 section 8.2.9.2.1; Table 8-43 + */ +#define CXL_EVENT_DC_EVENT_UUID \ + UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f, 0x95, 0x26, 0x8e, \ + 0x10, 0x1a, 0x2a) + /* * Get Event Records output payload * CXL rev 3.0 section 8.2.9.2.2; Table 8-50 @@ -694,6 +719,7 @@ enum cxl_event_log_type { CXL_EVENT_TYPE_WARN, CXL_EVENT_TYPE_FAIL, CXL_EVENT_TYPE_FATAL, + CXL_EVENT_TYPE_DCD, CXL_EVENT_TYPE_MAX }; diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h index 0bea1afbd747..eeda8059d81a 100644 --- a/include/linux/cxl-event.h +++ b/include/linux/cxl-event.h @@ -96,11 +96,43 @@ struct cxl_event_mem_module { u8 reserved[0x3d]; } __packed; +/* + * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-51 + */ +#define CXL_EXTENT_TAG_LEN 0x10 +struct cxl_extent { + __le64 start_dpa; + __le64 length; + u8 tag[CXL_EXTENT_TAG_LEN]; + __le16 shared_extn_seq; + u8 reserved[0x6]; +} __packed; + +/* + * Dynamic Capacity Event Record + * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-50 + */ +#define CXL_DCD_EVENT_MORE BIT(0) +struct cxl_event_dcd { + struct cxl_event_record_hdr hdr; + u8 event_type; + u8 validity_flags; + __le16 host_id; + u8 region_index; + u8 flags; + u8 reserved1[0x2]; + struct cxl_extent extent; + u8 reserved2[0x18]; + __le32 num_avail_extents; + __le32 num_avail_tags; +} __packed; + union cxl_event { struct cxl_event_generic generic; struct cxl_event_gen_media gen_media; struct cxl_event_dram dram; struct cxl_event_mem_module mem_module; + struct cxl_event_dcd dcd; /* dram & gen_media event header */ struct cxl_event_media_hdr media_hdr; } __packed; diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 030b388800f0..8238588fffdf 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -61,7 +61,8 @@ cxl_core-y += $(CXL_CORE_SRC)/hdm.o cxl_core-y += $(CXL_CORE_SRC)/pmu.o cxl_core-y += $(CXL_CORE_SRC)/cdat.o cxl_core-$(CONFIG_TRACING) += 
$(CXL_CORE_SRC)/trace.o -cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o +cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o \ + $(CXL_CORE_SRC)/extent.o cxl_core-y += config_check.o cxl_core-y += cxl_core_test.o cxl_core-y += cxl_core_exports.o
From patchwork Fri Aug 16 14:00:07 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766370 From: ira.weiny@intel.com Date: Fri, 16 Aug 2024 09:00:07 -0500 Subject: [PATCH v2 19/25] cxl/region/extent: Expose region extent information in sysfs Message-Id: <20240816-dcd-type2-upstream-v2-19-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
From: Navneet Singh Extent information can be helpful to the user to coordinate memory usage with the external orchestrator and FM. Expose the details of region extents by creating the following sysfs entries. /sys/bus/cxl/devices/dax_regionX/extentX.Y /sys/bus/cxl/devices/dax_regionX/extentX.Y/offset /sys/bus/cxl/devices/dax_regionX/extentX.Y/length /sys/bus/cxl/devices/dax_regionX/extentX.Y/tag Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: split this out] [Jonathan: add documentation for extent sysfs] [Jonathan/djbw: s/label/tag] [Jonathan/djbw: treat tag as uuid] [djbw: use __ATTRIBUTE_GROUPS] [djbw: make tag invisible if it is empty] [djbw/iweiny: use conventional id names for extents; extentX.Y] --- Documentation/ABI/testing/sysfs-bus-cxl | 13 ++++++++ drivers/cxl/core/extent.c | 58 +++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 3a5ee88e551b..e97e6a73c960 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -599,3 +599,16 @@ Description: See Documentation/ABI/stable/sysfs-devices-node. access0 provides the number to the closest initiator and access1 provides the number to the closest CPU.
+ +What: /sys/bus/cxl/devices/dax_regionX/extentX.Y/offset + /sys/bus/cxl/devices/dax_regionX/extentX.Y/length + /sys/bus/cxl/devices/dax_regionX/extentX.Y/tag +Date: October, 2024 +KernelVersion: v6.12 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) [For Dynamic Capacity regions only] Extent offset and + length within the region. Users can use the extent information + to create DAX devices on specific extents. This is done by + creating and destroying DAX devices in specific sequences and + looking at the mappings created. diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c index 34456594cdc3..d7d526a51e2b 100644 --- a/drivers/cxl/core/extent.c +++ b/drivers/cxl/core/extent.c @@ -6,6 +6,63 @@ #include "core.h" +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *region_extent = to_region_extent(dev); + + return sysfs_emit(buf, "%#llx\n", region_extent->hpa_range.start); +} +static DEVICE_ATTR_RO(offset); + +static ssize_t length_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *region_extent = to_region_extent(dev); + u64 length = range_len(&region_extent->hpa_range); + + return sysfs_emit(buf, "%#llx\n", length); +} +static DEVICE_ATTR_RO(length); + +static ssize_t tag_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct region_extent *region_extent = to_region_extent(dev); + + return sysfs_emit(buf, "%pUb\n", &region_extent->tag); +} +static DEVICE_ATTR_RO(tag); + +static struct attribute *region_extent_attrs[] = { + &dev_attr_offset.attr, + &dev_attr_length.attr, + &dev_attr_tag.attr, + NULL, +}; + +static uuid_t empty_tag = { 0 }; + +static umode_t region_extent_visible(struct kobject *kobj, + struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct region_extent *region_extent = to_region_extent(dev); + + if (a == &dev_attr_tag.attr && + uuid_equal(&region_extent->tag, &empty_tag))
+ return 0; + + return a->mode; +} + +static const struct attribute_group region_extent_attribute_group = { + .attrs = region_extent_attrs, + .is_visible = region_extent_visible, +}; + +__ATTRIBUTE_GROUPS(region_extent_attribute); + static void cxled_release_extent(struct cxl_endpoint_decoder *cxled, struct cxled_extent *ed_extent) { @@ -44,6 +101,7 @@ static void region_extent_release(struct device *dev) static const struct device_type region_extent_type = { .name = "extent", .release = region_extent_release, + .groups = region_extent_attribute_groups, }; bool is_region_extent(struct device *dev)
From patchwork Fri Aug 16 14:00:08 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766371 From: Ira Weiny Date: Fri, 16 Aug 2024 09:00:08 -0500 Subject: [PATCH v2 20/25] dax/bus: Factor out dev dax resize logic Message-Id: <20240816-dcd-type2-upstream-v2-20-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Dynamic Capacity regions must limit dev dax resources to those areas which have extents backing real memory. Such DAX regions are dubbed 'sparse' regions. In order to manage where memory is available four alternatives were considered: 1) Create a single region resource child on region creation which reserves the entire region. Then as extents are added punch holes in this reservation. This requires new resource manipulation to punch the holes and still requires an additional iteration over the extent areas which may already have existing dev dax resources used. 2) Maintain an ordered xarray of extents which can be queried while processing the resize logic. The issue is that existing region->res children may artificially limit the allocation size sent to alloc_dev_dax_range(). IE the resource children can't be directly used in the resize logic to find where space in the region is. This also poses a problem of managing the available size in 2 places. 3) Maintain a separate resource tree with extents.
This option is the same as 2) but with the different data structure. Most ideally there should be a unified representation of the resource tree not two places to look for space. 4) Create region resource children for each extent. Manage the dax dev resize logic in the same way as before but use a region child (extent) resource as the parents to find space within each extent. Option 4 can leverage the existing resize algorithm to find space within the extents. It manages the available space in a singular resource tree which is less complicated for finding space. In preparation for this change, factor out the dev_dax_resize logic. For static regions use dax_region->res as the parent to find space for the dax ranges. Future patches will use the same algorithm with individual extent resources as the parent. Signed-off-by: Ira Weiny --- Changes: [iweiny: Rebase on new DAX region locking] [iweiny: Reword commit message] [iweiny: Drop reviews] --- drivers/dax/bus.c | 129 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 79 insertions(+), 50 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index d8cb5195a227..975860371d9f 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -844,11 +844,9 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) return 0; } -static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, - resource_size_t size) +static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, + u64 start, resource_size_t size) { - struct dax_region *dax_region = dev_dax->region; - struct resource *res = &dax_region->res; struct device *dev = &dev_dax->dev; struct dev_dax_range *ranges; unsigned long pgoff = 0; @@ -866,14 +864,14 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, return 0; } - alloc = __request_region(res, start, size, dev_name(dev), 0); + alloc = __request_region(parent, start, size, dev_name(dev), 0); if (!alloc) return -ENOMEM; ranges = 
krealloc(dev_dax->ranges, sizeof(*ranges) * (dev_dax->nr_range + 1), GFP_KERNEL); if (!ranges) { - __release_region(res, alloc->start, resource_size(alloc)); + __release_region(parent, alloc->start, resource_size(alloc)); return -ENOMEM; } @@ -1026,50 +1024,45 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) return true; } -static ssize_t dev_dax_resize(struct dax_region *dax_region, - struct dev_dax *dev_dax, resource_size_t size) +/** + * dev_dax_resize_static - Expand the device into the unused portion of the + * region. This may involve adjusting the end of an existing resource, or + * allocating a new resource. + * + * @parent: parent resource to allocate this range in + * @dev_dax: DAX device to be expanded + * @to_alloc: amount of space to alloc; must be <= space available in @parent + * + * Return the amount of space allocated or -ERRNO on failure + */ +static ssize_t dev_dax_resize_static(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc) { - resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; - resource_size_t dev_size = dev_dax_size(dev_dax); - struct resource *region_res = &dax_region->res; - struct device *dev = &dev_dax->dev; struct resource *res, *first; - resource_size_t alloc = 0; int rc; - if (dev->driver) - return -EBUSY; - if (size == dev_size) - return 0; - if (size > dev_size && size - dev_size > avail) - return -ENOSPC; - if (size < dev_size) - return dev_dax_shrink(dev_dax, size); - - to_alloc = size - dev_size; - if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), - "resize of %pa misaligned\n", &to_alloc)) - return -ENXIO; - - /* - * Expand the device into the unused portion of the region. This - * may involve adjusting the end of an existing resource, or - * allocating a new resource. 
- */ -retry: - first = region_res->child; - if (!first) - return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc); + first = parent->child; + if (!first) { + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, to_alloc); + if (rc) + return rc; + return to_alloc; + } - rc = -ENOSPC; for (res = first; res; res = res->sibling) { struct resource *next = res->sibling; + resource_size_t alloc; /* space at the beginning of the region */ - if (res == first && res->start > dax_region->res.start) { - alloc = min(res->start - dax_region->res.start, to_alloc); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, alloc); - break; + if (res == first && res->start > parent->start) { + alloc = min(res->start - parent->start, to_alloc); + rc = alloc_dev_dax_range(parent, dev_dax, + parent->start, alloc); + if (rc) + return rc; + return alloc; } alloc = 0; @@ -1078,21 +1071,55 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, alloc = min(next->start - (res->end + 1), to_alloc); /* space at the end of the region */ - if (!alloc && !next && res->end < region_res->end) - alloc = min(region_res->end - res->end, to_alloc); + if (!alloc && !next && res->end < parent->end) + alloc = min(parent->end - res->end, to_alloc); if (!alloc) continue; if (adjust_ok(dev_dax, res)) { rc = adjust_dev_dax_range(dev_dax, res, resource_size(res) + alloc); - break; + if (rc) + return rc; + return alloc; } - rc = alloc_dev_dax_range(dev_dax, res->end + 1, alloc); - break; + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + if (rc) + return rc; + return alloc; } - if (rc) - return rc; + + /* available was already calculated and should never be an issue */ + dev_WARN_ONCE(&dev_dax->dev, 1, "space not found?"); + return 0; +} + +static ssize_t dev_dax_resize(struct dax_region *dax_region, + struct dev_dax *dev_dax, resource_size_t size) +{ + resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; + resource_size_t dev_size = 
dev_dax_size(dev_dax); + struct device *dev = &dev_dax->dev; + resource_size_t alloc = 0; + + if (dev->driver) + return -EBUSY; + if (size == dev_size) + return 0; + if (size > dev_size && size - dev_size > avail) + return -ENOSPC; + if (size < dev_size) + return dev_dax_shrink(dev_dax, size); + + to_alloc = size - dev_size; + if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc), + "resize of %pa misaligned\n", &to_alloc)) + return -ENXIO; + +retry: + alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (alloc <= 0) + return alloc; to_alloc -= alloc; if (to_alloc) goto retry; @@ -1198,7 +1225,8 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) - rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, + to_alloc); up_write(&dax_dev_rwsem); up_write(&dax_region_rwsem); @@ -1466,7 +1494,8 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); - rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size); + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, + data->size); if (rc) goto err_range;
From patchwork Fri Aug 16 14:00:09 2024 X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 13766372 From: Ira Weiny Date: Fri, 16 Aug 2024 09:00:09 -0500 Subject: [PATCH v2 21/25] dax/region: Create resources on sparse DAX regions Message-Id: <20240816-dcd-type2-upstream-v2-21-20189a10ad7d@intel.com> References: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> In-Reply-To: <20240816-dcd-type2-upstream-v2-0-20189a10ad7d@intel.com> To: Dave Jiang , Fan Ni , Jonathan Cameron , Navneet Singh Cc: Dan Williams , Davidlohr Bueso , Alison Schofield , Vishal Verma , Ira Weiny , linux-btrfs@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
DAX regions which map dynamic capacity partitions require that memory be allowed to come and go. Recall sparse regions were created for this purpose. Now that extents can be realized within DAX regions the DAX region driver can start tracking sub-resource information.
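To make the sub-resource accounting concrete, here is a minimal user-space sketch, not kernel code. It assumes a toy resource type (struct toy_res and the toy_* helpers are hypothetical names invented for illustration) and mirrors the idea this patch applies to sparse regions: each child of the region is an extent, each child of an extent is space already handed to a DAX device, and the space still available is the sum of the unused space across all extents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct resource: an inclusive
 * [start, end] range with a first-child pointer and a next-sibling pointer.
 */
struct toy_res {
	uint64_t start;
	uint64_t end;            /* inclusive, as in struct resource */
	struct toy_res *child;   /* first child, or NULL */
	struct toy_res *sibling; /* next sibling, or NULL */
};

/* Size of an inclusive range. */
static uint64_t toy_res_size(const struct toy_res *r)
{
	return r->end - r->start + 1;
}

/* Unused space in one extent: its size minus the size of every child. */
static uint64_t toy_extent_avail(const struct toy_res *extent)
{
	const struct toy_res *used;
	uint64_t avail;

	assert(extent != NULL);
	avail = toy_res_size(extent);
	for (used = extent->child; used; used = used->sibling)
		avail -= toy_res_size(used);
	return avail;
}

/*
 * Sparse region: children are extents (available space), not used space,
 * so the region total is the sum of what is still free in each extent.
 */
static uint64_t toy_sparse_region_avail(const struct toy_res *region)
{
	const struct toy_res *extent;
	uint64_t total = 0;

	assert(region != NULL);
	for (extent = region->child; extent; extent = extent->sibling)
		total += toy_extent_avail(extent);
	return total;
}
```

For example, a region holding one 0x4000-byte extent with 0x1000 bytes in use and a second, unused 0x4000-byte extent would report 0x7000 bytes available. A region with no extents reports zero, whatever its nominal size.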
The tight relationship between DAX region operations and extent operations requires memory changes to be controlled synchronously with the user of the region. Synchronize through the dax_region_rwsem and by having the region driver drive both the region device and the extent sub-devices.

Recall that requests to remove extents can happen at any time and that a host is not obligated to release the memory until it is no longer in use. If an extent is not in use, allow a release response.

The DAX layer has no need for the details of the CXL memory extent devices. Expose extents to the DAX layer as device children of the DAX region device. A single callback from the driver helps the DAX layer determine whether a child device is an extent. The DAX layer also registers a devres function to automatically clean up when the device is removed from the region.

There is a race between extents being surfaced and the dax_cxl driver being loaded. The driver must therefore scan for any existing extents while still under the device lock.

Respond to extent notifications. Manage the DAX region resource tree based on the extents' lifetimes. Return the status of remove notifications to the lower layers so that they can manage the hardware appropriately.
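The release rule described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the driver's implementation: struct toy_extent and toy_extent_release() are hypothetical names, and all locking (the dax_region_rwsem in the patch) is left out. It shows only the decision itself: an extent with a non-zero use count answers -EBUSY and stays in place, while an unused extent is released.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one extent as seen by the DAX layer. */
struct toy_extent {
	unsigned int use_cnt; /* DAX device ranges currently carved from it */
	bool present;         /* still part of the region's resource tree */
};

/*
 * Handle a release request for one extent.
 *
 * Returns 0 when the extent was released (or was already gone) and
 * -EBUSY when it is still in use, in which case nothing changes and the
 * caller is expected to report that the memory cannot be given back yet.
 */
static int toy_extent_release(struct toy_extent *ext)
{
	assert(ext != NULL);

	if (!ext->present)
		return 0;
	if (ext->use_cnt)
		return -EBUSY;

	ext->present = false;
	return 0;
}
```

Returning -EBUSY rather than failing hard matches the commit message's point that a host need not give memory back while it is in use: the caller can leave the extent alone and the request can be repeated once the users are gone.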
Signed-off-by: Navneet Singh Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny --- Changes: [iweiny: patch reorder] [iweiny: move hunks from other patches to clarify code changes and add/release flows WRT dax regions] [iweiny: use %par] [iweiny: clean up variable names] [iweiny: Simplify sparse_ops] [Fan: avoid open coding range_len()] [djbw: s/reg_ext/region_extent] --- drivers/cxl/core/extent.c | 76 +++++++++++++-- drivers/cxl/cxl.h | 6 ++ drivers/dax/bus.c | 243 +++++++++++++++++++++++++++++++++++++++++----- drivers/dax/bus.h | 3 +- drivers/dax/cxl.c | 63 +++++++++++- drivers/dax/dax-private.h | 34 +++++++ drivers/dax/hmem/hmem.c | 2 +- drivers/dax/pmem.c | 2 +- 8 files changed, 391 insertions(+), 38 deletions(-) diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c index d7d526a51e2b..103b0bec3a4a 100644 --- a/drivers/cxl/core/extent.c +++ b/drivers/cxl/core/extent.c @@ -271,20 +271,67 @@ static void calc_hpa_range(struct cxl_endpoint_decoder *cxled, hpa_range->end = hpa_range->start + range_len(dpa_range) - 1; } +static int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event, + struct region_extent *region_extent) +{ + struct cxl_dax_region *cxlr_dax; + struct device *dev; + int rc = 0; + + cxlr_dax = cxlr->cxlr_dax; + dev = &cxlr_dax->dev; + dev_dbg(dev, "Trying notify: type %d HPA %par\n", + event, &region_extent->hpa_range); + + /* + * NOTE the lack of a driver indicates a notification has failed. No + * user space coordination was possible.
+ */ + device_lock(dev); + if (dev->driver) { + struct cxl_driver *driver = to_cxl_drv(dev->driver); + struct cxl_notify_data notify_data = (struct cxl_notify_data) { + .event = event, + .region_extent = region_extent, + }; + + if (driver->notify) { + dev_dbg(dev, "Notify: type %d HPA %par\n", + event, &region_extent->hpa_range); + rc = driver->notify(dev, &notify_data); + } + } + device_unlock(dev); + return rc; +} + +struct rm_data { + struct cxl_region *cxlr; + struct range *range; +}; + static int cxlr_rm_extent(struct device *dev, void *data) { struct region_extent *region_extent = to_region_extent(dev); - struct range *region_hpa_range = data; + struct rm_data *rm_data = data; + int rc; if (!region_extent) return 0; /* - * Any extent which 'touches' the released range is removed. + * Any extent which 'touches' the released range is attempted to be + * removed. */ - if (range_overlaps(region_hpa_range, &region_extent->hpa_range)) { + if (range_overlaps(rm_data->range, &region_extent->hpa_range)) { + struct cxl_region *cxlr = rm_data->cxlr; + dev_dbg(dev, "Remove region extent HPA %par\n", &region_extent->hpa_range); + rc = cxlr_notify_extent(cxlr, DCD_RELEASE_CAPACITY, region_extent); + if (rc == -EBUSY) + return 0; + /* Extent not in use or error, remove it */ region_rm_extent(region_extent); } return 0; @@ -312,8 +359,13 @@ int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent) calc_hpa_range(cxled, cxlr->cxlr_dax, &dpa_range, &hpa_range); + struct rm_data rm_data = { + .cxlr = cxlr, + .range = &hpa_range, + }; + /* Remove region extents which overlap */ - return device_for_each_child(&cxlr->cxlr_dax->dev, &hpa_range, + return device_for_each_child(&cxlr->cxlr_dax->dev, &rm_data, cxlr_rm_extent); } @@ -338,8 +390,20 @@ static int cxlr_add_extent(struct cxl_dax_region *cxlr_dax, return rc; } - /* device model handles freeing region_extent */ - return online_region_extent(region_extent); + rc = online_region_extent(region_extent); + /* device model
handled freeing region_extent */ + if (rc) + return rc; + + rc = cxlr_notify_extent(cxlr_dax->cxlr, DCD_ADD_CAPACITY, region_extent); + /* + * The region device was briefly live but DAX layer ensures it was not + * used + */ + if (rc) + region_rm_extent(region_extent); + + return rc; } /* Callers are expected to ensure cxled has been attached to a region */ diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 30bfd1570c63..3ce3fe354c77 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -916,10 +916,16 @@ bool is_cxl_region(struct device *dev); extern struct bus_type cxl_bus_type; +struct cxl_notify_data { + enum dc_event event; + struct region_extent *region_extent; +}; + struct cxl_driver { const char *name; int (*probe)(struct device *dev); void (*remove)(struct device *dev); + int (*notify)(struct device *dev, struct cxl_notify_data *notify_data); struct device_driver drv; int id; }; diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 975860371d9f..f14b0cfa7edd 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -183,6 +183,83 @@ static bool is_sparse(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0; } +static void __dax_release_resource(struct dax_resource *dax_resource) +{ + struct dax_region *dax_region = dax_resource->region; + + lockdep_assert_held_write(&dax_region_rwsem); + dev_dbg(dax_region->dev, "Extent release resource %pr\n", + dax_resource->res); + if (dax_resource->res) + __release_region(&dax_region->res, dax_resource->res->start, + resource_size(dax_resource->res)); + dax_resource->res = NULL; +} + +static void dax_release_resource(void *res) +{ + struct dax_resource *dax_resource = res; + + guard(rwsem_write)(&dax_region_rwsem); + __dax_release_resource(dax_resource); + kfree(dax_resource); +} + +int dax_region_add_resource(struct dax_region *dax_region, + struct device *device, + resource_size_t start, resource_size_t length) +{ + struct resource *new_resource; + int rc; + +
struct dax_resource *dax_resource __free(kfree) = + kzalloc(sizeof(*dax_resource), GFP_KERNEL); + if (!dax_resource) + return -ENOMEM; + + guard(rwsem_write)(&dax_region_rwsem); + + dev_dbg(dax_region->dev, "DAX region resource %pr\n", &dax_region->res); + new_resource = __request_region(&dax_region->res, start, length, "extent", 0); + if (!new_resource) { + dev_err(dax_region->dev, "Failed to add region s:%pa l:%pa\n", + &start, &length); + return -ENOSPC; + } + + dev_dbg(dax_region->dev, "add resource %pr\n", new_resource); + dax_resource->region = dax_region; + dax_resource->res = new_resource; + dev_set_drvdata(device, dax_resource); + rc = devm_add_action_or_reset(device, dax_release_resource, + no_free_ptr(dax_resource)); + /* On error; ensure driver data is cleared under semaphore */ + if (rc) + dev_set_drvdata(device, NULL); + return rc; +} +EXPORT_SYMBOL_GPL(dax_region_add_resource); + +int dax_region_rm_resource(struct dax_region *dax_region, + struct device *dev) +{ + struct dax_resource *dax_resource; + + guard(rwsem_write)(&dax_region_rwsem); + + dax_resource = dev_get_drvdata(dev); + if (!dax_resource) + return 0; + + if (dax_resource->use_cnt) + return -EBUSY; + + /* avoid races with users trying to use the extent */ + __dax_release_resource(dax_resource); + return 0; +} +EXPORT_SYMBOL_GPL(dax_region_rm_resource); + bool static_dev_dax(struct dev_dax *dev_dax) { return is_static(dev_dax->region); @@ -296,19 +373,44 @@ static ssize_t region_align_show(struct device *dev, static struct device_attribute dev_attr_region_align = __ATTR(align, 0400, region_align_show, NULL); +#define for_each_child_resource(extent, res) \ + for (res = (extent)->child; res; res = res->sibling) + +resource_size_t +dax_avail_size(struct resource *dax_resource) +{ + resource_size_t rc; + struct resource *used_res; + + rc = resource_size(dax_resource); + for_each_child_resource(dax_resource, used_res) + rc -= resource_size(used_res); + return rc; +} 
+EXPORT_SYMBOL_GPL(dax_avail_size); + #define for_each_dax_region_resource(dax_region, res) \ for (res = (dax_region)->res.child; res; res = res->sibling) static unsigned long long dax_region_avail_size(struct dax_region *dax_region) { - resource_size_t size = resource_size(&dax_region->res); + resource_size_t size; struct resource *res; lockdep_assert_held(&dax_region_rwsem); - if (is_sparse(dax_region)) - return 0; + if (is_sparse(dax_region)) { + /* + * Children of a sparse region represent available space not + * used space. + */ + size = 0; + for_each_dax_region_resource(dax_region, res) + size += dax_avail_size(res); + return size; + } + size = resource_size(&dax_region->res); for_each_dax_region_resource(dax_region, res) size -= resource_size(res); return size; @@ -449,15 +551,26 @@ EXPORT_SYMBOL_GPL(kill_dev_dax); static void trim_dev_dax_range(struct dev_dax *dev_dax) { int i = dev_dax->nr_range - 1; - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range = &dev_range->range; struct dax_region *dax_region = dev_dax->region; + struct resource *res = &dax_region->res; lockdep_assert_held_write(&dax_region_rwsem); dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i, (unsigned long long)range->start, (unsigned long long)range->end); - __release_region(&dax_region->res, range->start, range_len(range)); + if (dev_range->dax_resource) { + res = dev_range->dax_resource->res; + dev_dbg(&dev_dax->dev, "Trim sparse extent %pr\n", res); + } + + __release_region(res, range->start, range_len(range)); + + if (dev_range->dax_resource) + dev_range->dax_resource->use_cnt--; + if (--dev_dax->nr_range == 0) { kfree(dev_dax->ranges); dev_dax->ranges = NULL; @@ -640,7 +753,7 @@ static void dax_region_unregister(void *region) struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, - unsigned long flags) + unsigned long 
flags, struct dax_sparse_ops *sparse_ops) { struct dax_region *dax_region; @@ -658,12 +771,16 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, || !IS_ALIGNED(range_len(range), align)) return NULL; + if (!sparse_ops && (flags & IORESOURCE_DAX_SPARSE_CAP)) + return NULL; + dax_region = kzalloc(sizeof(*dax_region), GFP_KERNEL); if (!dax_region) return NULL; dev_set_drvdata(parent, dax_region); kref_init(&dax_region->kref); + dax_region->sparse_ops = sparse_ops; dax_region->id = region_id; dax_region->align = align; dax_region->dev = parent; @@ -845,7 +962,8 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) } static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, - u64 start, resource_size_t size) + u64 start, resource_size_t size, + struct dax_resource *dax_resource) { struct device *dev = &dev_dax->dev; struct dev_dax_range *ranges; @@ -884,6 +1002,7 @@ static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax, .start = alloc->start, .end = alloc->end, }, + .dax_resource = dax_resource, }; dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1, @@ -966,7 +1085,8 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) int i; for (i = dev_dax->nr_range - 1; i >= 0; i--) { - struct range *range = &dev_dax->ranges[i].range; + struct dev_dax_range *dev_range = &dev_dax->ranges[i]; + struct range *range = &dev_range->range; struct dax_mapping *mapping = dev_dax->ranges[i].mapping; struct resource *adjust = NULL, *res; resource_size_t shrink; @@ -982,12 +1102,21 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) continue; } - for_each_dax_region_resource(dax_region, res) - if (strcmp(res->name, dev_name(dev)) == 0 - && res->start == range->start) { - adjust = res; - break; - } + if (dev_range->dax_resource) { + for_each_child_resource(dev_range->dax_resource->res, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && 
res->start == range->start) { + adjust = res; + break; + } + } else { + for_each_dax_region_resource(dax_region, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + } if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1, "failed to find matching resource\n")) @@ -1025,19 +1154,21 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) } /** - * dev_dax_resize_static - Expand the device into the unused portion of the - * region. This may involve adjusting the end of an existing resource, or - * allocating a new resource. + * __dev_dax_resize - Expand the device into the unused portion of the region. + * This may involve adjusting the end of an existing resource, or allocating a + * new resource. * * @parent: parent resource to allocate this range in * @dev_dax: DAX device to be expanded * @to_alloc: amount of space to alloc; must be <= space available in @parent + * @dax_resource: if sparse; the parent resource * * Return the amount of space allocated or -ERRNO on failure */ -static ssize_t dev_dax_resize_static(struct resource *parent, - struct dev_dax *dev_dax, - resource_size_t to_alloc) +static ssize_t __dev_dax_resize(struct resource *parent, + struct dev_dax *dev_dax, + resource_size_t to_alloc, + struct dax_resource *dax_resource) { struct resource *res, *first; int rc; @@ -1045,7 +1176,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, first = parent->child; if (!first) { rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, to_alloc); + parent->start, to_alloc, + dax_resource); if (rc) return rc; return to_alloc; @@ -1059,7 +1191,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent, if (res == first && res->start > parent->start) { alloc = min(res->start - parent->start, to_alloc); rc = alloc_dev_dax_range(parent, dev_dax, - parent->start, alloc); + parent->start, alloc, + dax_resource); if (rc) return rc; return alloc; @@ -1083,7 +1216,8 
@@ static ssize_t dev_dax_resize_static(struct resource *parent, return rc; return alloc; } - rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc); + rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc, + dax_resource); if (rc) return rc; return alloc; @@ -1094,6 +1228,54 @@ static ssize_t dev_dax_resize_static(struct resource *parent, return 0; } +static ssize_t dev_dax_resize_static(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + return __dev_dax_resize(&dax_region->res, dev_dax, to_alloc, NULL); +} + +static int find_free_extent(struct device *dev, void *data) +{ + struct dax_region *dax_region = data; + struct dax_resource *dax_resource; + + if (!dax_region->sparse_ops->is_extent(dev)) + return 0; + + dax_resource = dev_get_drvdata(dev); + if (!dax_resource || !dax_avail_size(dax_resource->res)) + return 0; + return 1; +} + +static ssize_t dev_dax_resize_sparse(struct dax_region *dax_region, + struct dev_dax *dev_dax, + resource_size_t to_alloc) +{ + struct dax_resource *dax_resource; + resource_size_t available_size; + struct device *extent_dev; + ssize_t alloc; + + extent_dev = device_find_child(dax_region->dev, dax_region, + find_free_extent); + if (!extent_dev) + return 0; + + dax_resource = dev_get_drvdata(extent_dev); + if (!dax_resource) + return 0; + + available_size = dax_avail_size(dax_resource->res); + to_alloc = min(available_size, to_alloc); + alloc = __dev_dax_resize(dax_resource->res, dev_dax, to_alloc, dax_resource); + if (alloc > 0) + dax_resource->use_cnt++; + put_device(extent_dev); + return alloc; +} + static ssize_t dev_dax_resize(struct dax_region *dax_region, struct dev_dax *dev_dax, resource_size_t size) { @@ -1117,7 +1299,10 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, return -ENXIO; retry: - alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc); + if (is_sparse(dax_region)) + alloc = dev_dax_resize_sparse(dax_region, dev_dax, to_alloc); + 
else + alloc = dev_dax_resize_static(dax_region, dev_dax, to_alloc); if (alloc <= 0) return alloc; to_alloc -= alloc; @@ -1226,7 +1411,7 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr, to_alloc = range_len(&r); if (alloc_is_aligned(dev_dax, to_alloc)) rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start, - to_alloc); + to_alloc, NULL); up_write(&dax_dev_rwsem); up_write(&dax_region_rwsem); @@ -1494,8 +1679,14 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) device_initialize(dev); dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); + if (is_sparse(dax_region) && data->size) { + dev_err(parent, "Sparse DAX region devices are created initially with 0 size"); + rc = -EINVAL; + goto err_id; + } + rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start, - data->size); + data->size, NULL); if (rc) goto err_range; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 783bfeef42cc..ae5029ea6047 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -9,6 +9,7 @@ struct dev_dax; struct resource; struct dax_device; struct dax_region; +struct dax_sparse_ops; /* dax bus specific ioresource flags */ #define IORESOURCE_DAX_STATIC BIT(0) @@ -17,7 +18,7 @@ struct dax_region; struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, - unsigned long flags); + unsigned long flags, struct dax_sparse_ops *sparse_ops); struct dev_dax_data { struct dax_region *dax_region; diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c index 367e86b1c22a..bf3b82b0120d 100644 --- a/drivers/dax/cxl.c +++ b/drivers/dax/cxl.c @@ -5,6 +5,60 @@ #include "../cxl/cxl.h" #include "bus.h" +#include "dax-private.h" + +static int __cxl_dax_add_resource(struct dax_region *dax_region, + struct region_extent *region_extent) +{ + resource_size_t start, length; + struct device *dev; + + dev = &region_extent->dev; + start = dax_region->res.start +
region_extent->hpa_range.start; + length = range_len(&region_extent->hpa_range); + return dax_region_add_resource(dax_region, dev, start, length); +} + +static int cxl_dax_add_resource(struct device *dev, void *data) +{ + struct dax_region *dax_region = data; + struct region_extent *region_extent; + + region_extent = to_region_extent(dev); + if (!region_extent) + return 0; + + dev_dbg(dax_region->dev, "Adding resource HPA %par\n", + &region_extent->hpa_range); + + return __cxl_dax_add_resource(dax_region, region_extent); +} + +static int cxl_dax_region_notify(struct device *dev, + struct cxl_notify_data *notify_data) +{ + struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); + struct dax_region *dax_region = dev_get_drvdata(dev); + struct region_extent *region_extent = notify_data->region_extent; + + switch (notify_data->event) { + case DCD_ADD_CAPACITY: + return __cxl_dax_add_resource(dax_region, region_extent); + case DCD_RELEASE_CAPACITY: + return dax_region_rm_resource(dax_region, &region_extent->dev); + case DCD_FORCED_CAPACITY_RELEASE: + default: + dev_err(&cxlr_dax->dev, "Unknown DC event %d\n", + notify_data->event); + break; + } + + return -ENXIO; +} + +struct dax_sparse_ops sparse_ops = { + .is_extent = is_region_extent, +}; static int cxl_dax_region_probe(struct device *dev) { @@ -24,14 +78,16 @@ static int cxl_dax_region_probe(struct device *dev) flags |= IORESOURCE_DAX_SPARSE_CAP; dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, - PMD_SIZE, flags); + PMD_SIZE, flags, &sparse_ops); if (!dax_region) return -ENOMEM; - if (cxlr->mode == CXL_REGION_DC) + if (cxlr->mode == CXL_REGION_DC) { + device_for_each_child(&cxlr_dax->dev, dax_region, + cxl_dax_add_resource); /* Add empty seed dax device */ dev_size = 0; - else + } else dev_size = range_len(&cxlr_dax->hpa_range); data = (struct dev_dax_data) { @@ -47,6 +103,7 @@ static int cxl_dax_region_probe(struct device *dev) static struct cxl_driver cxl_dax_region_driver = { .name =
"cxl_dax_region", .probe = cxl_dax_region_probe, + .notify = cxl_dax_region_notify, .id = CXL_DEVICE_DAX_REGION, .drv = { .suppress_bind_attrs = true, diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index ccde98c3d4e2..9e9f98c85620 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -16,6 +16,36 @@ struct inode *dax_inode(struct dax_device *dax_dev); int dax_bus_init(void); void dax_bus_exit(void); +/** + * struct dax_resource - For sparse regions; an active resource + * @region: dax_region this resource is in + * @res: resource + * @use_cnt: count the number of uses of this resource + * + * Changes to the dax_region and the dax_resources within it are protected by + * dax_region_rwsem + */ +struct dax_resource { + struct dax_region *region; + struct resource *res; + unsigned int use_cnt; +}; +int dax_region_add_resource(struct dax_region *dax_region, struct device *dev, + resource_size_t start, resource_size_t length); +int dax_region_rm_resource(struct dax_region *dax_region, + struct device *dev); +resource_size_t dax_avail_size(struct resource *dax_resource); + +typedef int (*match_cb)(struct device *dev, resource_size_t *size_avail); + +/** + * struct dax_sparse_ops - Operations for sparse regions + * @is_extent: return if the device is an extent + */ +struct dax_sparse_ops { + bool (*is_extent)(struct device *dev); +}; + /** * struct dax_region - mapping infrastructure for dax devices * @id: kernel-wide unique region for a memory range @@ -27,6 +57,7 @@ void dax_bus_exit(void); * @res: resource tree to track instance allocations * @seed: allow userspace to find the first unbound seed device * @youngest: allow userspace to find the most recently created device + * @sparse_ops: operations required for sparse regions */ struct dax_region { int id; @@ -38,6 +69,7 @@ struct dax_region { struct resource res; struct device *seed; struct device *youngest; + struct dax_sparse_ops *sparse_ops; }; struct dax_mapping { @@ -62,6
+94,7 @@ struct dax_mapping { * @pgoff: page offset * @range: resource-span * @mapping: device to assist in interrogating the range layout + * @dax_resource: if not NULL; dax sparse resource containing this range */ struct dev_dax { struct dax_region *region; @@ -79,6 +112,7 @@ struct dev_dax { unsigned long pgoff; struct range range; struct dax_mapping *mapping; + struct dax_resource *dax_resource; } *ranges; }; diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index 5e7c53f18491..0eea65052874 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -28,7 +28,7 @@ static int dax_hmem_probe(struct platform_device *pdev) mri = dev->platform_data; dax_region = alloc_dax_region(dev, pdev->id, &mri->range, - mri->target_node, PMD_SIZE, flags); + mri->target_node, PMD_SIZE, flags, NULL); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index c8ebf4e281f2..f927e855f240 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -54,7 +54,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev) range.start += offset; dax_region = alloc_dax_region(dev, region_id, &range, nd_region->target_node, le32_to_cpu(pfn_sb->align), - IORESOURCE_DAX_STATIC); + IORESOURCE_DAX_STATIC, NULL); if (!dax_region) return ERR_PTR(-ENOMEM);