From patchwork Fri Jul 15 00:00:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918544 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 820686D19 for ; Fri, 15 Jul 2022 00:00:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843249; x=1689379249; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IX0zDmBSJKnXh8wY1T5AtKjkfjgtsxcgcyDoSxKglnk=; b=gPMFj9bYae/odJNPoBlfI14CkLbEkFuanInvUtO8KiZF7pjlebS2kyBM zKpnKU4zkhJwT7sUPypPXTJcRRUhKWR7txlgaopzhbTfkTepMHjSD6/W0 iy42uwqXNfd8UtqpPI2cHsmELjimZPug5YSJtjDaiv4n6lPFlTQBr0wO/ 4xlO0ngLfunlCtb6aSdKBfPQ/rq7Nm/ji3LCcj2zWds25NnTpwanQ3zI1 y3Vj7CBuaS6iVmb+7T8w3yqYVxDpP7IjMFnF6HomDWnnVwRLmr8yKVHMv a9ywiClxw8VmwCjEpSoJAVv2w+TcG3nuCJ1mS0eMQPiU7H8BHkrBpTaIJ A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286799304" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="286799304" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:00:48 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="738462685" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:00:48 -0700 Subject: [PATCH v2 01/28] Documentation/cxl: Use a double line break between entries From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:00:47 -0700 Message-ID: <165784324750.1758207.10379257962719807754.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Make it easier to read delineations between the "Description" line break, new paragraph line breaks, and new entries. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 1fd5984b6158..16d9ffa94bbd 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -7,6 +7,7 @@ Description: all descendant memdevs for unbind. Writing '1' to this attribute flushes that work. + What: /sys/bus/cxl/devices/memX/firmware_version Date: December, 2020 KernelVersion: v5.12 @@ -16,6 +17,7 @@ Description: Memory Device Output Payload in the CXL-2.0 specification. + What: /sys/bus/cxl/devices/memX/ram/size Date: December, 2020 KernelVersion: v5.12 @@ -25,6 +27,7 @@ Description: identically named field in the Identify Memory Device Output Payload in the CXL-2.0 specification. + What: /sys/bus/cxl/devices/memX/pmem/size Date: December, 2020 KernelVersion: v5.12 @@ -34,6 +37,7 @@ Description: identically named field in the Identify Memory Device Output Payload in the CXL-2.0 specification. 
+ What: /sys/bus/cxl/devices/memX/serial Date: January, 2022 KernelVersion: v5.18 @@ -43,6 +47,7 @@ Description: capability. Mandatory for CXL devices, see CXL 2.0 8.1.12.2 Memory Device PCIe Capabilities and Extended Capabilities. + What: /sys/bus/cxl/devices/memX/numa_node Date: January, 2022 KernelVersion: v5.18 @@ -52,6 +57,7 @@ Description: host PCI device for this memory device, emit the CPU node affinity for this device. + What: /sys/bus/cxl/devices/*/devtype Date: June, 2021 KernelVersion: v5.14 @@ -61,6 +67,7 @@ Description: mirrors the same value communicated in the DEVTYPE environment variable for uevents for devices on the "cxl" bus. + What: /sys/bus/cxl/devices/*/modalias Date: December, 2021 KernelVersion: v5.18 @@ -70,6 +77,7 @@ Description: mirrors the same value communicated in the MODALIAS environment variable for uevents for devices on the "cxl" bus. + What: /sys/bus/cxl/devices/portX/uport Date: June, 2021 KernelVersion: v5.14 @@ -81,6 +89,7 @@ Description: the CXL portX object to the device that published the CXL port capability. + What: /sys/bus/cxl/devices/portX/dportY Date: June, 2021 KernelVersion: v5.14 @@ -94,6 +103,7 @@ Description: integer reflects the hardware port unique-id used in the hardware decoder target list. + What: /sys/bus/cxl/devices/decoderX.Y Date: June, 2021 KernelVersion: v5.14 @@ -106,6 +116,7 @@ Description: cxl_port container of this decoder, and 'Y' represents the instance id of a given decoder resource. + What: /sys/bus/cxl/devices/decoderX.Y/{start,size} Date: June, 2021 KernelVersion: v5.14 @@ -120,6 +131,7 @@ Description: and dynamically updates based on the active memory regions in that address space. + What: /sys/bus/cxl/devices/decoderX.Y/locked Date: June, 2021 KernelVersion: v5.14 @@ -132,6 +144,7 @@ Description: secondary bus reset, of the PCIe bridge that provides the bus for this decoders uport, unlocks / resets the decoder. + What: /sys/bus/cxl/devices/decoderX.Y/target_list Date: June, 2021 KernelVersion: v5.14 @@ -142,6 +155,7 @@ Description: configured interleave order of the decoder's dport instances. Each entry in the list is a dport id. + What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3} Date: June, 2021 KernelVersion: v5.14 @@ -154,6 +168,7 @@ Description: memory, volatile memory, accelerator memory, and / or expander memory may be mapped behind this decoder's memory window. 
+ What: /sys/bus/cxl/devices/decoderX.Y/target_type Date: June, 2021 KernelVersion: v5.14 From patchwork Fri Jul 15 00:00:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918545 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C0AA6D17 for ; Fri, 15 Jul 2022 00:00:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843256; x=1689379256; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dtdJFh7kmK+xJYbCR5yQu8ImFrxaMhVVXsLFYvEJhPc=; b=nGZ1JhPl6nUbogLPCQH7C9BT2+iD0E6HqDMdMGPtncfxh2V/ZQOYtsC5 HwOkhXLUADQI6hwjU5ZzSUMYL7Oh2Y1qcl68keuW+3gWnp9NmXWRl514l a1q+Ze6eLLWJB7mB88C+YbXXO+voFeQEVYRm7b7kCxKtlglp1ubSuQE2g Rsgw8CEYwaC5EG9Z3v3ZVQhGA56pubJTNYXyYluZ0wq9WBuYflOo8zz3l W/EGsCqi+2gJfOvrS+HEWRyEyj4v9FCEohOS4E1MgQlhQ8r9WfbbhRdra V+Op6GsaviDLofXpih/Q0s/Qkz4DMtQw+o6CuA1nVIeSr7D4TU8MU1EEU A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="266072932" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="266072932" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:00:54 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="600309476" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:00:53 -0700 Subject: [PATCH v2 02/28] cxl/core: Define a 'struct cxl_switch_decoder' From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:00:53 -0700 Message-ID: <165784325340.1758207.5064717153608954960.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Currently 'struct cxl_decoder' contains the superset of attributes needed for all decoder types. Before more type-specific attributes are added to the common definition, reorganize 'struct cxl_decoder' into type specific objects. This patch, the first of three, factors out a cxl_switch_decoder type. See the new kdoc for what a 'struct cxl_switch_decoder' represents in a CXL topology. 
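The refactor relies on the kernel's embed-and-downcast idiom: 'struct cxl_switch_decoder' embeds a 'struct cxl_decoder' as its first member, and type-specific code recovers the outer object with container_of(). Below is a minimal userspace sketch of that pattern (illustrative only, not the kernel implementation: container_of() is open-coded here, the fields are trimmed down, and the real to_cxl_switch_decoder() operates on a 'struct device *' after validating the device type):

#include <stdio.h>
#include <stddef.h>

/* illustrative stand-ins for the structures in this patch, fields trimmed */
struct cxl_decoder {
	int id;
};

struct cxl_switch_decoder {
	struct cxl_decoder cxld;	/* base object embedded as first member */
	int nr_targets;
};

/* recover the wrapping object from a pointer to its embedded member */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct cxl_switch_decoder *to_cxl_switch_decoder(struct cxl_decoder *cxld)
{
	return container_of(cxld, struct cxl_switch_decoder, cxld);
}

int main(void)
{
	struct cxl_switch_decoder cxlsd = { .cxld = { .id = 3 }, .nr_targets = 2 };
	struct cxl_decoder *cxld = &cxlsd.cxld; /* generic code sees the base type */

	/* type-specific code downcasts to reach the switch-only attributes */
	printf("decoder %d routes to %d targets\n", cxld->id,
	       to_cxl_switch_decoder(cxld)->nr_targets);
	return 0;
}

Generic decoder code keeps passing 'struct cxl_decoder *' around unchanged; only the paths that need the target list pay for the downcast.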
Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/acpi.c | 4 + drivers/cxl/core/hdm.c | 33 +++++-- drivers/cxl/core/port.c | 188 ++++++++++++++++++++++++++++-------------- drivers/cxl/cxl.h | 30 +++++-- tools/testing/cxl/test/cxl.c | 23 ++++- 5 files changed, 189 insertions(+), 89 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 541fc0b28b8f..62bf22ffb7aa 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -81,6 +81,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, int target_map[CXL_DECODER_MAX_INTERLEAVE]; struct cxl_cfmws_context *ctx = arg; struct cxl_port *root_port = ctx->root_port; + struct cxl_switch_decoder *cxlsd; struct device *dev = ctx->dev; struct acpi_cedt_cfmws *cfmws; struct cxl_decoder *cxld; @@ -106,10 +107,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, for (i = 0; i < ways; i++) target_map[i] = cfmws->interleave_targets[i]; - cxld = cxl_root_decoder_alloc(root_port, ways); + cxlsd = cxl_root_decoder_alloc(root_port, ways); if (IS_ERR(cxld)) return 0; + cxld = &cxlsd->cxld; cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions); cxld->target_type = CXL_DECODER_EXPANDER; cxld->hpa_range = (struct range) { diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index c524e772fdae..2f10d42798de 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -49,20 +49,20 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, */ int devm_cxl_add_passthrough_decoder(struct cxl_port *port) { - struct cxl_decoder *cxld; + struct cxl_switch_decoder *cxlsd; struct cxl_dport *dport; int single_port_map[1]; - cxld = cxl_switch_decoder_alloc(port, 1); - if (IS_ERR(cxld)) - return PTR_ERR(cxld); + cxlsd = cxl_switch_decoder_alloc(port, 1); + if (IS_ERR(cxlsd)) + return PTR_ERR(cxlsd); device_lock_assert(&port->dev); dport = list_first_entry(&port->dports, typeof(*dport), list); single_port_map[0] = dport->port_id; - return add_hdm_decoder(port, cxld, single_port_map); + return add_hdm_decoder(port, &cxlsd->cxld, single_port_map); } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_passthrough_decoder, CXL); @@ -255,14 +255,23 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) int rc, target_count = cxlhdm->target_count; struct cxl_decoder *cxld; - if (is_cxl_endpoint(port)) + if (is_cxl_endpoint(port)) { cxld = cxl_endpoint_decoder_alloc(port); - else - cxld = cxl_switch_decoder_alloc(port, target_count); - if (IS_ERR(cxld)) { - dev_warn(&port->dev, - "Failed to allocate the decoder\n"); - return PTR_ERR(cxld); + if (IS_ERR(cxld)) { + dev_warn(&port->dev, + "Failed to allocate the decoder\n"); + return PTR_ERR(cxld); + } + } else { + struct cxl_switch_decoder *cxlsd; + + cxlsd = cxl_switch_decoder_alloc(port, target_count); + if (IS_ERR(cxlsd)) { + dev_warn(&port->dev, + "Failed to allocate the decoder\n"); + return PTR_ERR(cxlsd); + } + cxld = &cxlsd->cxld; } rc = init_hdm_decoder(port, cxld, target_map, hdm, i); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index f62c0a6e17ea..27a2a6b839aa 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -120,20 +120,21 @@ static ssize_t target_type_show(struct device *dev, } static DEVICE_ATTR_RO(target_type); -static ssize_t emit_target_list(struct cxl_decoder *cxld, char *buf) +static ssize_t emit_target_list(struct cxl_switch_decoder *cxlsd, char *buf) { + struct cxl_decoder *cxld = &cxlsd->cxld; ssize_t 
offset = 0; int i, rc = 0; for (i = 0; i < cxld->interleave_ways; i++) { - struct cxl_dport *dport = cxld->target[i]; + struct cxl_dport *dport = cxlsd->target[i]; struct cxl_dport *next = NULL; if (!dport) break; if (i + 1 < cxld->interleave_ways) - next = cxld->target[i + 1]; + next = cxlsd->target[i + 1]; rc = sysfs_emit_at(buf, offset, "%d%s", dport->port_id, next ? "," : ""); if (rc < 0) @@ -144,18 +145,20 @@ static ssize_t emit_target_list(struct cxl_decoder *cxld, char *buf) return offset; } +static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); + static ssize_t target_list_show(struct device *dev, struct device_attribute *attr, char *buf) { - struct cxl_decoder *cxld = to_cxl_decoder(dev); + struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev); ssize_t offset; unsigned int seq; int rc; do { - seq = read_seqbegin(&cxld->target_lock); - rc = emit_target_list(cxld, buf); - } while (read_seqretry(&cxld->target_lock, seq)); + seq = read_seqbegin(&cxlsd->target_lock); + rc = emit_target_list(cxlsd, buf); + } while (read_seqretry(&cxlsd->target_lock, seq)); if (rc < 0) return rc; @@ -233,14 +236,28 @@ static const struct attribute_group *cxl_decoder_endpoint_attribute_groups[] = { NULL, }; +static void __cxl_decoder_release(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + ida_free(&port->decoder_ida, cxld->id); + put_device(&port->dev); +} + static void cxl_decoder_release(struct device *dev) { struct cxl_decoder *cxld = to_cxl_decoder(dev); - struct cxl_port *port = to_cxl_port(dev->parent); - ida_free(&port->decoder_ida, cxld->id); + __cxl_decoder_release(cxld); kfree(cxld); - put_device(&port->dev); +} + +static void cxl_switch_decoder_release(struct device *dev) +{ + struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev); + + __cxl_decoder_release(&cxlsd->cxld); + kfree(cxlsd); } static const struct device_type cxl_decoder_endpoint_type = { @@ -251,13 +268,13 @@ static const struct device_type cxl_decoder_endpoint_type = { static const struct device_type cxl_decoder_switch_type = { .name = "cxl_decoder_switch", - .release = cxl_decoder_release, + .release = cxl_switch_decoder_release, .groups = cxl_decoder_switch_attribute_groups, }; static const struct device_type cxl_decoder_root_type = { .name = "cxl_decoder_root", - .release = cxl_decoder_release, + .release = cxl_switch_decoder_release, .groups = cxl_decoder_root_attribute_groups, }; @@ -272,15 +289,29 @@ bool is_root_decoder(struct device *dev) } EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL); +static bool is_switch_decoder(struct device *dev) +{ + return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type; +} + struct cxl_decoder *to_cxl_decoder(struct device *dev) { - if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release, + if (dev_WARN_ONCE(dev, + !is_switch_decoder(dev) && !is_endpoint_decoder(dev), "not a cxl_decoder device\n")) return NULL; return container_of(dev, struct cxl_decoder, dev); } EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL); +static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev) +{ + if (dev_WARN_ONCE(dev, !is_switch_decoder(dev), + "not a cxl_switch_decoder device\n")) + return NULL; + return container_of(dev, struct cxl_switch_decoder, cxld.dev); +} + static void cxl_ep_release(struct cxl_ep *ep) { if (!ep) @@ -1146,7 +1177,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, } EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL); -static int decoder_populate_targets(struct 
cxl_decoder *cxld, +static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, struct cxl_port *port, int *target_map) { int i, rc = 0; @@ -1159,17 +1190,17 @@ static int decoder_populate_targets(struct cxl_decoder *cxld, if (list_empty(&port->dports)) return -EINVAL; - write_seqlock(&cxld->target_lock); - for (i = 0; i < cxld->nr_targets; i++) { + write_seqlock(&cxlsd->target_lock); + for (i = 0; i < cxlsd->nr_targets; i++) { struct cxl_dport *dport = find_dport(port, target_map[i]); if (!dport) { rc = -ENXIO; break; } - cxld->target[i] = dport; + cxlsd->target[i] = dport; } - write_sequnlock(&cxld->target_lock); + write_sequnlock(&cxlsd->target_lock); return rc; } @@ -1177,56 +1208,34 @@ static int decoder_populate_targets(struct cxl_decoder *cxld, static struct lock_class_key cxl_decoder_key; /** - * cxl_decoder_alloc - Allocate a new CXL decoder + * cxl_decoder_alloc - Common decoder setup / initialization * @port: owning port of this decoder - * @nr_targets: downstream targets accessible by this decoder. All upstream - * ports and root ports must have at least 1 target. Endpoint - * devices will have 0 targets. Callers wishing to register an - * endpoint device should specify 0. - * - * A port should contain one or more decoders. Each of those decoders enable - * some address space for CXL.mem utilization. A decoder is expected to be - * configured by the caller before registering. + * @cxld: common decoder properties to initialize * - * Return: A new cxl decoder to be registered by cxl_decoder_add(). The decoder - * is initialized to be a "passthrough" decoder. + * A port may contain one or more decoders. Each of those decoders + * enable some address space for CXL.mem utilization. A decoder is + * expected to be configured by the caller before registering via + * cxl_decoder_add() */ -static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets) +static int cxl_decoder_init(struct cxl_port *port, struct cxl_decoder *cxld) { - struct cxl_decoder *cxld; struct device *dev; - int rc = 0; - - if (nr_targets > CXL_DECODER_MAX_INTERLEAVE) - return ERR_PTR(-EINVAL); - - cxld = kzalloc(struct_size(cxld, target, nr_targets), GFP_KERNEL); - if (!cxld) - return ERR_PTR(-ENOMEM); + int rc; rc = ida_alloc(&port->decoder_ida, GFP_KERNEL); if (rc < 0) - goto err; + return rc; /* need parent to stick around to release the id */ get_device(&port->dev); cxld->id = rc; - cxld->nr_targets = nr_targets; - seqlock_init(&cxld->target_lock); dev = &cxld->dev; device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_decoder_key); device_set_pm_not_required(dev); dev->parent = &port->dev; dev->bus = &cxl_bus_type; - if (is_cxl_root(port)) - cxld->dev.type = &cxl_decoder_root_type; - else if (is_cxl_endpoint(port)) - cxld->dev.type = &cxl_decoder_endpoint_type; - else - cxld->dev.type = &cxl_decoder_switch_type; /* Pre initialize an "empty" decoder */ cxld->interleave_ways = 1; @@ -1237,10 +1246,19 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, .end = -1, }; - return cxld; -err: - kfree(cxld); - return ERR_PTR(rc); + return 0; +} + +static int cxl_switch_decoder_init(struct cxl_port *port, + struct cxl_switch_decoder *cxlsd, + int nr_targets) +{ + if (nr_targets > CXL_DECODER_MAX_INTERLEAVE) + return -EINVAL; + + cxlsd->nr_targets = nr_targets; + seqlock_init(&cxlsd->target_lock); + return cxl_decoder_init(port, &cxlsd->cxld); } /** @@ -1253,13 +1271,29 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, * firmware 
description of CXL resources into a CXL standard decode * topology. */ -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets) +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets) { + struct cxl_switch_decoder *cxlsd; + struct cxl_decoder *cxld; + int rc; + if (!is_cxl_root(port)) return ERR_PTR(-EINVAL); - return cxl_decoder_alloc(port, nr_targets); + cxlsd = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL); + if (!cxlsd) + return ERR_PTR(-ENOMEM); + + rc = cxl_switch_decoder_init(port, cxlsd, nr_targets); + if (rc) { + kfree(cxlsd); + return ERR_PTR(rc); + } + + cxld = &cxlsd->cxld; + cxld->dev.type = &cxl_decoder_root_type; + return cxlsd; } EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL); @@ -1274,13 +1308,29 @@ EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL); * that sit between Switch Upstream Ports / Switch Downstream Ports and * Host Bridges / Root Ports. */ -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets) +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets) { + struct cxl_switch_decoder *cxlsd; + struct cxl_decoder *cxld; + int rc; + if (is_cxl_root(port) || is_cxl_endpoint(port)) return ERR_PTR(-EINVAL); - return cxl_decoder_alloc(port, nr_targets); + cxlsd = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL); + if (!cxlsd) + return ERR_PTR(-ENOMEM); + + rc = cxl_switch_decoder_init(port, cxlsd, nr_targets); + if (rc) { + kfree(cxlsd); + return ERR_PTR(rc); + } + + cxld = &cxlsd->cxld; + cxld->dev.type = &cxl_decoder_switch_type; + return cxlsd; } EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL); @@ -1292,10 +1342,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL); */ struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) { + struct cxl_decoder *cxld; + int rc; + if (!is_cxl_endpoint(port)) return ERR_PTR(-EINVAL); - return cxl_decoder_alloc(port, 0); + cxld = kzalloc(sizeof(*cxld), GFP_KERNEL); + if (!cxld) + return ERR_PTR(-ENOMEM); + + rc = cxl_decoder_init(port, cxld); + if (rc) { + kfree(cxld); + return ERR_PTR(rc); + } + + cxld->dev.type = &cxl_decoder_endpoint_type; + return cxld; } EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL); @@ -1337,7 +1401,9 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map) port = to_cxl_port(cxld->dev.parent); if (!is_endpoint_decoder(dev)) { - rc = decoder_populate_targets(cxld, port, target_map); + struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev); + + rc = decoder_populate_targets(cxlsd, port, target_map); if (rc && (cxld->flags & CXL_DECODER_F_ENABLE)) { dev_err(&port->dev, "Failed to populate active decoder targets\n"); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 570bd9f8141b..0289c06ec72c 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -220,7 +220,7 @@ enum cxl_decoder_type { #define CXL_DECODER_MAX_INTERLEAVE 16 /** - * struct cxl_decoder - CXL address range decode configuration + * struct cxl_decoder - Common CXL HDM Decoder Attributes * @dev: this decoder's device * @id: kernel device name id * @hpa_range: Host physical address range mapped by this decoder @@ -228,9 +228,6 @@ enum cxl_decoder_type { * @interleave_granularity: data stride per dport * @target_type: accelerator vs expander (type2 vs type3) selector * @flags: memory type capabilities and locking - * @target_lock: coordinate coherent reads of the target list - * @nr_targets: number of 
elements in @target - * @target: active ordered target list in current decoder configuration */ struct cxl_decoder { struct device dev; @@ -240,6 +237,23 @@ struct cxl_decoder { int interleave_granularity; enum cxl_decoder_type target_type; unsigned long flags; +}; + +/** + * struct cxl_switch_decoder - Switch specific CXL HDM Decoder + * @cxld: base cxl_decoder object + * @target_lock: coordinate coherent reads of the target list + * @nr_targets: number of elements in @target + * @target: active ordered target list in current decoder configuration + * + * The 'switch' decoder type represents the decoder instances of cxl_port's that + * route from the root of a CXL memory decode topology to the endpoints. They + * come in two flavors, root-level decoders, statically defined by platform + * firmware, and mid-level decoders, where interleave-granularity, + * interleave-width, and the target list are mutable. + */ +struct cxl_switch_decoder { + struct cxl_decoder cxld; seqlock_t target_lock; int nr_targets; struct cxl_dport *target[]; @@ -364,10 +378,10 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, struct cxl_decoder *to_cxl_decoder(struct device *dev); bool is_root_decoder(struct device *dev); bool is_endpoint_decoder(struct device *dev); -struct cxl_decoder *cxl_root_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets); -struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets); +struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets); +struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets); int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map); struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port); int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map); diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 6e086fbc5c5b..7991ddc6e562 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -451,14 +451,23 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) struct cxl_decoder *cxld; int rc; - if (target_count) - cxld = cxl_switch_decoder_alloc(port, target_count); - else + if (target_count) { + struct cxl_switch_decoder *cxlsd; + + cxlsd = cxl_switch_decoder_alloc(port, target_count); + if (IS_ERR(cxlsd)) { + dev_warn(&port->dev, + "Failed to allocate the decoder\n"); + return PTR_ERR(cxlsd); + } + cxld = &cxlsd->cxld; + } else { cxld = cxl_endpoint_decoder_alloc(port); - if (IS_ERR(cxld)) { - dev_warn(&port->dev, - "Failed to allocate the decoder\n"); - return PTR_ERR(cxld); + if (IS_ERR(cxld)) { + dev_warn(&port->dev, + "Failed to allocate the decoder\n"); + return PTR_ERR(cxld); + } } cxld->hpa_range = (struct range) { From patchwork Fri Jul 15 00:00:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918547 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19E386D17 for ; Fri, 15 Jul 2022 00:01:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843277; x=1689379277; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=g7uYXQiu25oHGwZq+lig75MTaX/7ZFGMAsa2V42H71M=; b=HKwFKxB7jczgehIvOp1nJzIUI1jxRQD/G5jHAtLqzS2qJ3ZONME/JQ4c 1ivDM3jcOTLB/w6FfzWcaLJ5GEiOuBi/9t8NIVZWDz7kLjY+C8DiPCNlN mNJxUI1gq48qGMYKiDVBFf1DqVmLoVyq8zL5s9rr78AluUIsURvlPwBgM 6sfLtc3tXQx/U95aS0LDO8FsdPWj1xQHaxbU/zye2NVc0POZ3zFLiiXQG HQJDeOws4U0NBO7v5UOqZU6OV2g3Un+S7DOdmAGde+Zfp0zUuX7GCMuaI xLLUZx0JKY2QC8l77GfhZoYGDEEYor7+GzP9t5PFSp+hFhPVTP8SWXXqK g==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="285683186" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="285683186" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:00 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="842325380" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:00:59 -0700 Subject: [PATCH v2 03/28] cxl/acpi: Track CXL resources in iomem_resource From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Greg Kroah-Hartman , Andrew Morton , David Hildenbrand , Jason Gunthorpe , Tony Luck , Christoph Hellwig , nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:00:59 -0700 Message-ID: <165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Recall that CXL capable address ranges, on ACPI platforms, are published in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window Structures). These windows represent both the actively mapped capacity and the potential address space that can be dynamically assigned to a new CXL decode configuration (region / interleave-set). CXL endpoints like DDR DIMMs can be mapped at any physical address including 0 and legacy ranges. There is an expectation and requirement that the /proc/iomem interface and the iomem_resource tree in the kernel reflect the full set of platform address ranges. I.e. that every address range that platform firmware and bus drivers enumerate be reflected as an iomem_resource entry. The hard requirement to do this for CXL arises from the fact that facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty iomem_resource ranges as free for software to use as proxy address space. Without CXL publishing its potential address ranges in iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently steal capacity reserved for runtime provisioning of new CXL regions. So, iomem_resource needs to know about both active and potential CXL resource ranges. The active CXL resources might already be reflected in iomem_resource as "System RAM". insert_resource_expand_to_fit() handles re-parenting "System RAM" underneath a CXL window. The "_expand_to_fit()" behavior handles cases where a CXL window is not a strict superset of an existing entry in the iomem_resource tree. The "_expand_to_fit()" behavior is acceptable from the perspective of resource allocation. The expansion happens because a conflicting resource range is already populated, which means the resource boundary expansion does not result in any additional free CXL address space being made available. 
CXL address space allocation is always bounded by the original unexpanded address range. However, the potential for expansion does mean that something like walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on corner case platforms that cause the resource tree to expand a CXL window resource over a range that is not decoded by CXL. This would be an odd platform configuration, but if it becomes a problem in practice the CXL subsystem could just publish an API that returns definitive answers. Cc: Greg Kroah-Hartman Cc: Andrew Morton Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Tony Luck Cc: Christoph Hellwig Signed-off-by: Dan Williams Acked-by: Greg Kroah-Hartman Reviewed-by: Jonathan Cameron --- drivers/cxl/acpi.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/ioport.h | 1 kernel/resource.c | 7 ++ 3 files changed, 149 insertions(+), 3 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 62bf22ffb7aa..e2b6cbd04846 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -73,6 +73,8 @@ static int cxl_acpi_cfmws_verify(struct device *dev, struct cxl_cfmws_context { struct device *dev; struct cxl_port *root_port; + struct resource *cxl_res; + int id; }; static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, @@ -81,11 +83,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, int target_map[CXL_DECODER_MAX_INTERLEAVE]; struct cxl_cfmws_context *ctx = arg; struct cxl_port *root_port = ctx->root_port; + struct resource *cxl_res = ctx->cxl_res; struct cxl_switch_decoder *cxlsd; struct device *dev = ctx->dev; struct acpi_cedt_cfmws *cfmws; struct cxl_decoder *cxld; unsigned int ways, i, ig; + struct resource *res; int rc; cfmws = (struct acpi_cedt_cfmws *) header; @@ -107,6 +111,23 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, for (i = 0; i < ways; i++) target_map[i] = cfmws->interleave_targets[i]; + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + res->name = kasprintf(GFP_KERNEL, "CXL Window %d", ctx->id++); + if (!res->name) + goto err_name; + + res->start = cfmws->base_hpa; + res->end = cfmws->base_hpa + cfmws->window_size - 1; + res->flags = IORESOURCE_MEM; + + /* add to the local resource tracking to establish a sort order */ + rc = insert_resource(cxl_res, res); + if (rc) + goto err_insert; + cxlsd = cxl_root_decoder_alloc(root_port, ways); if (IS_ERR(cxld)) return 0; @@ -115,8 +136,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions); cxld->target_type = CXL_DECODER_EXPANDER; cxld->hpa_range = (struct range) { - .start = cfmws->base_hpa, - .end = cfmws->base_hpa + cfmws->window_size - 1, + .start = res->start, + .end = res->end, }; cxld->interleave_ways = ways; cxld->interleave_granularity = ig; @@ -137,6 +158,12 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, cxld->hpa_range.start, cxld->hpa_range.end); return 0; + +err_insert: + kfree(res->name); +err_name: + kfree(res); + return -ENOMEM; } __mock struct acpi_device *to_cxl_host_bridge(struct device *host, @@ -291,9 +318,101 @@ static void cxl_acpi_lock_reset_class(void *dev) device_lock_reset_class(dev); } +static void del_cxl_resource(struct resource *res) +{ + kfree(res->name); + kfree(res); +} + +static void cxl_set_public_resource(struct resource *priv, struct resource *pub) +{ + priv->desc = (unsigned long) pub; +} + +static struct resource 
*cxl_get_public_resource(struct resource *priv) +{ + return (struct resource *) priv->desc; +} + +static void remove_cxl_resources(void *data) +{ + struct resource *res, *next, *cxl = data; + + for (res = cxl->child; res; res = next) { + struct resource *victim = cxl_get_public_resource(res); + + next = res->sibling; + remove_resource(res); + + if (victim) { + remove_resource(victim); + kfree(victim); + } + + del_cxl_resource(res); + } +} + +/** + * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource + * @cxl_res: A standalone resource tree where each CXL window is a sibling + * + * Walk each CXL window in @cxl_res and add it to iomem_resource potentially + * expanding its boundaries to ensure that any conflicting resources become + * children. If a window is expanded it may then conflict with another window + * entry and require the window to be truncated or trimmed. Consider this + * situation: + * + * |-- "CXL Window 0" --||----- "CXL Window 1" -----| + * |--------------- "System RAM" -------------| + * + * ...where platform firmware has established a "System RAM" resource across 2 + * windows, but has left some portion of window 1 for dynamic CXL region + * provisioning. In this case "Window 0" will span the entirety of the "System + * RAM" span, and "CXL Window 1" is truncated to the remaining tail past the end + * of that "System RAM" resource. + */ +static int add_cxl_resources(struct resource *cxl_res) +{ + struct resource *res, *new, *next; + + for (res = cxl_res->child; res; res = next) { + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return -ENOMEM; + new->name = res->name; + new->start = res->start; + new->end = res->end; + new->flags = IORESOURCE_MEM; + new->desc = IORES_DESC_CXL; + + /* + * Record the public resource in the private cxl_res tree for + * later removal. 
+ */ + cxl_set_public_resource(res, new); + + insert_resource_expand_to_fit(&iomem_resource, new); + + next = res->sibling; + while (next && resource_overlaps(new, next)) { + if (resource_contains(new, next)) { + struct resource *_next = next->sibling; + + remove_resource(next); + del_cxl_resource(next); + next = _next; + } else + next->start = new->end + 1; + } + } + return 0; +} + static int cxl_acpi_probe(struct platform_device *pdev) { int rc; + struct resource *cxl_res; struct cxl_port *root_port; struct device *host = &pdev->dev; struct acpi_device *adev = ACPI_COMPANION(host); @@ -305,6 +424,14 @@ static int cxl_acpi_probe(struct platform_device *pdev) if (rc) return rc; + cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL); + if (!cxl_res) + return -ENOMEM; + cxl_res->name = "CXL mem"; + cxl_res->start = 0; + cxl_res->end = -1; + cxl_res->flags = IORESOURCE_MEM; + root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); if (IS_ERR(root_port)) return PTR_ERR(root_port); @@ -315,11 +442,22 @@ static int cxl_acpi_probe(struct platform_device *pdev) if (rc < 0) return rc; + rc = devm_add_action_or_reset(host, remove_cxl_resources, cxl_res); + if (rc) + return rc; + ctx = (struct cxl_cfmws_context) { .dev = host, .root_port = root_port, + .cxl_res = cxl_res, }; - acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx); + rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx); + if (rc < 0) + return -ENXIO; + + rc = add_cxl_resources(cxl_res); + if (rc) + return rc; /* * Root level scanned with host-bridge as dports, now scan host-bridges diff --git a/include/linux/ioport.h b/include/linux/ioport.h index ec5f71f7135b..79d1ad6d6275 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -141,6 +141,7 @@ enum { IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_RESERVED = 7, IORES_DESC_SOFT_RESERVED = 8, + IORES_DESC_CXL = 9, }; /* diff --git a/kernel/resource.c b/kernel/resource.c index 34eaee179689..53a534db350e 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -891,6 +891,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new) } write_unlock(&resource_lock); } +/* + * Not for general consumption, only early boot memory map parsing, PCI + * resource discovery, and late discovery of CXL resources are expected + * to use this interface. The former are built-in and only the latter, + * CXL, is a module. 
+ */ +EXPORT_SYMBOL_NS_GPL(insert_resource_expand_to_fit, CXL); /** * remove_resource - Remove a resource in the resource tree From patchwork Fri Jul 15 00:01:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918546 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1837D6D17 for ; Fri, 15 Jul 2022 00:01:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843273; x=1689379273; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yIncCty2UR6oXNxbEEHSGl8uWj7bPEwM/IqHg+ZT7g8=; b=eW1uIJmMQFuo/cPRQ8wkfPvFyKhdYzXQbTeroFe4+lsizid93nygtSfT bS5lnr0XERoo3uvOPeDDDClDEC4r3JMkRwX//VwlkZRxOnLYoPlClC78C UgNPwxDRch4Xs5E4/Iq93sVup4x0XJuYwrWHj5MRxBHNCFlm1bVzcFMIh bfDgUP6LS7fMj6fJU6cTrc4WGz4+SuuZVoeQWF2jsk9IYOfczEvMUkISs Rvu1zkeKjlrhRag6+D+6xi6c1RC9Ntwy/vOhBWI7a+DHY2ws8mxAWD6vT pE70Td3GlvyJyFm8H+MCWeYRwCmAC/nXNIciyXPw/fcETplXzSctsif8/ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="283219233" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="283219233" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:06 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="546461332" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:05 -0700 Subject: [PATCH v2 04/28] cxl/core: Define a 'struct cxl_root_decoder' From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:05 -0700 Message-ID: <165784326541.1758207.9915663937394448341.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Previously the target routing specifics of switch decoders were factored out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'. This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a switch decoder that also track the associated CXL window platform resource. Note that the reason the resource for a given root decoder needs to be looked up after the fact (i.e. after cxl_parse_cfmws() and add_cxl_resource()) is because add_cxl_resource() may have merged CXL windows in order to keep them at the top of the resource tree / decode hierarchy. 
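To sketch that after-the-fact pairing (a userspace approximation, not the kernel API: 'struct res', res_contains(), and pair_window() stand in for 'struct resource', resource_contains(), and the pair_cxl_resource() callback, and the device_for_each_child() walk over root decoders is elided):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* simplified stand-in for 'struct resource' siblings in the cxl_res tree */
struct res {
	uint64_t start, end;
	const char *name;
	struct res *sibling;
};

static bool res_contains(const struct res *p, uint64_t start, uint64_t end)
{
	return p->start <= start && end <= p->end;
}

/* find the (possibly merged / expanded) window covering a decoder's range */
static struct res *pair_window(struct res *head, uint64_t start, uint64_t end)
{
	for (struct res *p = head; p; p = p->sibling)
		if (res_contains(p, start, end))
			return p;
	return NULL;
}

int main(void)
{
	struct res w1 = { 0x2000000000ULL, 0x2fffffffffULL, "CXL Window 1", NULL };
	struct res w0 = { 0x1000000000ULL, 0x1fffffffffULL, "CXL Window 0", &w1 };
	struct res *hit = pair_window(&w0, 0x2000000000ULL, 0x20ffffffffULL);

	printf("decoder pairs with: %s\n", hit ? hit->name : "none");
	return 0;
}

Because a decoder's hpa_range is always a subset of the window it was parsed from, containment is sufficient to recover the association even when windows were merged.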
Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/acpi.c | 40 ++++++++++++++++++++++++++++++++++++---- drivers/cxl/core/port.c | 34 +++++++++++++++++++++++++++------- drivers/cxl/cxl.h | 15 +++++++++++++-- 3 files changed, 76 insertions(+), 13 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index e2b6cbd04846..8f021241699f 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -84,7 +84,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, struct cxl_cfmws_context *ctx = arg; struct cxl_port *root_port = ctx->root_port; struct resource *cxl_res = ctx->cxl_res; - struct cxl_switch_decoder *cxlsd; + struct cxl_root_decoder *cxlrd; struct device *dev = ctx->dev; struct acpi_cedt_cfmws *cfmws; struct cxl_decoder *cxld; @@ -128,11 +128,11 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, if (rc) goto err_insert; - cxlsd = cxl_root_decoder_alloc(root_port, ways); - if (IS_ERR(cxld)) + cxlrd = cxl_root_decoder_alloc(root_port, ways); + if (IS_ERR(cxlrd)) return 0; - cxld = &cxlsd->cxld; + cxld = &cxlrd->cxlsd.cxld; cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions); cxld->target_type = CXL_DECODER_EXPANDER; cxld->hpa_range = (struct range) { @@ -409,6 +409,32 @@ static int add_cxl_resources(struct resource *cxl_res) return 0; } +static int pair_cxl_resource(struct device *dev, void *data) +{ + struct resource *cxl_res = data; + struct resource *p; + + if (!is_root_decoder(dev)) + return 0; + + for (p = cxl_res->child; p; p = p->sibling) { + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct resource res = { + .start = cxld->hpa_range.start, + .end = cxld->hpa_range.end, + .flags = IORESOURCE_MEM, + }; + + if (resource_contains(p, &res)) { + cxlrd->res = cxl_get_public_resource(p); + break; + } + } + + return 0; +} + static int cxl_acpi_probe(struct platform_device *pdev) { int rc; @@ -459,6 +485,12 @@ static int cxl_acpi_probe(struct platform_device *pdev) if (rc) return rc; + /* + * Populate the root decoders with their related iomem resource, + * if present + */ + device_for_each_child(&root_port->dev, cxl_res, pair_cxl_resource); + /* * Root level scanned with host-bridge as dports, now scan host-bridges * for their role as CXL uports to their CXL-capable PCIe Root Ports. 
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 27a2a6b839aa..4953a1c7b245 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -260,6 +260,23 @@ static void cxl_switch_decoder_release(struct device *dev) kfree(cxlsd); } +struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev) +{ + if (dev_WARN_ONCE(dev, !is_root_decoder(dev), + "not a cxl_root_decoder device\n")) + return NULL; + return container_of(dev, struct cxl_root_decoder, cxlsd.cxld.dev); +} +EXPORT_SYMBOL_NS_GPL(to_cxl_root_decoder, CXL); + +static void cxl_root_decoder_release(struct device *dev) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + + __cxl_decoder_release(&cxlrd->cxlsd.cxld); + kfree(cxlrd); +} + static const struct device_type cxl_decoder_endpoint_type = { .name = "cxl_decoder_endpoint", .release = cxl_decoder_release, @@ -274,7 +291,7 @@ static const struct device_type cxl_decoder_switch_type = { static const struct device_type cxl_decoder_root_type = { .name = "cxl_decoder_root", - .release = cxl_switch_decoder_release, + .release = cxl_root_decoder_release, .groups = cxl_decoder_root_attribute_groups, }; @@ -1271,9 +1288,10 @@ static int cxl_switch_decoder_init(struct cxl_port *port, * firmware description of CXL resources into a CXL standard decode * topology. */ -struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets) +struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets) { + struct cxl_root_decoder *cxlrd; struct cxl_switch_decoder *cxlsd; struct cxl_decoder *cxld; int rc; @@ -1281,19 +1299,21 @@ struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port, if (!is_cxl_root(port)) return ERR_PTR(-EINVAL); - cxlsd = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL); - if (!cxlsd) + cxlrd = kzalloc(struct_size(cxlrd, cxlsd.target, nr_targets), + GFP_KERNEL); + if (!cxlrd) return ERR_PTR(-ENOMEM); + cxlsd = &cxlrd->cxlsd; rc = cxl_switch_decoder_init(port, cxlsd, nr_targets); if (rc) { - kfree(cxlsd); + kfree(cxlrd); return ERR_PTR(rc); } cxld = &cxlsd->cxld; cxld->dev.type = &cxl_decoder_root_type; - return cxlsd; + return cxlrd; } EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0289c06ec72c..ebdac8e7d181 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -260,6 +260,16 @@ struct cxl_switch_decoder { }; +/** + * struct cxl_root_decoder - Static platform CXL address decoder + * @res: host / parent resource for region allocations + * @cxlsd: base cxl switch decoder + */ +struct cxl_root_decoder { + struct resource *res; + struct cxl_switch_decoder cxlsd; +}; + /** * enum cxl_nvdimm_brige_state - state machine for managing bus rescans * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed @@ -376,10 +386,11 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dev); struct cxl_decoder *to_cxl_decoder(struct device *dev); +struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); bool is_root_decoder(struct device *dev); bool is_endpoint_decoder(struct device *dev); -struct cxl_switch_decoder *cxl_root_decoder_alloc(struct cxl_port *port, - unsigned int nr_targets); +struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, + unsigned int nr_targets); struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, unsigned int nr_targets); int cxl_decoder_add(struct cxl_decoder *cxld, int 
*target_map); From patchwork Fri Jul 15 00:01:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918552 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8D946D17 for ; Fri, 15 Jul 2022 00:01:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843305; x=1689379305; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+ihFnLM0nNzB7sHl4ZXxnyM/LsP/E8epqdxlAvbvAKI=; b=feBpUD3VinWAkUX4ESrm74Oe9vhwINibUZLSKAOEAHTZy2lodP+OYhV0 cBgNG9YdTxQyzFGnk9fuhTaKcxL3xCTT+c2LBjlqtHOUSyEcZYd5bzpbA uJqdxiZlQ0lkon8SxbxGHP6OmYR+BPSfcyxYAOSQZbxrnS6V805rMjnJc 34ZXqpROlSaTtV0GLNmMFW3GO2JBizmKoGNoehTJUha1JLod9F3kAlJYn RH47XePqjMxdysZ4kyhLMqVhRqjZysI7Eatnf2qEFK9a3d+n2P6ehJlav JnKuY7sh9tnMFK9tgUhM29KiNJo+VvkwCFqhJ5672r1K2SQdyUCP+PRKx g==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="347338986" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="347338986" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:12 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="663984569" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:11 -0700 Subject: [PATCH v2 05/28] cxl/core: Define a 'struct cxl_endpoint_decoder' From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:10 -0700 Message-ID: <165784327088.1758207.15502834501671201192.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Previously the target routing specifics of switch decoders and platform CXL window resource tracking of root decoders were factored out of 'struct cxl_decoder'. While switch decoders translate from SPA to downstream ports, endpoint decoders translate from SPA to DPA. This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an endpoint-specific Device Physical Address (DPA) resource. For now this just defines ->dpa_res, a follow-on patch will handle requesting DPA resource ranges from a device-DPA resource tree. 
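Schematically, the decoder object hierarchy after this patch looks as follows (a trimmed-down sketch: field lists are abbreviated and 'resource_size_t' / 'struct resource' are stand-in declarations; see the diff for the authoritative definitions):

#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel typedef */
struct resource;			/* opaque here */

struct cxl_decoder {			/* common HDM decoder attributes */
	int id;
};

struct cxl_endpoint_decoder {		/* translates SPA -> DPA */
	struct cxl_decoder cxld;
	struct resource *dpa_res;	/* actively claimed DPA span */
	resource_size_t skip;		/* DPA skipped below @dpa_res */
};

struct cxl_switch_decoder {		/* translates SPA -> downstream port */
	struct cxl_decoder cxld;
	int nr_targets;
};

struct cxl_root_decoder {		/* platform-defined CXL window */
	struct resource *res;		/* paired iomem_resource window */
	struct cxl_switch_decoder cxlsd;
};

Each specialization embeds its base object, so the to_cxl_*_decoder() helpers remain simple container_of() downcasts guarded by a device-type check.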
Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/hdm.c | 9 ++++++--- drivers/cxl/core/port.c | 31 +++++++++++++++++++++---------- drivers/cxl/cxl.h | 15 ++++++++++++++- tools/testing/cxl/test/cxl.c | 10 +++++++--- 4 files changed, 48 insertions(+), 17 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 2f10d42798de..650363d5272f 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -256,12 +256,15 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) struct cxl_decoder *cxld; if (is_cxl_endpoint(port)) { - cxld = cxl_endpoint_decoder_alloc(port); - if (IS_ERR(cxld)) { + struct cxl_endpoint_decoder *cxled; + + cxled = cxl_endpoint_decoder_alloc(port); + if (IS_ERR(cxled)) { dev_warn(&port->dev, "Failed to allocate the decoder\n"); - return PTR_ERR(cxld); + return PTR_ERR(cxled); } + cxld = &cxled->cxld; } else { struct cxl_switch_decoder *cxlsd; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 4953a1c7b245..ca4f23204e5c 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -244,12 +244,12 @@ static void __cxl_decoder_release(struct cxl_decoder *cxld) put_device(&port->dev); } -static void cxl_decoder_release(struct device *dev) +static void cxl_endpoint_decoder_release(struct device *dev) { - struct cxl_decoder *cxld = to_cxl_decoder(dev); + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - __cxl_decoder_release(cxld); - kfree(cxld); + __cxl_decoder_release(&cxled->cxld); + kfree(cxled); } static void cxl_switch_decoder_release(struct device *dev) @@ -279,7 +279,7 @@ static void cxl_root_decoder_release(struct device *dev) static const struct device_type cxl_decoder_endpoint_type = { .name = "cxl_decoder_endpoint", - .release = cxl_decoder_release, + .release = cxl_endpoint_decoder_release, .groups = cxl_decoder_endpoint_attribute_groups, }; @@ -321,6 +321,15 @@ struct cxl_decoder *to_cxl_decoder(struct device *dev) } EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL); +struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev) +{ + if (dev_WARN_ONCE(dev, !is_endpoint_decoder(dev), + "not a cxl_endpoint_decoder device\n")) + return NULL; + return container_of(dev, struct cxl_endpoint_decoder, cxld.dev); +} +EXPORT_SYMBOL_NS_GPL(to_cxl_endpoint_decoder, CXL); + static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev) { if (dev_WARN_ONCE(dev, !is_switch_decoder(dev), @@ -1360,26 +1369,28 @@ EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL); * * Return: A new cxl decoder to be registered by cxl_decoder_add() */ -struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) +struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) { + struct cxl_endpoint_decoder *cxled; struct cxl_decoder *cxld; int rc; if (!is_cxl_endpoint(port)) return ERR_PTR(-EINVAL); - cxld = kzalloc(sizeof(*cxld), GFP_KERNEL); - if (!cxld) + cxled = kzalloc(sizeof(*cxled), GFP_KERNEL); + if (!cxled) return ERR_PTR(-ENOMEM); + cxld = &cxled->cxld; rc = cxl_decoder_init(port, cxld); if (rc) { - kfree(cxld); + kfree(cxled); return ERR_PTR(rc); } cxld->dev.type = &cxl_decoder_endpoint_type; - return cxld; + return cxled; } EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index ebdac8e7d181..7e1460d89296 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -239,6 +239,18 @@ struct cxl_decoder { unsigned long flags; }; +/** + 
* struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder + * @cxld: base cxl_decoder_object + * @dpa_res: actively claimed DPA span of this decoder + * @skip: offset into @dpa_res where @cxld.hpa_range maps + */ +struct cxl_endpoint_decoder { + struct cxl_decoder cxld; + struct resource *dpa_res; + resource_size_t skip; +}; + /** * struct cxl_switch_decoder - Switch specific CXL HDM Decoder * @cxld: base cxl_decoder object @@ -387,6 +399,7 @@ struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, struct cxl_decoder *to_cxl_decoder(struct device *dev); struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); +struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev); bool is_root_decoder(struct device *dev); bool is_endpoint_decoder(struct device *dev); struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, @@ -394,7 +407,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port, unsigned int nr_targets); int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map); -struct cxl_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port); +struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port); int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map); int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld); int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint); diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 7991ddc6e562..4dad0fa7ac4c 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -462,12 +462,16 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) } cxld = &cxlsd->cxld; } else { - cxld = cxl_endpoint_decoder_alloc(port); - if (IS_ERR(cxld)) { + struct cxl_endpoint_decoder *cxled; + + cxled = cxl_endpoint_decoder_alloc(port); + + if (IS_ERR(cxled)) { dev_warn(&port->dev, "Failed to allocate the decoder\n"); - return PTR_ERR(cxld); + return PTR_ERR(cxled); } + cxld = &cxled->cxld; } cxld->hpa_range = (struct range) { From patchwork Fri Jul 15 00:01:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918548 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3FB136D17 for ; Fri, 15 Jul 2022 00:01:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843282; x=1689379282; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rFuYJU2RizqKHIoiSQcEsnIWX6Vqm+7jVfD6gBeg/Wc=; b=Z7ADAyWW5/8uG154Kjzz5E0eCl6w+BvAK/nt/H3Xb4x+2JbPqvFfXxPZ /hqTDa4Ix0++vHuikeP48OQbIzpraZm3XT3U9/Bcw6GXxYnQrQ42mlRfj Igpz8D/tdVvtD+Qyc/Nsj8Z2jRz3Rkup0VBFkfjfUV+GiUzuXCHWRhaCJ 8YFl2BzC3hJa88+ewtBGQsqnS/Z6IgUCDbuZs8io8UdGrjN+EVSJ+KPKO 2xRQfvtHVYj4PreoGKDi1XqzmZIo1WApaI8Oe9QeSBsOpajxgSDHEvyC3 Cjm1lFAHFKZwmsfJ5uZlQvfpc7NP8iWI6XIJ24rt1Dfr2XrhU3g8zHQbX g==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="268686786" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="268686786" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:18 -0700 
X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="685766185" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:17 -0700 Subject: [PATCH v2 06/28] cxl/hdm: Enumerate allocated DPA From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:16 -0700 Message-ID: <165784327682.1758207.7914919426043855876.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In preparation for provisioning CXL regions, add accounting for the DPA space consumed by existing regions / decoders. Recall, a CXL region is a memory range comprised from one or more endpoint devices contributing a mapping of their DPA into HPA space through a decoder. Record the DPA ranges covered by committed decoders at initial probe of endpoint ports relative to a per-device resource tree of the DPA type (pmem or volatile-ram). The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA state across all endpoints and their decoders at once. The vast majority of DPA operations are reads as region creation is expected to be as rare as disk partitioning and volume creation. The device_lock() for this synchronization is specifically avoided for concern of entangling with sysfs attribute removal. Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/hdm.c | 143 ++++++++++++++++++++++++++++++++++++++++++++---- drivers/cxl/cxl.h | 2 + drivers/cxl/cxlmem.h | 13 ++++ 3 files changed, 147 insertions(+), 11 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 650363d5272f..d4c17325001b 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -153,10 +153,105 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) } EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL); +/* + * Must be called in a context that synchronizes against this decoder's + * port ->remove() callback (like an endpoint decoder sysfs attribute) + */ +static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct resource *res = cxled->dpa_res; + + lockdep_assert_held_write(&cxl_dpa_rwsem); + + if (cxled->skip) + __release_region(&cxlds->dpa_res, res->start - cxled->skip, + cxled->skip); + cxled->skip = 0; + __release_region(&cxlds->dpa_res, res->start, resource_size(res)); + cxled->dpa_res = NULL; +} + +static void cxl_dpa_release(void *cxled) +{ + down_write(&cxl_dpa_rwsem); + __cxl_dpa_release(cxled); + up_write(&cxl_dpa_rwsem); +} + +static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, + resource_size_t base, resource_size_t len, + resource_size_t skipped) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *port = cxled_to_port(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &port->dev; + struct resource *res; + + lockdep_assert_held_write(&cxl_dpa_rwsem); + + if (!len) + return 0; + + if (cxled->dpa_res) { + 
dev_dbg(dev, "decoder%d.%d: existing allocation %pr assigned\n", + port->id, cxled->cxld.id, cxled->dpa_res); + return -EBUSY; + } + + if (skipped) { + res = __request_region(&cxlds->dpa_res, base - skipped, skipped, + dev_name(&cxled->cxld.dev), 0); + if (!res) { + dev_dbg(dev, + "decoder%d.%d: failed to reserve skipped space\n", + port->id, cxled->cxld.id); + return -EBUSY; + } + } + res = __request_region(&cxlds->dpa_res, base, len, + dev_name(&cxled->cxld.dev), 0); + if (!res) { + dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", + port->id, cxled->cxld.id); + if (skipped) + __release_region(&cxlds->dpa_res, base - skipped, + skipped); + return -EBUSY; + } + cxled->dpa_res = res; + cxled->skip = skipped; + + return 0; +} + +static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, + resource_size_t base, resource_size_t len, + resource_size_t skipped) +{ + struct cxl_port *port = cxled_to_port(cxled); + int rc; + + down_write(&cxl_dpa_rwsem); + rc = __cxl_dpa_reserve(cxled, base, len, skipped); + up_write(&cxl_dpa_rwsem); + + if (rc) + return rc; + + return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); +} + static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, - int *target_map, void __iomem *hdm, int which) + int *target_map, void __iomem *hdm, int which, + u64 *dpa_base) { - u64 size, base; + struct cxl_endpoint_decoder *cxled = NULL; + u64 size, base, skip, dpa_size; + bool committed; + u32 remainder; int i, rc; u32 ctrl; union { @@ -164,11 +259,15 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, unsigned char target_id[8]; } target_list; + if (is_endpoint_decoder(&cxld->dev)) + cxled = to_cxl_endpoint_decoder(&cxld->dev); + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which)); base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which)); size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which)); + committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED); - if (!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)) + if (!committed) size = 0; if (base == U64_MAX || size == U64_MAX) { dev_warn(&port->dev, "decoder%d.%d: Invalid resource range\n", @@ -181,8 +280,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, .end = base + size - 1, }; - /* switch decoders are always enabled if committed */ - if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED) { + /* decoders are enabled if committed */ + if (committed) { cxld->flags |= CXL_DECODER_F_ENABLE; if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) cxld->flags |= CXL_DECODER_F_LOCK; @@ -211,14 +310,35 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, if (rc) return rc; - if (is_endpoint_decoder(&cxld->dev)) + if (!cxled) { + target_list.value = + ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which)); + for (i = 0; i < cxld->interleave_ways; i++) + target_map[i] = target_list.target_id[i]; + return 0; + } - target_list.value = - ioread64_hi_lo(hdm + CXL_HDM_DECODER0_TL_LOW(which)); - for (i = 0; i < cxld->interleave_ways; i++) - target_map[i] = target_list.target_id[i]; + if (!committed) + return 0; + dpa_size = div_u64_rem(size, cxld->interleave_ways, &remainder); + if (remainder) { + dev_err(&port->dev, + "decoder%d.%d: invalid committed configuration size: %#llx ways: %d\n", + port->id, cxld->id, size, cxld->interleave_ways); + return -ENXIO; + } + skip = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SKIP_LOW(which)); + rc = devm_cxl_dpa_reserve(cxled, *dpa_base + skip, dpa_size, skip); + if (rc) { + 
dev_err(&port->dev, + "decoder%d.%d: Failed to reserve DPA range %#llx - %#llx (%d)\n", + port->id, cxld->id, *dpa_base, + *dpa_base + dpa_size + skip - 1, rc); + return rc; + } + *dpa_base += dpa_size + skip; return 0; } @@ -231,6 +351,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) void __iomem *hdm = cxlhdm->regs.hdm_decoder; struct cxl_port *port = cxlhdm->port; int i, committed; + u64 dpa_base = 0; u32 ctrl; /* @@ -277,7 +398,7 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) cxld = &cxlsd->cxld; } - rc = init_hdm_decoder(port, cxld, target_map, hdm, i); + rc = init_hdm_decoder(port, cxld, target_map, hdm, i, &dpa_base); if (rc) { put_device(&cxld->dev); return rc; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 7e1460d89296..d5e4cfac35ea 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -56,6 +56,8 @@ #define CXL_HDM_DECODER0_CTRL_TYPE BIT(12) #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24) #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28) +#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i) +#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i) static inline int cxl_hdm_decoder_count(u32 cap_hdr) { diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index c6d6f57856cc..eee96016c3c7 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -50,6 +50,19 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev) return container_of(dev, struct cxl_memdev, dev); } +static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled) +{ + return to_cxl_port(cxled->cxld.dev.parent); +} + +static inline struct cxl_memdev * +cxled_to_memdev(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent); + + return to_cxl_memdev(port->uport); +} + bool is_cxl_memdev(struct device *dev); static inline bool is_cxl_endpoint(struct cxl_port *port) { From patchwork Fri Jul 15 00:01:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918551 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DC926D17 for ; Fri, 15 Jul 2022 00:01:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843303; x=1689379303; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zbeC4suVaifJOdnAyRi2cfH0scr2EdvKoYXquZzpLB0=; b=h9yBJuOOpE9ieTBETEl6x7uicNUw853wm2sWUpwpW8Pj6FoSOD6rMQ3x 7TX66NLH7R9kdBHihFqEyDLju99OVr6zZXR3B4d0aDtDH20QiO7Lr8ioX TtOHD/27VHu4hEq8Mkn9k/TPMcwFmRKmmDOCnVfnh8/tajMHiYqBkw/3g 7Uqoaznt9ZhcUYth2i11jFfoNrwdqTKbqatEvvRC8JVeTUS7SsAvTjpuL Xqw7YMEp84ujiPbwfzkBtiuiv/ho7x2REvi0MdeuJYW7Yo5wDz2xCpEte YhHY9v0+W1Cvb3R/Wm47Do2VNS85DaeBhG1BXlNyZjqjwQB9Q+FK+WoPq Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="285683404" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="285683404" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:23 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="571302165" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga006-auth.jf.intel.com with
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:23 -0700 Subject: [PATCH v2 07/28] cxl/hdm: Add 'mode' attribute to decoder objects From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:22 -0700 Message-ID: <165784328277.1758207.16889065926766678946.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Recall that the Device Physical Address (DPA) space of a CXL Memory Expander is potentially partitioned into a volatile and persistent portion. A decoder maps a Host Physical Address (HPA) range to a DPA range and that translation depends on the value of all previous (lower instance number) decoders before the current one. In preparation for allowing dynamic provisioning of regions, decoders need an ABI to indicate which DPA partition a decoder targets. This ABI needs to be prepared for the possibility that some other agent committed and locked a decoder that spans the partition boundary. Add 'decoderX.Y/mode' to endpoint decoders that indicates which partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder currently spans the partition boundary. Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/165603881967.551046.6007594190951596439.stgit@dwillia2-xfh Signed-off-by: Dan Williams --- Documentation/ABI/testing/sysfs-bus-cxl | 16 ++++++++++++++++ drivers/cxl/core/hdm.c | 10 ++++++++++ drivers/cxl/core/port.c | 20 ++++++++++++++++++++ drivers/cxl/cxl.h | 9 +++++++++ 4 files changed, 55 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 16d9ffa94bbd..b8ef8aedaf39 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -179,3 +179,19 @@ Description: expander memory (type-3). The 'target_type' attribute indicates the current setting which may dynamically change based on what memory regions are activated in this decode hierarchy. + + +What: /sys/bus/cxl/devices/decoderX.Y/mode +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it + translates from a host physical address range, to a device local + address range. Device-local address ranges are further split + into a 'ram' (volatile memory) range and 'pmem' (persistent + memory) range. The 'mode' attribute emits one of 'ram', 'pmem', + 'mixed', or 'none'. The 'mixed' indication is for error cases + when a decoder straddles the volatile/persistent partition + boundary, and 'none' indicates the decoder is not actively + decoding, or no DPA allocation policy has been set. 
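As a usage illustration (not part of the patch itself), a minimal userspace sketch that consumes the new 'mode' attribute might look like the following. The decoder name "decoder3.0" is a hypothetical example; a real consumer would enumerate /sys/bus/cxl/devices/ rather than hard-code a path.

#include <stdio.h>

int main(void)
{
	/* hypothetical decoder instance, for illustration only */
	FILE *f = fopen("/sys/bus/cxl/devices/decoder3.0/mode", "r");
	char mode[16] = { 0 };

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fgets(mode, sizeof(mode), f))
		/* the attribute emits one of "ram", "pmem", "mixed", or "none" */
		printf("decoder3.0 mode: %s", mode);
	fclose(f);
	return 0;
}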
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index d4c17325001b..acd46b0d69c6 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -224,6 +224,16 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->dpa_res = res; cxled->skip = skipped; + if (resource_contains(&cxlds->pmem_res, res)) + cxled->mode = CXL_DECODER_PMEM; + else if (resource_contains(&cxlds->ram_res, res)) + cxled->mode = CXL_DECODER_RAM; + else { + dev_dbg(dev, "decoder%d.%d: %pr mixed\n", port->id, + cxled->cxld.id, cxled->dpa_res); + cxled->mode = CXL_DECODER_MIXED; + } + return 0; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index ca4f23204e5c..0ac5dcd612e0 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -172,6 +172,25 @@ static ssize_t target_list_show(struct device *dev, } static DEVICE_ATTR_RO(target_list); +static ssize_t mode_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + + switch (cxled->mode) { + case CXL_DECODER_RAM: + return sysfs_emit(buf, "ram\n"); + case CXL_DECODER_PMEM: + return sysfs_emit(buf, "pmem\n"); + case CXL_DECODER_NONE: + return sysfs_emit(buf, "none\n"); + case CXL_DECODER_MIXED: + default: + return sysfs_emit(buf, "mixed\n"); + } +} +static DEVICE_ATTR_RO(mode); + static struct attribute *cxl_decoder_base_attrs[] = { &dev_attr_start.attr, &dev_attr_size.attr, @@ -222,6 +241,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = { static struct attribute *cxl_decoder_endpoint_attrs[] = { &dev_attr_target_type.attr, + &dev_attr_mode.attr, NULL, }; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index d5e4cfac35ea..3e7363dde80f 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -241,16 +241,25 @@ struct cxl_decoder { unsigned long flags; }; +enum cxl_decoder_mode { + CXL_DECODER_NONE, + CXL_DECODER_RAM, + CXL_DECODER_PMEM, + CXL_DECODER_MIXED, +}; + /** * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder * @cxld: base cxl_decoder_object * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps + * @mode: which memory type / access-mode-partition this decoder targets */ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; + enum cxl_decoder_mode mode; }; /** From patchwork Fri Jul 15 00:01:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918549 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 452836D17 for ; Fri, 15 Jul 2022 00:01:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843296; x=1689379296; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tkxgOqLN93njdIDOPSIv2OVwmPW6481uGNMLoC79dd4=; b=NahUZbp6hJsX+PHifcOrSCgNNPELl9atEVOCHsKWZHhnLRJwWLxWEPWL YW1xG4z8uqlAxZQOh9lnypl1wj00/wLCoVi27EnxWWk9oVTyhoNGGaFEm CGy1J+lHTkkjCystLguZA9JrJtJrXR79lwf8W2oMLyuaIJdIFPGOVM2aw fzrnpRTn2iRpDhnAgTWs6NJkxsb5gcxCA20QMAlojNC79W/BOPyAUnH2g 32Xsnf5GESUF1+SZ5aLFavdRZnhSne4nfYK0OHy6C9L4QP45Zwk3TOaO6 48RBoPrPQcr18RNnwy2F/BN/PKnHW2AY4MFnDuYaiCBEe0YbzEYw65rQP g==; X-IronPort-AV: 
E=McAfee;i="6400,9594,10408"; a="371976927" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="371976927" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:29 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="698993528" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:28 -0700 Subject: [PATCH v2 08/28] cxl/hdm: Track next decoder to allocate From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:28 -0700 Message-ID: <165784328827.1758207.9627538529944559954.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The CXL specification enforces that endpoint decoders are committed in hw instance id order. In preparation for adding dynamic DPA allocation, record the hw instance id in endpoint decoders, and enforce allocations to occur in hw instance id order. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/hdm.c | 15 +++++++++++++++ drivers/cxl/core/port.c | 1 + drivers/cxl/cxl.h | 2 ++ 3 files changed, 18 insertions(+) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index acd46b0d69c6..582f48141767 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -160,6 +160,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, CXL); static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct resource *res = cxled->dpa_res; @@ -171,6 +172,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) cxled->skip = 0; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); cxled->dpa_res = NULL; + port->hdm_end--; } static void cxl_dpa_release(void *cxled) @@ -201,6 +203,18 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return -EBUSY; } + if (port->hdm_end + 1 != cxled->cxld.id) { + /* + * Assumes alloc and commit order is always in hardware instance + * order per expectations from 8.2.5.12.20 Committing Decoder + * Programming that enforce decoder[m] committed before + * decoder[m+1] commit start. 
+ */ + dev_dbg(dev, "decoder%d.%d: expected decoder%d.%d\n", port->id, + cxled->cxld.id, port->id, port->hdm_end + 1); + return -EBUSY; + } + if (skipped) { res = __request_region(&cxlds->dpa_res, base - skipped, skipped, dev_name(&cxled->cxld.dev), 0); @@ -233,6 +247,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, cxled->cxld.id, cxled->dpa_res); cxled->mode = CXL_DECODER_MIXED; } + port->hdm_end++; return 0; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 0ac5dcd612e0..109611318760 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -502,6 +502,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, port->component_reg_phys = component_reg_phys; ida_init(&port->decoder_ida); + port->hdm_end = -1; INIT_LIST_HEAD(&port->dports); INIT_LIST_HEAD(&port->endpoints); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 3e7363dde80f..70cd24e4f3ce 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -333,6 +333,7 @@ struct cxl_nvdimm { * @dports: cxl_dport instances referenced by decoders * @endpoints: cxl_ep instances, endpoints that are a descendant of this port * @decoder_ida: allocator for decoder ids + * @hdm_end: track last allocated HDM decoder instance for allocation ordering * @component_reg_phys: component register capability base address (optional) * @dead: last ep has been removed, force port re-creation * @depth: How deep this port is relative to the root. depth 0 is the root. @@ -345,6 +346,7 @@ struct cxl_port { struct list_head dports; struct list_head endpoints; struct ida decoder_ida; + int hdm_end; resource_size_t component_reg_phys; bool dead; unsigned int depth; From patchwork Fri Jul 15 00:01:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918550 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C659F6D17 for ; Fri, 15 Jul 2022 00:01:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843300; x=1689379300; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JJqrcVjP4sirdUNzSuDkZbpLNqXGBwqEDLX3wI6q8lA=; b=itmYJDKsI3hdZta+sCEWfZpijIpmkS0tEUdBxl0sEOzer5CL1UUzSK/s Ep/lzL87fiX+83a8ODBAZTrOHylrpTMbhYag0GIoZzWrTAweM+kJktRzo QIMD89vQQcRdNeWYmu2moMioQNopUCCc45HM27EqXLmW0p5l7J1jMDMA2 UoggKI2zSbT2w3I7FZtX2gTVno2niMPmUUE7Cc+q/uxMOr68mwpm+i9pP Skf6KoxFZCW2Is2JCZwdbiEa+V/EG/f28/qel+A9ePZ50jZy9i4+XYiar zu5Wq8aA0lgz/Y6ZTZSI2Eg2GGvD0SG3RDq3L2A6XMOiYqGZhV/scR5LV w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="266073281" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="266073281" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:34 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="628896664" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:34 -0700 Subject: [PATCH v2 09/28] cxl/hdm: Add support for allocating DPA to an endpoint decoder From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org 
Date: Thu, 14 Jul 2022 17:01:34 -0700 Message-ID: <165784329399.1758207.16732038126938632700.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The region provisioning flow will roughly follow a sequence of: 1/ Allocate DPA to a set of decoders 2/ Allocate HPA to a region 3/ Associate decoders with a region and validate that the DPA allocations and topologies match the parameters of the region. For now, this change (step 1) arranges for DPA capacity to be allocated and deleted from non-committed decoders based on the decoder's mode / partition selection. Capacity is allocated from the lowest DPA in the partition and any 'pmem' allocation blocks out all remaining ram capacity in its 'skip' setting. DPA allocations are enforced in decoder instance order. I.e. decoder N + 1 always starts at a higher DPA than instance N, and deleting allocations must proceed from the highest-instance allocated decoder to the lowest. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 37 ++++++ drivers/cxl/core/core.h | 7 + drivers/cxl/core/hdm.c | 180 +++++++++++++++++++++++++++++++ drivers/cxl/core/port.c | 73 ++++++++++++- 4 files changed, 295 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index b8ef8aedaf39..9b6cc7cdc73b 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -186,7 +186,7 @@ Date: May, 2022 KernelVersion: v5.20 Contact: linux-cxl@vger.kernel.org Description: - (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it translates from a host physical address range, to a device local address range. Device-local address ranges are further split into a 'ram' (volatile memory) range and 'pmem' (persistent @@ -195,3 +195,38 @@ Description: when a decoder straddles the volatile/persistent partition boundary, and 'none' indicates the decoder is not actively decoding, or no DPA allocation policy has been set. + + 'mode' can be written, when the decoder is in the 'disabled' + state, with either 'ram' or 'pmem' to set the boundaries for the + next allocation. + + +What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint", + and its 'dpa_size' attribute is non-zero, this attribute + indicates the device physical address (DPA) base address of the + allocation. + + +What: /sys/bus/cxl/devices/decoderX.Y/dpa_size +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it + translates from a host physical address range, to a device local + address range. The range, base address plus length in bytes, of + DPA allocated to this decoder is conveyed in these 2 attributes. + Allocations can be mutated as long as the decoder is in the + disabled state. A write to 'dpa_size' releases the previous DPA + allocation and then attempts to allocate from the free capacity + in the device partition referred to by 'decoderX.Y/mode'. 
+ Allocate and free requests can only be performed on the highest + instance number disabled decoder with non-zero size. I.e. + allocations are enforced to occur in increasing 'decoderX.Y/id' + order and frees are enforced to occur in decreasing + 'decoderX.Y/id' order. diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index a0808cdaffba..5551b82b2da0 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -18,6 +18,13 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, resource_size_t length); struct dentry *cxl_debugfs_create_dir(const char *dir); +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, + enum cxl_decoder_mode mode); +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); + int cxl_memdev_init(void); void cxl_memdev_exit(void); void cxl_mbox_init(void); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 582f48141767..596b57fb60df 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -182,6 +182,19 @@ static void cxl_dpa_release(void *cxled) up_write(&cxl_dpa_rwsem); } +/* + * Must be called from context that will not race port device + * unregistration, like decoder sysfs attribute methods + */ +static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_port *port = cxled_to_port(cxled); + + lockdep_assert_held_write(&cxl_dpa_rwsem); + devm_remove_action(&port->dev, cxl_dpa_release, cxled); + __cxl_dpa_release(cxled); +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) @@ -269,6 +282,173 @@ static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); } +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) +{ + resource_size_t size = 0; + + down_read(&cxl_dpa_rwsem); + if (cxled->dpa_res) + size = resource_size(cxled->dpa_res); + up_read(&cxl_dpa_rwsem); + + return size; +} + +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) +{ + resource_size_t base = -1; + + down_read(&cxl_dpa_rwsem); + if (cxled->dpa_res) + base = cxled->dpa_res->start; + up_read(&cxl_dpa_rwsem); + + return base; +} + +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_port *port = cxled_to_port(cxled); + struct device *dev = &cxled->cxld.dev; + int rc; + + down_write(&cxl_dpa_rwsem); + if (!cxled->dpa_res) { + rc = 0; + goto out; + } + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; + goto out; + } + if (cxled->cxld.id != port->hdm_end) { + dev_dbg(dev, "expected decoder%d.%d\n", port->id, + port->hdm_end); + rc = -EBUSY; + goto out; + } + devm_cxl_dpa_release(cxled); + rc = 0; +out: + up_write(&cxl_dpa_rwsem); + return rc; +} + +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, + enum cxl_decoder_mode mode) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; + int rc; + + switch (mode) { + case CXL_DECODER_RAM: + case CXL_DECODER_PMEM: + break; + default: + dev_dbg(dev, "unsupported mode: %d\n", mode); + return -EINVAL; + } + + down_write(&cxl_dpa_rwsem); + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + rc 
= -EBUSY; + goto out; + } + + /* + * Only allow modes that are supported by the current partition + * configuration + */ + if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { + dev_dbg(dev, "no available pmem capacity\n"); + rc = -ENXIO; + goto out; + } + if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { + dev_dbg(dev, "no available ram capacity\n"); + rc = -ENXIO; + goto out; + } + + cxled->mode = mode; + rc = 0; +out: + up_write(&cxl_dpa_rwsem); + + return rc; +} + +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + resource_size_t free_ram_start, free_pmem_start; + struct cxl_port *port = cxled_to_port(cxled); + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct device *dev = &cxled->cxld.dev; + resource_size_t start, avail, skip; + struct resource *p, *last; + int rc; + + down_write(&cxl_dpa_rwsem); + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; + goto out; + } + + for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling) + last = p; + if (last) + free_ram_start = last->end + 1; + else + free_ram_start = cxlds->ram_res.start; + + for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling) + last = p; + if (last) + free_pmem_start = last->end + 1; + else + free_pmem_start = cxlds->pmem_res.start; + + if (cxled->mode == CXL_DECODER_RAM) { + start = free_ram_start; + avail = cxlds->ram_res.end - start + 1; + skip = 0; + } else if (cxled->mode == CXL_DECODER_PMEM) { + resource_size_t skip_start, skip_end; + + start = free_pmem_start; + avail = cxlds->pmem_res.end - start + 1; + skip_start = free_ram_start; + skip_end = start - 1; + skip = skip_end - skip_start + 1; + } else { + dev_dbg(dev, "mode not set\n"); + rc = -EINVAL; + goto out; + } + + if (size > avail) { + dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, + cxled->mode == CXL_DECODER_RAM ? 
"ram" : "pmem", + &avail); + rc = -ENOSPC; + goto out; + } + + rc = __cxl_dpa_reserve(cxled, start, size, skip); +out: + up_write(&cxl_dpa_rwsem); + + if (rc) + return rc; + + return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); +} + static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, int *target_map, void __iomem *hdm, int which, u64 *dpa_base) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 109611318760..fdc1be7db917 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -189,7 +189,76 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, return sysfs_emit(buf, "mixed\n"); } } -static DEVICE_ATTR_RO(mode); + +static ssize_t mode_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + enum cxl_decoder_mode mode; + ssize_t rc; + + if (sysfs_streq(buf, "pmem")) + mode = CXL_DECODER_PMEM; + else if (sysfs_streq(buf, "ram")) + mode = CXL_DECODER_RAM; + else + return -EINVAL; + + rc = cxl_dpa_set_mode(cxled, mode); + if (rc) + return rc; + + return len; +} +static DEVICE_ATTR_RW(mode); + +static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + u64 base = cxl_dpa_resource_start(cxled); + + return sysfs_emit(buf, "%#llx\n", base); +} +static DEVICE_ATTR_RO(dpa_resource); + +static ssize_t dpa_size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + resource_size_t size = cxl_dpa_size(cxled); + + return sysfs_emit(buf, "%pa\n", &size); +} + +static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); + unsigned long long size; + ssize_t rc; + + rc = kstrtoull(buf, 0, &size); + if (rc) + return rc; + + if (!IS_ALIGNED(size, SZ_256M)) + return -EINVAL; + + rc = cxl_dpa_free(cxled); + if (rc) + return rc; + + if (size == 0) + return len; + + rc = cxl_dpa_alloc(cxled, size); + if (rc) + return rc; + + return len; +} +static DEVICE_ATTR_RW(dpa_size); static struct attribute *cxl_decoder_base_attrs[] = { &dev_attr_start.attr, @@ -242,6 +311,8 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = { static struct attribute *cxl_decoder_endpoint_attrs[] = { &dev_attr_target_type.attr, &dev_attr_mode.attr, + &dev_attr_dpa_size.attr, + &dev_attr_dpa_resource.attr, NULL, }; From patchwork Fri Jul 15 00:01:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918553 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDF2F7461 for ; Fri, 15 Jul 2022 00:01:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843309; x=1689379309; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bxzdzJ6W/nVe6j8R5tao1r1yoeBQICSH9UM4Ttac/L0=; b=A8tHFjw+84CcUZqPuIRzG+rXUQA3K9a6F+NJm3pf8AAJ1jZ1xQdE1GaH hd55ua30FY5Au12zxew003prGci+uGzTBfTTxN3yVKViod/wL4Hf7+7uV 
PNVNzd+fixcDl/2GvMRfgxw7MzRi8MkaIsIC6W+xH5Oyr3yVFh7hsWe8Q YnRM70MUMShAM5iE0GAOwjYGzbCgTqwcWKaI5ZuQ2UuR17ketanbCuWQB bqpNNHtrmNP/PABN5NYA28X8dbpAQ4piCJipQuX1/mBQ7qy9zBqshq2sl 8hmEj5JkN0u196OuIK2ekQ8GktpD2m5S13VKZ9o0clrscpB+O+PF05dw6 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="349626533" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="349626533" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:40 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="654092715" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:39 -0700 Subject: [PATCH v2 10/28] cxl/port: Record dport in endpoint references From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:39 -0700 Message-ID: <165784329944.1758207.15203961796832072116.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Recall that the primary role of the cxl_mem driver is to probe if the given endpoint is connected to a CXL port topology. In that process it walks its device ancestry to its PCI root port. If that root port is also a CXL root port then the probe process adds cxl_port object instances at each switch in the path between the root and the endpoint. As those cxl_port instances are added, or if a previous enumeration attempt already created the port, a 'struct cxl_ep' instance is registered with that port to track the endpoints interested in that port. At the time the cxl_ep is registered the downstream egress path from the port to the endpoint is known. Take the opportunity to record that information as it will be needed for dynamic programming of decoder targets during region provisioning. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/port.c | 52 ++++++++++++++++++++++++++++++++--------------- drivers/cxl/cxl.h | 2 ++ 2 files changed, 37 insertions(+), 17 deletions(-) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index fdc1be7db917..a8d350361548 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -882,8 +882,9 @@ static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev) return NULL; } -static int add_ep(struct cxl_port *port, struct cxl_ep *new) +static int add_ep(struct cxl_ep *new) { + struct cxl_port *port = new->dport->port; struct cxl_ep *dup; device_lock(&port->dev); @@ -901,14 +902,14 @@ static int add_ep(struct cxl_port *port, struct cxl_ep *new) /** * cxl_add_ep - register an endpoint's interest in a port - * @port: a port in the endpoint's topology ancestry + * @dport: the dport that routes to @ep_dev * @ep_dev: device representing the endpoint * * Intermediate CXL ports are scanned based on the arrival of endpoints. * When those endpoints depart the port can be destroyed once all * endpoints that care about that port have been removed.
*/ -static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev) +static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev) { struct cxl_ep *ep; int rc; @@ -919,8 +920,9 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev) INIT_LIST_HEAD(&ep->list); ep->ep = get_device(ep_dev); + ep->dport = dport; - rc = add_ep(port, ep); + rc = add_ep(ep); if (rc) cxl_ep_release(ep); return rc; @@ -929,11 +931,13 @@ static int cxl_add_ep(struct cxl_port *port, struct device *ep_dev) struct cxl_find_port_ctx { const struct device *dport_dev; const struct cxl_port *parent_port; + struct cxl_dport **dport; }; static int match_port_by_dport(struct device *dev, const void *data) { const struct cxl_find_port_ctx *ctx = data; + struct cxl_dport *dport; struct cxl_port *port; if (!is_cxl_port(dev)) @@ -942,7 +946,10 @@ static int match_port_by_dport(struct device *dev, const void *data) return 0; port = to_cxl_port(dev); - return cxl_find_dport_by_dev(port, ctx->dport_dev) != NULL; + dport = cxl_find_dport_by_dev(port, ctx->dport_dev); + if (ctx->dport) + *ctx->dport = dport; + return dport != NULL; } static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) @@ -958,24 +965,32 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) return NULL; } -static struct cxl_port *find_cxl_port(struct device *dport_dev) +static struct cxl_port *find_cxl_port(struct device *dport_dev, + struct cxl_dport **dport) { struct cxl_find_port_ctx ctx = { .dport_dev = dport_dev, + .dport = dport, }; + struct cxl_port *port; - return __find_cxl_port(&ctx); + port = __find_cxl_port(&ctx); + return port; } static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port, - struct device *dport_dev) + struct device *dport_dev, + struct cxl_dport **dport) { struct cxl_find_port_ctx ctx = { .dport_dev = dport_dev, .parent_port = parent_port, + .dport = dport, }; + struct cxl_port *port; - return __find_cxl_port(&ctx); + port = __find_cxl_port(&ctx); + return port; } /* @@ -1060,7 +1075,7 @@ static void cxl_detach_ep(void *data) if (!dport_dev) break; - port = find_cxl_port(dport_dev); + port = find_cxl_port(dport_dev, NULL); if (!port) continue; @@ -1135,6 +1150,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, struct device *dparent = grandparent(dport_dev); struct cxl_port *port, *parent_port = NULL; resource_size_t component_reg_phys; + struct cxl_dport *dport; int rc; if (!dparent) { @@ -1148,7 +1164,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, return -ENXIO; } - parent_port = find_cxl_port(dparent); + parent_port = find_cxl_port(dparent, NULL); if (!parent_port) { /* iterate to create this parent_port */ return -EAGAIN; @@ -1163,13 +1179,14 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, goto out; } - port = find_cxl_port_at(parent_port, dport_dev); + port = find_cxl_port_at(parent_port, dport_dev, &dport); if (!port) { component_reg_phys = find_component_registers(uport_dev); port = devm_cxl_add_port(&parent_port->dev, uport_dev, component_reg_phys, parent_port); + /* retry find to pick up the new dport information */ if (!IS_ERR(port)) - get_device(&port->dev); + port = find_cxl_port_at(parent_port, dport_dev, &dport); } out: device_unlock(&parent_port->dev); @@ -1179,7 +1196,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, else { dev_dbg(&cxlmd->dev, "add to new port %s:%s\n", dev_name(&port->dev), dev_name(port->uport)); - rc = cxl_add_ep(port, &cxlmd->dev); + rc = cxl_add_ep(dport, &cxlmd->dev); if (rc == 
-EEXIST) { /* * "can't" happen, but this error code means @@ -1213,6 +1230,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) for (iter = dev; iter; iter = grandparent(iter)) { struct device *dport_dev = grandparent(iter); struct device *uport_dev; + struct cxl_dport *dport; struct cxl_port *port; if (!dport_dev) @@ -1228,12 +1246,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) dev_dbg(dev, "scan: iter: %s dport_dev: %s parent: %s\n", dev_name(iter), dev_name(dport_dev), dev_name(uport_dev)); - port = find_cxl_port(dport_dev); + port = find_cxl_port(dport_dev, &dport); if (port) { dev_dbg(&cxlmd->dev, "found already registered port %s:%s\n", dev_name(&port->dev), dev_name(port->uport)); - rc = cxl_add_ep(port, &cxlmd->dev); + rc = cxl_add_ep(dport, &cxlmd->dev); /* * If the endpoint already exists in the port's list, @@ -1274,7 +1292,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL); struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd) { - return find_cxl_port(grandparent(&cxlmd->dev)); + return find_cxl_port(grandparent(&cxlmd->dev), NULL); } EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 70cd24e4f3ce..31f33844279a 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -371,10 +371,12 @@ struct cxl_dport { /** * struct cxl_ep - track an endpoint's interest in a port * @ep: device that hosts a generic CXL endpoint (expander or accelerator) + * @dport: which dport routes to this endpoint on @port * @list: node on port->endpoints list */ struct cxl_ep { struct device *ep; + struct cxl_dport *dport; struct list_head list; }; From patchwork Fri Jul 15 00:01:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918556 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 38DB77460 for ; Fri, 15 Jul 2022 00:02:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843333; x=1689379333; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hvrXDq/xttwXGrZboalvi3MvsAoY14tzi6rYe/HKduc=; b=Zj6FDfqFVYuKTtOXcWsBGoEjQ+tW94+kvb0iPd+ynCqp9NNGxuffDPR+ bx6sWnbEYpuR8eL4qu/gcGYKHF/WAfscZz2a8inOaAQBQ0eH/Eh/iPJo+ k0t9y5ZDMdPafEB64ANEyBYvNd1zxUheEFQv+e8E3bgmJbEgTSl2EEZ1K o4LKqvtmafNKdoWtgEwm5BTlYBMzCY0dQw1v7jrkuEiUEXbRWygdP+Hq0 +2vhfRTtBQX+5zPmi5dd9iEr2GRy/TZMT111T+zBWA6pQH6VFnDqbAJYq Vjx5HpS5P6fH276Yb1V8A3zKkU7aVXI4b9bvD+/OdACrRCQ5LxEuiV8Aw Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="311320497" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="311320497" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:46 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="685766491" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:45 -0700 Subject: [PATCH v2 11/28] cxl/port: Record parent dport when adding ports From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:45 -0700 
Message-ID: <165784330511.1758207.16540797912136148491.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 At the time that cxl_port instances are being created, cache the dport from the parent port that points to this new child port. This will be useful for region provisioning when walking the tree to calculate decoder targets, and saves rewalking the dport list after the fact to build this information. Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-1-dan.j.williams@intel.com Signed-off-by: Dan Williams --- drivers/cxl/acpi.c | 3 +-- drivers/cxl/core/port.c | 27 +++++++++++++++------------ drivers/cxl/cxl.h | 7 +++++-- drivers/cxl/mem.c | 10 ++++++---- 4 files changed, 27 insertions(+), 20 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 8f021241699f..64004eb672d0 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -211,8 +211,7 @@ static int add_host_bridge_uport(struct device *match, void *arg) if (rc) return rc; - port = devm_cxl_add_port(host, match, dport->component_reg_phys, - root_port); + port = devm_cxl_add_port(host, match, dport->component_reg_phys, dport); if (IS_ERR(port)) return PTR_ERR(port); dev_dbg(host, "%s: add: %s\n", dev_name(match), dev_name(&port->dev)); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index a8d350361548..6d2846404ab8 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -526,7 +526,7 @@ static struct lock_class_key cxl_port_key; static struct cxl_port *cxl_port_alloc(struct device *uport, resource_size_t component_reg_phys, - struct cxl_port *parent_port) + struct cxl_dport *parent_dport) { struct cxl_port *port; struct device *dev; @@ -549,11 +549,13 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, * description. 
*/ dev = &port->dev; - if (parent_port) { + if (parent_dport) { + struct cxl_port *parent_port = parent_dport->port; struct cxl_port *iter; dev->parent = &parent_port->dev; port->depth = parent_port->depth + 1; + port->parent_dport = parent_dport; /* * walk to the host bridge, or the first ancestor that knows @@ -595,24 +597,24 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, * @host: host device for devm operations * @uport: "physical" device implementing this upstream port * @component_reg_phys: (optional) for configurable cxl_port instances - * @parent_port: next hop up in the CXL memory decode hierarchy + * @parent_dport: next hop up in the CXL memory decode hierarchy */ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport, resource_size_t component_reg_phys, - struct cxl_port *parent_port) + struct cxl_dport *parent_dport) { struct cxl_port *port; struct device *dev; int rc; - port = cxl_port_alloc(uport, component_reg_phys, parent_port); + port = cxl_port_alloc(uport, component_reg_phys, parent_dport); if (IS_ERR(port)) return port; dev = &port->dev; if (is_cxl_memdev(uport)) rc = dev_set_name(dev, "endpoint%d", port->id); - else if (parent_port) + else if (parent_dport) rc = dev_set_name(dev, "port%d", port->id); else rc = dev_set_name(dev, "root%d", port->id); @@ -1014,7 +1016,7 @@ static void delete_endpoint(void *data) struct cxl_port *parent_port; struct device *parent; - parent_port = cxl_mem_find_port(cxlmd); + parent_port = cxl_mem_find_port(cxlmd, NULL); if (!parent_port) goto out; parent = &parent_port->dev; @@ -1149,8 +1151,8 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, { struct device *dparent = grandparent(dport_dev); struct cxl_port *port, *parent_port = NULL; + struct cxl_dport *dport, *parent_dport; resource_size_t component_reg_phys; - struct cxl_dport *dport; int rc; if (!dparent) { @@ -1164,7 +1166,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, return -ENXIO; } - parent_port = find_cxl_port(dparent, NULL); + parent_port = find_cxl_port(dparent, &parent_dport); if (!parent_port) { /* iterate to create this parent_port */ return -EAGAIN; @@ -1183,7 +1185,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, if (!port) { component_reg_phys = find_component_registers(uport_dev); port = devm_cxl_add_port(&parent_port->dev, uport_dev, - component_reg_phys, parent_port); + component_reg_phys, parent_dport); /* retry find to pick up the new dport information */ if (!IS_ERR(port)) port = find_cxl_port_at(parent_port, dport_dev, &dport); @@ -1290,9 +1292,10 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) } EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL); -struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd) +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd, + struct cxl_dport **dport) { - return find_cxl_port(grandparent(&cxlmd->dev), NULL); + return find_cxl_port(grandparent(&cxlmd->dev), dport); } EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 31f33844279a..973e0efe4bd4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -332,6 +332,7 @@ struct cxl_nvdimm { * @id: id for port device-name * @dports: cxl_dport instances referenced by decoders * @endpoints: cxl_ep instances, endpoints that are a descendant of this port + * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids * @hdm_end: track last allocated HDM decoder instance for allocation ordering * 
@component_reg_phys: component register capability base address (optional) @@ -345,6 +346,7 @@ struct cxl_port { int id; struct list_head dports; struct list_head endpoints; + struct cxl_dport *parent_dport; struct ida decoder_ida; int hdm_end; resource_size_t component_reg_phys; @@ -399,11 +401,12 @@ int devm_cxl_register_pci_bus(struct device *host, struct device *uport, struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port); struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport, resource_size_t component_reg_phys, - struct cxl_port *parent_port); + struct cxl_dport *parent_dport); struct cxl_port *find_cxl_root(struct device *dev); int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd); int cxl_bus_rescan(void); -struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd); +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd, + struct cxl_dport **dport); bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd); struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port, diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 7513bea55145..2786d3402c9e 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -26,14 +26,15 @@ */ static int create_endpoint(struct cxl_memdev *cxlmd, - struct cxl_port *parent_port) + struct cxl_dport *parent_dport) { + struct cxl_port *parent_port = parent_dport->port; struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_port *endpoint; int rc; endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev, - cxlds->component_reg_phys, parent_port); + cxlds->component_reg_phys, parent_dport); if (IS_ERR(endpoint)) return PTR_ERR(endpoint); @@ -76,6 +77,7 @@ static int cxl_mem_probe(struct device *dev) { struct cxl_memdev *cxlmd = to_cxl_memdev(dev); struct cxl_port *parent_port; + struct cxl_dport *dport; struct dentry *dentry; int rc; @@ -100,7 +102,7 @@ static int cxl_mem_probe(struct device *dev) if (rc) return rc; - parent_port = cxl_mem_find_port(cxlmd); + parent_port = cxl_mem_find_port(cxlmd, &dport); if (!parent_port) { dev_err(dev, "CXL port topology not found\n"); return -ENXIO; @@ -114,7 +116,7 @@ static int cxl_mem_probe(struct device *dev) goto unlock; } - rc = create_endpoint(cxlmd, parent_port); + rc = create_endpoint(cxlmd, dport); unlock: device_unlock(&parent_port->dev); put_device(&parent_port->dev); From patchwork Fri Jul 15 00:01:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918554 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA63A7460 for ; Fri, 15 Jul 2022 00:02:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843323; x=1689379323; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x6DEwfrNxtFEjZVK3XQKyoDaLt7xeCbDUji9zrzl8Ps=; b=DKFSIBRjtqH1Xx1dLP6bUJ4set5hbYDHBHtdp0LNPQlZ3jnJ4/E909yC IKWsmjPRsmNwIga7f1G2Y4YS1wybkA/WHCKATjmFOSSv62OUCWXGDlCL5 TPMEKYW6m8LPV5PU3mAo/noGv1KzjeJlESizAcP+WLwFBIJuoBsM5Ortl ABD1+phHzqXuonGhClMV7BQ/3akUWftoegtAbMrhbGbMj7FgkjN7Bd6Qy NZ6emxH+UWT5wmSoE8x2G7Kj0zazBhNaqAVb1MaPToYmTfGTcZrhHAx+z NCkVwOV+nxCx7OpFPqE48wuFPkPayFPvw/Bc1sRxsN6Fo1Hc9yHdF9SAc A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="349626637" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; 
d="scan'208";a="349626637" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:51 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="628896816" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:51 -0700 Subject: [PATCH v2 12/28] cxl/port: Move 'cxl_ep' references to an xarray per port From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:51 -0700 Message-ID: <165784331102.1758207.16035137217204481073.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In preparation for region provisioning that needs to walk the topology by endpoints, use an xarray to record endpoint interest in a given port. In addition to being more space and time efficient it also reduces the complexity of the implementation by moving locking internal to the xarray implementation. It also allows for a single cxl_ep reference to be recorded in multiple xarrays. Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-2-dan.j.williams@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/port.c | 60 +++++++++++++++++++++++------------------------ drivers/cxl/cxl.h | 4 +-- 2 files changed, 30 insertions(+), 34 deletions(-) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 6d2846404ab8..727d861e21db 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -431,22 +431,27 @@ static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev) static void cxl_ep_release(struct cxl_ep *ep) { - if (!ep) - return; - list_del(&ep->list); put_device(ep->ep); kfree(ep); } +static void cxl_ep_remove(struct cxl_port *port, struct cxl_ep *ep) +{ + if (!ep) + return; + xa_erase(&port->endpoints, (unsigned long) ep->ep); + cxl_ep_release(ep); +} + static void cxl_port_release(struct device *dev) { struct cxl_port *port = to_cxl_port(dev); - struct cxl_ep *ep, *_e; + unsigned long index; + struct cxl_ep *ep; - device_lock(dev); - list_for_each_entry_safe(ep, _e, &port->endpoints, list) - cxl_ep_release(ep); - device_unlock(dev); + xa_for_each(&port->endpoints, index, ep) + cxl_ep_remove(port, ep); + xa_destroy(&port->endpoints); ida_free(&cxl_port_ida, port->id); kfree(port); } @@ -577,7 +582,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, ida_init(&port->decoder_ida); port->hdm_end = -1; INIT_LIST_HEAD(&port->dports); - INIT_LIST_HEAD(&port->endpoints); + xa_init(&port->endpoints); device_initialize(dev); lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth); @@ -873,33 +878,21 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port, } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, CXL); -static struct cxl_ep *find_ep(struct cxl_port *port, struct device *ep_dev) -{ - struct cxl_ep *ep; - - device_lock_assert(&port->dev); - list_for_each_entry(ep, &port->endpoints, list) - if (ep->ep == ep_dev) - return ep; - return NULL; -} - static int add_ep(struct cxl_ep *new) { struct 
cxl_port *port = new->dport->port; - struct cxl_ep *dup; + int rc; device_lock(&port->dev); if (port->dead) { device_unlock(&port->dev); return -ENXIO; } - dup = find_ep(port, new->ep); - if (!dup) - list_add_tail(&new->list, &port->endpoints); + rc = xa_insert(&port->endpoints, (unsigned long)new->ep, new, + GFP_KERNEL); device_unlock(&port->dev); - return dup ? -EEXIST : 0; + return rc; } /** @@ -920,7 +913,6 @@ static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev) if (!ep) return -ENOMEM; - INIT_LIST_HEAD(&ep->list); ep->ep = get_device(ep_dev); ep->dport = dport; @@ -1063,6 +1055,12 @@ static void delete_switch_port(struct cxl_port *port, struct list_head *dports) devm_release_action(port->dev.parent, unregister_port, port); } +static struct cxl_ep *cxl_ep_load(struct cxl_port *port, + struct cxl_memdev *cxlmd) +{ + return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev); +} + static void cxl_detach_ep(void *data) { struct cxl_memdev *cxlmd = data; @@ -1101,11 +1099,11 @@ static void cxl_detach_ep(void *data) } device_lock(&port->dev); - ep = find_ep(port, &cxlmd->dev); + ep = cxl_ep_load(port, cxlmd); dev_dbg(&cxlmd->dev, "disconnect %s from %s\n", ep ? dev_name(ep->ep) : "", dev_name(&port->dev)); - cxl_ep_release(ep); - if (ep && !port->dead && list_empty(&port->endpoints) && + cxl_ep_remove(port, ep); + if (ep && !port->dead && xa_empty(&port->endpoints) && !is_cxl_root(parent_port)) { /* * This was the last ep attached to a dynamically @@ -1199,7 +1197,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, dev_dbg(&cxlmd->dev, "add to new port %s:%s\n", dev_name(&port->dev), dev_name(port->uport)); rc = cxl_add_ep(dport, &cxlmd->dev); - if (rc == -EEXIST) { + if (rc == -EBUSY) { /* * "can't" happen, but this error code means * something to the caller, so translate it. @@ -1262,7 +1260,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) * the parent_port lock as the current port may be being * reaped. 
*/ - if (rc && rc != -EEXIST) { + if (rc && rc != -EBUSY) { put_device(&port->dev); return rc; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 973e0efe4bd4..de5cb8288cd4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -345,7 +345,7 @@ struct cxl_port { struct device *host_bridge; int id; struct list_head dports; - struct list_head endpoints; + struct xarray endpoints; struct cxl_dport *parent_dport; struct ida decoder_ida; int hdm_end; @@ -374,12 +374,10 @@ struct cxl_dport { * struct cxl_ep - track an endpoint's interest in a port * @ep: device that hosts a generic CXL endpoint (expander or accelerator) * @dport: which dport routes to this endpoint on @port - * @list: node on port->endpoints list */ struct cxl_ep { struct device *ep; struct cxl_dport *dport; - struct list_head list; }; /* From patchwork Fri Jul 15 00:01:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918555 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 253577460 for ; Fri, 15 Jul 2022 00:02:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843328; x=1689379328; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7GZsp765NhD7A40eOR8AtSk30NgIqkrFvVTX1pphqJ8=; b=cOViZmCM6bEts/i52GQZd2vADq5SW1rfuz/eNkSr0pJN+CkV/hgBuruB lrrRdKAtiqmk/zdKbGjOXXe1W1gADYGcw7SHR3tIjjBHn49EkZarm5crh gIHCA2cHr/vHc0suwsRkibSiA7G6j0YGOf+LjtVkDifGYoC+v4/AxuEhk k1ezT5uGTPlzvlP9O0JrdfAh+1fdf+DerkeSsXS/65kF4wG2UHcTltEkX p5Nk6//qft6A5RugnTrLvvpde8KCwuwN65OTQZJZp8P3UyBU6SlZFg6c/ y3rVUFIAhVUGgMU6t3p+3+rKDl/vrLm6NFEJCXM2PwnO/VRpg+4vFX0A2 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="284417819" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="284417819" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:57 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="923290629" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:01:56 -0700 Subject: [PATCH v2 13/28] cxl/port: Move dport tracking to an xarray From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:01:56 -0700 Message-ID: <165784331647.1758207.6345820282285119339.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Reduce the complexity and the overhead of walking the topology to determine endpoint connectivity to root decoder interleave configurations. Note that cxl_detach_ep(), after it determines that the last @ep has departed and decides to delete the port, now needs to walk the dport array with the device_lock() held to remove entries. 
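As an illustrative sketch (the function name below is hypothetical, not taken from the patch), the per-entry teardown pattern this conversion relies on looks roughly like:

	/* Illustrative sketch of walking and erasing a dport xarray */
	static void reap_dports_sketch(struct cxl_port *port)
	{
		struct cxl_dport *dport;
		unsigned long index;

		device_lock(&port->dev);
		xa_for_each(&port->dports, index, dport)
			xa_erase(&port->dports, index);
		device_unlock(&port->dev);
	}

xa_for_each() tolerates erasure of the current entry during iteration, which is what makes this per-entry removal under the lock workable.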
Previously, list_splice_init() could be used to atomically delete all dport entries at once and then tear down the entries outside the lock. There is no list_splice_init() equivalent for the xarray. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/hdm.c | 6 ++- drivers/cxl/core/port.c | 85 ++++++++++++++++++++--------------------------- drivers/cxl/cxl.h | 12 ++++--- 3 files changed, 47 insertions(+), 56 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 596b57fb60df..4a0325b02ca4 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -50,8 +50,9 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, int devm_cxl_add_passthrough_decoder(struct cxl_port *port) { struct cxl_switch_decoder *cxlsd; - struct cxl_dport *dport; + struct cxl_dport *dport = NULL; int single_port_map[1]; + unsigned long index; cxlsd = cxl_switch_decoder_alloc(port, 1); if (IS_ERR(cxlsd)) @@ -59,7 +60,8 @@ int devm_cxl_add_passthrough_decoder(struct cxl_port *port) device_lock_assert(&port->dev); - dport = list_first_entry(&port->dports, typeof(*dport), list); + xa_for_each(&port->dports, index, dport) + break; single_port_map[0] = dport->port_id; return add_hdm_decoder(port, &cxlsd->cxld, single_port_map); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 727d861e21db..b2c44e7ef6a8 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -452,6 +452,7 @@ static void cxl_port_release(struct device *dev) xa_for_each(&port->endpoints, index, ep) cxl_ep_remove(port, ep); xa_destroy(&port->endpoints); + xa_destroy(&port->dports); ida_free(&cxl_port_ida, port->id); kfree(port); } @@ -581,7 +582,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, port->component_reg_phys = component_reg_phys; ida_init(&port->decoder_ida); port->hdm_end = -1; - INIT_LIST_HEAD(&port->dports); + xa_init(&port->dports); xa_init(&port->endpoints); device_initialize(dev); @@ -711,17 +712,13 @@ static int match_root_child(struct device *dev, const void *match) return 0; port = to_cxl_port(dev); - device_lock(dev); - list_for_each_entry(dport, &port->dports, list) { - iter = match; - while (iter) { - if (iter == dport->dport) - goto out; - iter = iter->parent; - } + iter = match; + while (iter) { + dport = cxl_find_dport_by_dev(port, iter); + if (dport) + break; + iter = iter->parent; } -out: - device_unlock(dev); return !!iter; } @@ -745,9 +742,10 @@ EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL); static struct cxl_dport *find_dport(struct cxl_port *port, int id) { struct cxl_dport *dport; + unsigned long index; device_lock_assert(&port->dev); - list_for_each_entry (dport, &port->dports, list) + xa_for_each(&port->dports, index, dport) if (dport->port_id == id) return dport; return NULL; } @@ -759,15 +757,15 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); - if (dup) + if (dup) { dev_err(&port->dev, "unable to add dport%d-%s non-unique port id (%s)\n", new->port_id, dev_name(new->dport), dev_name(dup->dport)); - else - list_add_tail(&new->list, &port->dports); - - return dup ?
-EEXIST : 0; + return -EBUSY; + } + return xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); } /* @@ -794,10 +792,8 @@ static void cxl_dport_remove(void *data) struct cxl_dport *dport = data; struct cxl_port *port = dport->port; + xa_erase(&port->dports, (unsigned long) dport->dport); put_device(dport->dport); - cond_cxl_root_lock(port); - list_del(&dport->list); - cond_cxl_root_unlock(port); } static void cxl_dport_unlink(void *data) @@ -849,7 +845,6 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port, if (!dport) return ERR_PTR(-ENOMEM); - INIT_LIST_HEAD(&dport->list); dport->dport = dport_dev; dport->port_id = port_id; dport->component_reg_phys = component_reg_phys; @@ -1040,19 +1035,27 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL); * for a port to be unregistered is when all memdevs beneath that port have gone * through ->remove(). This "bottom-up" removal selectively removes individual * child ports manually. This depends on devm_cxl_add_port() to not change is - * devm action registration order. + * devm action registration order, and for dports to have already been + * destroyed by reap_dports(). */ -static void delete_switch_port(struct cxl_port *port, struct list_head *dports) +static void delete_switch_port(struct cxl_port *port) +{ + devm_release_action(port->dev.parent, cxl_unlink_uport, port); + devm_release_action(port->dev.parent, unregister_port, port); +} + +static void reap_dports(struct cxl_port *port) { - struct cxl_dport *dport, *_d; + struct cxl_dport *dport; + unsigned long index; - list_for_each_entry_safe(dport, _d, dports, list) { + device_lock_assert(&port->dev); + + xa_for_each(&port->dports, index, dport) { devm_release_action(&port->dev, cxl_dport_unlink, dport); devm_release_action(&port->dev, cxl_dport_remove, dport); devm_kfree(&port->dev, dport); } - devm_release_action(port->dev.parent, cxl_unlink_uport, port); - devm_release_action(port->dev.parent, unregister_port, port); } static struct cxl_ep *cxl_ep_load(struct cxl_port *port, @@ -1069,8 +1072,8 @@ static void cxl_detach_ep(void *data) for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) { struct device *dport_dev = grandparent(iter); struct cxl_port *port, *parent_port; - LIST_HEAD(reap_dports); struct cxl_ep *ep; + bool died = false; if (!dport_dev) break; @@ -1110,15 +1113,16 @@ static void cxl_detach_ep(void *data) * enumerated port. Block new cxl_add_ep() and garbage * collect the port. 
*/ + died = true; port->dead = true; - list_splice_init(&port->dports, &reap_dports); + reap_dports(port); } device_unlock(&port->dev); - if (!list_empty(&reap_dports)) { + if (died) { dev_dbg(&cxlmd->dev, "delete %s\n", dev_name(&port->dev)); - delete_switch_port(port, &reap_dports); + delete_switch_port(port); } put_device(&port->dev); device_unlock(&parent_port->dev); @@ -1297,23 +1301,6 @@ struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd, } EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL); -struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, - const struct device *dev) -{ - struct cxl_dport *dport; - - device_lock(&port->dev); - list_for_each_entry(dport, &port->dports, list) - if (dport->dport == dev) { - device_unlock(&port->dev); - return dport; - } - - device_unlock(&port->dev); - return NULL; -} -EXPORT_SYMBOL_NS_GPL(cxl_find_dport_by_dev, CXL); - static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, struct cxl_port *port, int *target_map) { @@ -1324,7 +1311,7 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, device_lock_assert(&port->dev); - if (list_empty(&port->dports)) + if (xa_empty(&port->dports)) return -EINVAL; write_seqlock(&cxlsd->target_lock); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index de5cb8288cd4..bf5f0c305115 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -344,7 +344,7 @@ struct cxl_port { struct device *uport; struct device *host_bridge; int id; - struct list_head dports; + struct xarray dports; struct xarray endpoints; struct cxl_dport *parent_dport; struct ida decoder_ida; @@ -354,20 +354,24 @@ struct cxl_port { unsigned int depth; }; +static inline struct cxl_dport * +cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev) +{ + return xa_load(&port->dports, (unsigned long)dport_dev); +} + /** * struct cxl_dport - CXL downstream port * @dport: PCI bridge or firmware device representing the downstream link * @port_id: unique hardware identifier for dport in decoder target list * @component_reg_phys: downstream port component registers * @port: reference to cxl_port that contains this downstream port - * @list: node for a cxl_port's list of cxl_dport instances */ struct cxl_dport { struct device *dport; int port_id; resource_size_t component_reg_phys; struct cxl_port *port; - struct list_head list; }; /** @@ -410,8 +414,6 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd); struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port, struct device *dport, int port_id, resource_size_t component_reg_phys); -struct cxl_dport *cxl_find_dport_by_dev(struct cxl_port *port, - const struct device *dev); struct cxl_decoder *to_cxl_decoder(struct device *dev); struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); From patchwork Fri Jul 15 00:02:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918558 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6EB27460 for ; Fri, 15 Jul 2022 00:02:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843350; x=1689379350; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MHmWXe/fj8s99H1gM2mKlKkxWW6tK1PQHi8PV+spXQI=; 
b=nU68YlKyRt4VCPPaKJrlgekFnkpIa7oBHNk0ocLvCXrokD11gjgYdbAG HeeiyqX86ur+dzAqP/XpL2Ekq6AdbqmJd7pDe0vHDaAmURtAeLpqcgpAd ZtkNadV+p60+SiT3BIcQ7u0lNUmSDfMkAbThfYxyb+VgfxHlUL+4wpNfj JtmTPoOvXoxxx4grQLnOgYEYatQ4UBdZ9MRwiNNXltXWq0OEv2R6cW9Em 1Wpr7ap37aJfSky4/ayQAbRP+8z46vnSzu3yTiTLvjkXTeHjs4/5MYFpj Gusryf8gF0EVVWPthZouUUNUUA+eNs3B2gLGeh9g3mE+fr5JpVFE/vQxG A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="266073625" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="266073625" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:03 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="546461988" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:02 -0700 Subject: [PATCH v2 14/28] cxl/hdm: Add sysfs attributes for interleave ways + granularity From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:02 -0700 Message-ID: <165784332235.1758207.7185062713652694607.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ben Widawsky The region provisioning flow involves selecting interleave ways + granularity settings for a region, and then programming the decoder topology to meet those constraints, if possible. For example, root decoders set the minimum interleave ways + granularity for any hosted regions. Given that decoder programming is not atomic and collisions can occur between multiple requesting regions, userspace will be responsible for conflict resolution and it needs these attributes to make those decisions. Signed-off-by: Ben Widawsky [djbw: reword changelog, make read-only, add sysfs ABI documentation] Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 27 +++++++++++++++++++++++++++ drivers/cxl/core/port.c | 23 +++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 9b6cc7cdc73b..0362ae98218e 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -230,3 +230,30 @@ Description: allocations are enforced to occur in increasing 'decoderX.Y/id' order and frees are enforced to occur in decreasing 'decoderX.Y/id' order. + + +What: /sys/bus/cxl/devices/decoderX.Y/interleave_ways +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) The number of targets across which this decoder's host + physical address (HPA) memory range is interleaved. The device + maps every Nth block of HPA (of size == + 'interleave_granularity') to consecutive DPA addresses. The + decoder's position in the interleave is determined by the + device's (endpoint or switch) switch ancestry. For root + decoders their interleave is specified by platform firmware and + they only specify a downstream target order for host bridges.
+ + +What: /sys/bus/cxl/devices/decoderX.Y/interleave_granularity +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) The number of consecutive bytes of host physical address + space this decoder claims at address N before the decode rotates + to the next target in the interleave at address N + + interleave_granularity (assuming N is aligned to + interleave_granularity). diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index b2c44e7ef6a8..a43735f349d6 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -260,10 +260,33 @@ static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_RW(dpa_size); +static ssize_t interleave_granularity_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cxl_decoder *cxld = to_cxl_decoder(dev); + + return sysfs_emit(buf, "%d\n", cxld->interleave_granularity); +} + +static DEVICE_ATTR_RO(interleave_granularity); + +static ssize_t interleave_ways_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct cxl_decoder *cxld = to_cxl_decoder(dev); + + return sysfs_emit(buf, "%d\n", cxld->interleave_ways); +} + +static DEVICE_ATTR_RO(interleave_ways); + static struct attribute *cxl_decoder_base_attrs[] = { &dev_attr_start.attr, &dev_attr_size.attr, &dev_attr_locked.attr, + &dev_attr_interleave_granularity.attr, + &dev_attr_interleave_ways.attr, NULL, }; From patchwork Fri Jul 15 00:02:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918557 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B874C7460 for ; Fri, 15 Jul 2022 00:02:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843347; x=1689379347; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MzYsXLz52hUwHRWQxC1ck79lMKivl+7wU+DUNvWzc6A=; b=chgclW/nrMKI4m+U6qB1S8zgntABatuCO7wwHgWla9Y1eDnpO0hS5jdr Ki8UaNZjAkczPAzCDROdmnXOyADHa/seuTTrUlWh9ibgYi7rZyEA0uRNG npFiaPzlTyFyFjBjmHD6h59as9BGSKw0Aeujo6A/qKT+1fbai8QVxyY3O MifYRsfGlWPnwZSsK2mEBFVA5vhvGGQewpAGdCBkFkbsKS/MLNTUN7T57 sp7fpzQTNwaeGpnO5qEqcRjWU+3L4oUrRER82OBbDXnY29EKNq9bGF/D6 l9JAR8hmW88o9qCxtHJjk48hgjccxsjKj27U1XCA7yfzZqUVhEGAuZOfX w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="349626824" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="349626824" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:08 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="596276237" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:08 -0700 Subject: [PATCH v2 15/28] cxl/mem: Enumerate port targets before adding endpoints From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:07 -0700 Message-ID: <165784332785.1758207.6457933746912970734.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: 
<165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The port scanning algorithm in devm_cxl_enumerate_ports() walks up the topology and adds cxl_port objects starting from the root down to the endpoint. When those ports are initially created, they know all their dports, but they do not know the downstream cxl_port instance that represents the next descendant in the topology. Rework create_endpoint() into devm_cxl_add_endpoint(), which enumerates the downstream cxl_port topology into each port's 'struct cxl_ep' record for each endpoint for which the port is an ancestor. Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-7-dan.j.williams@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/port.c | 41 +++++++++++++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 5 +++++ drivers/cxl/mem.c | 30 +----------------------------- 3 files changed, 47 insertions(+), 29 deletions(-) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index a43735f349d6..4907db798ad9 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1087,6 +1087,47 @@ static struct cxl_ep *cxl_ep_load(struct cxl_port *port, return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev); } +int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd, + struct cxl_dport *parent_dport) +{ + struct cxl_port *parent_port = parent_dport->port; + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct cxl_port *endpoint, *iter, *down; + int rc; + + /* + * Now that the path to the root is established record all the + * intervening ports in the chain.
+ */ + for (iter = parent_port, down = NULL; !is_cxl_root(iter); + down = iter, iter = to_cxl_port(iter->dev.parent)) { + struct cxl_ep *ep; + + ep = cxl_ep_load(iter, cxlmd); + ep->next = down; + } + + endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev, + cxlds->component_reg_phys, parent_dport); + if (IS_ERR(endpoint)) + return PTR_ERR(endpoint); + + dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev)); + + rc = cxl_endpoint_autoremove(cxlmd, endpoint); + if (rc) + return rc; + + if (!endpoint->dev.driver) { + dev_err(&cxlmd->dev, "%s failed probe\n", + dev_name(&endpoint->dev)); + return -ENXIO; + } + + return 0; +} +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_endpoint, CXL); + static void cxl_detach_ep(void *data) { struct cxl_memdev *cxlmd = data; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index bf5f0c305115..a108d5c288ca 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -378,10 +378,13 @@ struct cxl_dport { * struct cxl_ep - track an endpoint's interest in a port * @ep: device that hosts a generic CXL endpoint (expander or accelerator) * @dport: which dport routes to this endpoint on @port + * @next: cxl switch port across the link attached to @dport NULL if + * attached to an endpoint */ struct cxl_ep { struct device *ep; struct cxl_dport *dport; + struct cxl_port *next; }; /* @@ -404,6 +407,8 @@ struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port); struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport, resource_size_t component_reg_phys, struct cxl_dport *parent_dport); +int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd, + struct cxl_dport *parent_dport); struct cxl_port *find_cxl_root(struct device *dev); int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd); int cxl_bus_rescan(void); diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 2786d3402c9e..64ccf053d32c 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -25,34 +25,6 @@ * in higher level operations. 
*/ -static int create_endpoint(struct cxl_memdev *cxlmd, - struct cxl_dport *parent_dport) -{ - struct cxl_port *parent_port = parent_dport->port; - struct cxl_dev_state *cxlds = cxlmd->cxlds; - struct cxl_port *endpoint; - int rc; - - endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev, - cxlds->component_reg_phys, parent_dport); - if (IS_ERR(endpoint)) - return PTR_ERR(endpoint); - - dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev)); - - rc = cxl_endpoint_autoremove(cxlmd, endpoint); - if (rc) - return rc; - - if (!endpoint->dev.driver) { - dev_err(&cxlmd->dev, "%s failed probe\n", - dev_name(&endpoint->dev)); - return -ENXIO; - } - - return 0; -} - static void enable_suspend(void *data) { cxl_mem_active_dec(); @@ -116,7 +88,7 @@ static int cxl_mem_probe(struct device *dev) goto unlock; } - rc = create_endpoint(cxlmd, dport); + rc = devm_cxl_add_endpoint(cxlmd, dport); unlock: device_unlock(&parent_port->dev); put_device(&parent_port->dev); From patchwork Fri Jul 15 00:02:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918596 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2AB047460 for ; Fri, 15 Jul 2022 00:03:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843394; x=1689379394; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UDYAt3M3nQnYVxO8RqGbN31McxI7qhhgMZQ6vq7TSG0=; b=bt05UXkhUovA84bPJeD2WYgoyDbrqE481onuNn6yvdZyAnrNrsDJaBAR 1ai7PqxFkeKAe8CZsR+Tf8BforxgerK140hJrLDWxX6KT2GAsZDuqEJ+X zyzBSjMgb7rzUal4seGk4QAlwUt7eeSplVBgR+psouSBjbmHYdXOhJAZ/ Ief0OTwQq5HUtHkayRIYnzyLVzdUVoy/FTTRSEgPhEtvWQHR3NwT20sIi W8HhJ/DRnlnOA2xzAVLtaJZSZ5gAXWVbkSbq6VrPpDqsAzAEsQMslnG+F o8JYrRzZjtFQJXGI5urErHs0Y9UUGgtD1pdcnD2Y0rJk1Oyjg0OKi7dft A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="285684017" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="285684017" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:14 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="842326104" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:13 -0700 Subject: [PATCH v2 16/28] resource: Introduce alloc_free_mem_region() From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jason Gunthorpe , Matthew Wilcox , Christoph Hellwig , nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:13 -0700 Message-ID: <165784333333.1758207.13703329337805274043.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The core of devm_request_free_mem_region() is a helper that searches for free space in iomem_resource and performs __request_region_locked() on the result of that search. 
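Schematically, that search is a descending, first-fit scan. Here is a condensed sketch of the loop being reworked below (alignment, locking, and devres bookkeeping omitted):

	/* Condensed sketch of the legacy top-down search for a free range */
	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
	for (addr = end - size + 1UL; addr > size && addr >= base->start; addr -= size) {
		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
		    REGION_DISJOINT)
			continue;
		/* disjoint range found: claim it via __request_region_locked() */
		break;
	}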
The policy choices of the implementation conform to what CONFIG_DEVICE_PRIVATE users want, which is memory that is immediately marked busy, and a preference to search for the first-fit free range in descending order from the top of the physical address space. CXL has a need for a similar allocator, but with the following tweaks: 1/ Search for free space in ascending order 2/ Search for free space relative to a given CXL window 3/ 'insert' rather than 'request' the new resource, given that downstream drivers from the CXL Region driver (like the pmem or dax drivers) are responsible for request_mem_region() when they activate the memory range. Rework __request_free_mem_region() into get_free_mem_region() which takes a set of GFR_* (Get Free Region) flags to control the allocation policy (ascending vs descending), and "busy" policy (insert_resource() vs request_region()). As part of the consolidation of the legacy GFR_REQUEST_REGION case with the new default of just inserting a new resource into the free space, some minor cleanups, like not checking for NULL before calling devres_free() (which does its own check), are included. Suggested-by: Jason Gunthorpe Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/ Cc: Matthew Wilcox Cc: Christoph Hellwig Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- include/linux/ioport.h | 2 + kernel/resource.c | 178 +++++++++++++++++++++++++++++++++++++++--------- mm/Kconfig | 5 + 3 files changed, 150 insertions(+), 35 deletions(-) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 79d1ad6d6275..616b683563a9 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -330,6 +330,8 @@ struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size); struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name); +struct resource *alloc_free_mem_region(struct resource *base, + unsigned long size, unsigned long align, const char *name); static inline void irqresource_disabled(struct resource *res, u32 irq) { diff --git a/kernel/resource.c b/kernel/resource.c index 53a534db350e..4c5e80b92f2f 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -489,8 +489,9 @@ int __weak page_is_ram(unsigned long pfn) } EXPORT_SYMBOL_GPL(page_is_ram); -static int __region_intersects(resource_size_t start, size_t size, - unsigned long flags, unsigned long desc) +static int __region_intersects(struct resource *parent, resource_size_t start, + size_t size, unsigned long flags, + unsigned long desc) { struct resource res; int type = 0; int other = 0; @@ -499,7 +500,7 @@ static int __region_intersects(resource_size_t start, size_t size, res.start = start; res.end = start + size - 1; - for (p = iomem_resource.child; p ; p = p->sibling) { + for (p = parent->child; p ; p = p->sibling) { bool is_type = (((p->flags & flags) == flags) && ((desc == IORES_DESC_NONE) || (desc == p->desc))); @@ -543,7 +544,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags, int ret; read_lock(&resource_lock); - ret = __region_intersects(start, size, flags, desc); + ret = __region_intersects(&iomem_resource, start, size, flags, desc); read_unlock(&resource_lock); return ret; @@ -1780,62 +1781,139 @@ void resource_list_free(struct list_head *head) } EXPORT_SYMBOL(resource_list_free); -#ifdef CONFIG_DEVICE_PRIVATE -static struct resource *__request_free_mem_region(struct device *dev, - struct resource *base, unsigned long size, const char *name)
+#ifdef CONFIG_GET_FREE_REGION +#define GFR_DESCENDING (1UL << 0) +#define GFR_REQUEST_REGION (1UL << 1) +#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT) + +static resource_size_t gfr_start(struct resource *base, resource_size_t size, + resource_size_t align, unsigned long flags) +{ + if (flags & GFR_DESCENDING) { + resource_size_t end; + + end = min_t(resource_size_t, base->end, + (1ULL << MAX_PHYSMEM_BITS) - 1); + return end - size + 1; + } + + return ALIGN(base->start, align); +} + +static bool gfr_continue(struct resource *base, resource_size_t addr, + resource_size_t size, unsigned long flags) +{ + if (flags & GFR_DESCENDING) + return addr > size && addr >= base->start; + /* + * In the ascend case be careful that the last increment by + * @size did not wrap 0. + */ + return addr > addr - size && + addr <= min_t(resource_size_t, base->end, + (1ULL << MAX_PHYSMEM_BITS) - 1); +} + +static resource_size_t gfr_next(resource_size_t addr, resource_size_t size, + unsigned long flags) +{ + if (flags & GFR_DESCENDING) + return addr - size; + return addr + size; +} + +static void remove_free_mem_region(void *_res) +{ + struct resource *res = _res; + + if (res->parent) + remove_resource(res); + free_resource(res); +} + +static struct resource * +get_free_mem_region(struct device *dev, struct resource *base, + resource_size_t size, const unsigned long align, + const char *name, const unsigned long desc, + const unsigned long flags) { - resource_size_t end, addr; + resource_size_t addr; struct resource *res; struct region_devres *dr = NULL; - size = ALIGN(size, 1UL << PA_SECTION_SHIFT); - end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1); - addr = end - size + 1UL; + size = ALIGN(size, align); res = alloc_resource(GFP_KERNEL); if (!res) return ERR_PTR(-ENOMEM); - if (dev) { + if (dev && (flags & GFR_REQUEST_REGION)) { dr = devres_alloc(devm_region_release, sizeof(struct region_devres), GFP_KERNEL); if (!dr) { free_resource(res); return ERR_PTR(-ENOMEM); } + } else if (dev) { + if (devm_add_action_or_reset(dev, remove_free_mem_region, res)) + return ERR_PTR(-ENOMEM); } write_lock(&resource_lock); - for (; addr > size && addr >= base->start; addr -= size) { - if (__region_intersects(addr, size, 0, IORES_DESC_NONE) != - REGION_DISJOINT) + for (addr = gfr_start(base, size, align, flags); + gfr_continue(base, addr, size, flags); + addr = gfr_next(addr, size, flags)) { + if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) != + REGION_DISJOINT) continue; - if (__request_region_locked(res, &iomem_resource, addr, size, - name, 0)) - break; + if (flags & GFR_REQUEST_REGION) { + if (__request_region_locked(res, &iomem_resource, addr, + size, name, 0)) + break; - if (dev) { - dr->parent = &iomem_resource; - dr->start = addr; - dr->n = size; - devres_add(dev, dr); - } + if (dev) { + dr->parent = &iomem_resource; + dr->start = addr; + dr->n = size; + devres_add(dev, dr); + } - res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY; - write_unlock(&resource_lock); + res->desc = desc; + write_unlock(&resource_lock); + + + /* + * A driver is claiming this region so revoke any + * mappings. 
+ */ + revoke_iomem(res); + } else { + res->start = addr; + res->end = addr + size - 1; + res->name = name; + res->desc = desc; + res->flags = IORESOURCE_MEM; + + /* + * Only succeed if the resource hosts an exclusive + * range after the insert + */ + if (__insert_resource(base, res) || res->child) + break; + + write_unlock(&resource_lock); + } - /* - * A driver is claiming this region so revoke any mappings. - */ - revoke_iomem(res); return res; } write_unlock(&resource_lock); - free_resource(res); - if (dr) + if (flags & GFR_REQUEST_REGION) { + free_resource(res); devres_free(dr); + } else if (dev) + devm_release_action(dev, remove_free_mem_region, res); return ERR_PTR(-ERANGE); } @@ -1854,18 +1932,48 @@ static struct resource *__request_free_mem_region(struct device *dev, struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size) { - return __request_free_mem_region(dev, base, size, dev_name(dev)); + unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION; + + return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN, + dev_name(dev), + IORES_DESC_DEVICE_PRIVATE_MEMORY, flags); } EXPORT_SYMBOL_GPL(devm_request_free_mem_region); struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name) { - return __request_free_mem_region(NULL, base, size, name); + unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION; + + return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name, + IORES_DESC_DEVICE_PRIVATE_MEMORY, flags); } EXPORT_SYMBOL_GPL(request_free_mem_region); -#endif /* CONFIG_DEVICE_PRIVATE */ +/** + * alloc_free_mem_region - find a free region relative to @base + * @base: resource that will parent the new resource + * @size: size in bytes of memory to allocate from @base + * @align: alignment requirements for the allocation + * @name: resource name + * + * Buses like CXL, that can dynamically instantiate new memory regions, + * need a method to allocate physical address space for those regions. + * Allocate and insert a new resource to cover a free, unclaimed by a + * descendant of @base, range in the span of @base. 
+ */ +struct resource *alloc_free_mem_region(struct resource *base, + unsigned long size, unsigned long align, + const char *name) +{ + /* Default of ascending direction and insert resource */ + unsigned long flags = 0; + + return get_free_mem_region(NULL, base, size, align, name, + IORES_DESC_NONE, flags); +} +EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL); +#endif /* CONFIG_GET_FREE_REGION */ static int __init strict_iomem(char *str) { diff --git a/mm/Kconfig b/mm/Kconfig index 169e64192e48..a5b4fee2e3fd 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -994,9 +994,14 @@ config HMM_MIRROR bool depends on MMU +config GET_FREE_REGION + depends on SPARSEMEM + bool + config DEVICE_PRIVATE bool "Unaddressable device memory (GPU memory, ...)" depends on ZONE_DEVICE + select GET_FREE_REGION help Allows creation of struct pages to represent unaddressable device From patchwork Fri Jul 15 00:02:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918598 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 287F47460 for ; Fri, 15 Jul 2022 00:03:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843420; x=1689379420; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=z04oSogVTDjrLHlB/Q0crOZsNJSV9cvtjQD48k1or+g=; b=i9Ga0vj8DW+ly8zirBpdVazRnfP0xXBQ5OAtjUSD0kB7YCNn9OF40uD9 EsMEU62Ch4jPfZIpcc/LkNkXih4SwtsykJupw+UmTbjUq0wzAZBd2mMBv z7iJFt51b/8Okt6ml8MrU4HS1LJ0lmE/o8oggFZ+FFUcQhxurRhHiwBI/ jrFlBgUaOE4fLQ3tF/xOki/f/PVVAoMxD4dQEh9wzimOlqlRmcVxk7SZl K+4a07GwKyVmBq60rPQf8l6+cKmpUS+VF4NDEgroObolAc0Tf8uF9yK54 FuvRQm6v5smLyA0U+Eo87v8wKXJN5AvpLkrKZ+ek30W8U9tGGIN9vGoZm Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="371977657" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="371977657" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:19 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="628897022" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:19 -0700 Subject: [PATCH v2 17/28] cxl/region: Add region creation support From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:19 -0700 Message-ID: <165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ben Widawsky CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. 
Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 25 +++ Documentation/driver-api/cxl/memory-devices.rst | 11 + drivers/cxl/Kconfig | 5 + drivers/cxl/core/Makefile | 1 drivers/cxl/core/core.h | 10 + drivers/cxl/core/port.c | 39 ++++ drivers/cxl/core/region.c | 201 +++++++++++++++++++++++ drivers/cxl/cxl.h | 18 ++ tools/testing/cxl/Kbuild | 1 9 files changed, 311 insertions(+) create mode 100644 drivers/cxl/core/region.c diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 0362ae98218e..b6156a499a5a 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -257,3 +257,28 @@ Description: to the next target in the interleave at address N + interleave_granularity (assuming N is aligned to interleave_granularity). + + +What: /sys/bus/cxl/devices/decoderX.Y/create_pmem_region +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Write a string in the form 'regionZ' to start the process + of defining a new persistent memory region (interleave-set) + within the decode range bounded by root decoder 'decoderX.Y'. + The value written must match the current value returned from + reading this attribute. An atomic compare exchange operation is + done on write to assign the requested id to a region and + allocate the region-id for the next creation attempt. 
EBUSY is + returned if the region name written does not match the current + cached value. + + +What: /sys/bus/cxl/devices/decoderX.Y/delete_region +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (WO) Write a string in the form 'regionZ' to delete that region, + provided it is currently idle / not bound to a driver. diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst index db476bb170b6..66ddc58a21b1 100644 --- a/Documentation/driver-api/cxl/memory-devices.rst +++ b/Documentation/driver-api/cxl/memory-devices.rst @@ -362,6 +362,17 @@ CXL Core .. kernel-doc:: drivers/cxl/core/mbox.c :doc: cxl mbox +CXL Regions +----------- +.. kernel-doc:: drivers/cxl/region.h + :identifiers: + +.. kernel-doc:: drivers/cxl/core/region.c + :doc: cxl core region + +.. kernel-doc:: drivers/cxl/core/region.c + :identifiers: + External Interfaces =================== diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index f64e3984689f..aa2728de419e 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -102,4 +102,9 @@ config CXL_SUSPEND def_bool y depends on SUSPEND && CXL_MEM +config CXL_REGION + bool + default CXL_BUS + select MEMREGION + endif diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 9d35085d25af..79c7257f4107 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -10,3 +10,4 @@ cxl_core-y += memdev.o cxl_core-y += mbox.o cxl_core-y += pci.o cxl_core-y += hdm.o +cxl_core-$(CONFIG_CXL_REGION) += region.o diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 5551b82b2da0..29272df7e212 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -9,6 +9,16 @@ extern const struct device_type cxl_nvdimm_type; extern struct attribute_group cxl_base_attribute_group; +#ifdef CONFIG_CXL_REGION +extern struct device_attribute dev_attr_create_pmem_region; +extern struct device_attribute dev_attr_delete_region; +#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr) +#define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr), +#else +#define CXL_REGION_ATTR(x) NULL +#define SET_CXL_REGION_ATTR(x) +#endif + struct cxl_send_command; struct cxl_mem_query_commands; int cxl_query_cmd(struct cxl_memdev *cxlmd, diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 4907db798ad9..89672b126b30 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only /* Copyright(c) 2020 Intel Corporation. All rights reserved. 
*/ #include +#include #include #include #include @@ -300,11 +301,35 @@ static struct attribute *cxl_decoder_root_attrs[] = { &dev_attr_cap_type2.attr, &dev_attr_cap_type3.attr, &dev_attr_target_list.attr, + SET_CXL_REGION_ATTR(create_pmem_region) + SET_CXL_REGION_ATTR(delete_region) NULL, }; +static bool can_create_pmem(struct cxl_root_decoder *cxlrd) +{ + unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM; + + return (cxlrd->cxlsd.cxld.flags & flags) == flags; +} + +static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + + if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd)) + return 0; + + if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd)) + return 0; + + return a->mode; +} + static struct attribute_group cxl_decoder_root_attribute_group = { .attrs = cxl_decoder_root_attrs, + .is_visible = cxl_root_decoder_visible, }; static const struct attribute_group *cxl_decoder_root_attribute_groups[] = { @@ -387,6 +412,8 @@ static void cxl_root_decoder_release(struct device *dev) { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + if (atomic_read(&cxlrd->region_id) >= 0) + memregion_free(atomic_read(&cxlrd->region_id)); __cxl_decoder_release(&cxlrd->cxlsd.cxld); kfree(cxlrd); } @@ -1484,6 +1511,18 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, cxld = &cxlsd->cxld; cxld->dev.type = &cxl_decoder_root_type; + /* + * cxl_root_decoder_release() special cases negative ids to + * detect memregion_alloc() failures. + */ + atomic_set(&cxlrd->region_id, -1); + rc = memregion_alloc(GFP_KERNEL); + if (rc < 0) { + put_device(&cxld->dev); + return ERR_PTR(rc); + } + + atomic_set(&cxlrd->region_id, rc); return cxlrd; } EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL); diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c new file mode 100644 index 000000000000..6d2a7aa53379 --- /dev/null +++ b/drivers/cxl/core/region.c @@ -0,0 +1,201 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ +#include +#include +#include +#include +#include +#include +#include +#include "core.h" + +/** + * DOC: cxl core region + * + * CXL Regions represent mapped memory capacity in system physical address + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL + * Memory ranges, Regions represent the active mapped capacity by the HDM + * Decoder Capability structures throughout the Host Bridges, Switches, and + * Endpoints in the topology. 
+ */ + +static struct cxl_region *to_cxl_region(struct device *dev); + +static void cxl_region_release(struct device *dev) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + + memregion_free(cxlr->id); + kfree(cxlr); +} + +static const struct device_type cxl_region_type = { + .name = "cxl_region", + .release = cxl_region_release, +}; + +bool is_cxl_region(struct device *dev) +{ + return dev->type == &cxl_region_type; +} +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL); + +static struct cxl_region *to_cxl_region(struct device *dev) +{ + if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type, + "not a cxl_region device\n")) + return NULL; + + return container_of(dev, struct cxl_region, dev); +} + +static void unregister_region(void *dev) +{ + device_unregister(dev); +} + +static struct lock_class_key cxl_region_key; + +static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id) +{ + struct cxl_region *cxlr; + struct device *dev; + + cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL); + if (!cxlr) { + memregion_free(id); + return ERR_PTR(-ENOMEM); + } + + dev = &cxlr->dev; + device_initialize(dev); + lockdep_set_class(&dev->mutex, &cxl_region_key); + dev->parent = &cxlrd->cxlsd.cxld.dev; + device_set_pm_not_required(dev); + dev->bus = &cxl_bus_type; + dev->type = &cxl_region_type; + cxlr->id = id; + + return cxlr; +} + +/** + * devm_cxl_add_region - Adds a region to a decoder + * @cxlrd: root decoder + * @id: memregion id to create, or memregion_free() on failure + * @mode: mode for the endpoint decoders of this region + * @type: select whether this is an expander or accelerator (type-2 or type-3) + * + * This is the second step of region initialization. Regions exist within an + * address space which is mapped by a @cxlrd. + * + * Return: 0 if the region was added to the @cxlrd, else returns negative error + * code. The region will be named "regionZ" where Z is the unique region number. 
+ */ +static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, + int id, + enum cxl_decoder_mode mode, + enum cxl_decoder_type type) +{ + struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); + struct cxl_region *cxlr; + struct device *dev; + int rc; + + cxlr = cxl_region_alloc(cxlrd, id); + if (IS_ERR(cxlr)) + return cxlr; + cxlr->mode = mode; + cxlr->type = type; + + dev = &cxlr->dev; + rc = dev_set_name(dev, "region%d", id); + if (rc) + goto err; + + rc = device_add(dev); + if (rc) + goto err; + + rc = devm_add_action_or_reset(port->uport, unregister_region, cxlr); + if (rc) + return ERR_PTR(rc); + + dev_dbg(port->uport, "%s: created %s\n", + dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev)); + return cxlr; + +err: + put_device(dev); + return ERR_PTR(rc); +} + +static ssize_t create_pmem_region_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + + return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id)); +} + +static ssize_t create_pmem_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + struct cxl_region *cxlr; + unsigned int id, rc; + + rc = sscanf(buf, "region%u\n", &id); + if (rc != 1) + return -EINVAL; + + rc = memregion_alloc(GFP_KERNEL); + if (rc < 0) + return rc; + + if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) { + memregion_free(rc); + return -EBUSY; + } + + cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM, + CXL_DECODER_EXPANDER); + if (IS_ERR(cxlr)) + return PTR_ERR(cxlr); + + return len; +} +DEVICE_ATTR_RW(create_pmem_region); + +static struct cxl_region * +cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name) +{ + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct device *region_dev; + + region_dev = device_find_child_by_name(&cxld->dev, name); + if (!region_dev) + return ERR_PTR(-ENODEV); + + return to_cxl_region(region_dev); +} + +static ssize_t delete_region_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); + struct cxl_port *port = to_cxl_port(dev->parent); + struct cxl_region *cxlr; + + cxlr = cxl_find_region_by_name(cxlrd, buf); + if (IS_ERR(cxlr)) + return PTR_ERR(cxlr); + + devm_release_action(port->uport, unregister_region, cxlr); + put_device(&cxlr->dev); + + return len; +} +DEVICE_ATTR_WO(delete_region); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index a108d5c288ca..c3696e76306a 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -286,13 +286,29 @@ struct cxl_switch_decoder { /** * struct cxl_root_decoder - Static platform CXL address decoder * @res: host / parent resource for region allocations + * @region_id: region id for next region provisioning event * @cxlsd: base cxl switch decoder */ struct cxl_root_decoder { struct resource *res; + atomic_t region_id; struct cxl_switch_decoder cxlsd; }; +/** + * struct cxl_region - CXL region + * @dev: This region's device + * @id: This region's id. 
Id is globally unique across all regions + * @mode: Endpoint decoder allocation / access mode + * @type: Endpoint decoder target type + */ +struct cxl_region { + struct device dev; + int id; + enum cxl_decoder_mode mode; + enum cxl_decoder_type type; +}; + /** * enum cxl_nvdimm_brige_state - state machine for managing bus rescans * @CXL_NVB_NEW: Set at bridge create and after cxl_pmem_wq is destroyed @@ -440,6 +456,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port); int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm); int devm_cxl_add_passthrough_decoder(struct cxl_port *port); +bool is_cxl_region(struct device *dev); + extern struct bus_type cxl_bus_type; struct cxl_driver { diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 33543231d453..500be85729cc 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -47,6 +47,7 @@ cxl_core-y += $(CXL_CORE_SRC)/memdev.o cxl_core-y += $(CXL_CORE_SRC)/mbox.o cxl_core-y += $(CXL_CORE_SRC)/pci.o cxl_core-y += $(CXL_CORE_SRC)/hdm.o +cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o cxl_core-y += config_check.o obj-m += test/ From patchwork Fri Jul 15 00:02:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918597 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B75867460 for ; Fri, 15 Jul 2022 00:03:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843397; x=1689379397; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6FbC1RR76T/PmytpA5zlyKcB5NyXYoCsVXtwPOxu1jw=; b=JLwZyd2mKZGCP0RbahdPijYtw6yuDibj1JwdERUDC9BYM1C+pUFUWFSv qcLZeVZ8gF1YUOZLjuQz5bgnVFW/iZCz7bTFS3l9n049g0VfDjzxn2ACL /ZY9xqo+okV0IODNFBSb8Fxmr2MNpPKXJMmvUogshjXkfg2pUIlRZdvSe zqf107Y6/Do+SGFkY0sc2/9h8+a0sF2ju3SOY5Af87pOMInF7mv49EdgP 2Zzwy2xBiUw3Xb5aOo7DS7/j2eJIGIetGAssNGGk2pMmvLt4FKmKaFG1n Rsf9T0b6v3Me3UN9izvqPiGg7MgyMIA+0HP2wVSTia9vN1NgG04hFgFrV A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="349627166" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="349627166" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:25 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="685766741" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:25 -0700 Subject: [PATCH v2 18/28] cxl/region: Add a 'uuid' attribute From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:24 -0700 Message-ID: <165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ben Widawsky The process of provisioning a region involves triggering the creation of a new region 
object, pouring in the configuration, and then binding that configured object to the region driver to start its operation. For persistent memory regions, the CXL specification mandates that the region be identified by a UUID. Add an ABI for userspace to specify a region's uuid. Signed-off-by: Ben Widawsky [djbw: simplify locking] Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 10 +++ drivers/cxl/core/region.c | 118 +++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 25 +++++++ 3 files changed, 153 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index b6156a499a5a..0760b8402c23 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -282,3 +282,13 @@ Contact: linux-cxl@vger.kernel.org Description: (WO) Write a string in the form 'regionZ' to delete that region, provided it is currently idle / not bound to a driver. + + +What: /sys/bus/cxl/devices/regionZ/uuid +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Write a unique identifier for the region. This field must + be set for persistent regions and it must not conflict with the + UUID of another region. diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 6d2a7aa53379..22dccb4702e5 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include "core.h" @@ -17,10 +18,126 @@ * Memory ranges, Regions represent the active mapped capacity by the HDM * Decoder Capability structures throughout the Host Bridges, Switches, and * Endpoints in the topology. + * + * Region configuration has ordering constraints. UUID may be set at any time + * but is only visible for persistent regions. + */ + +/* + * All changes to the interleave configuration occur with this lock held + * for write.
*/ +static DECLARE_RWSEM(cxl_region_rwsem); static struct cxl_region *to_cxl_region(struct device *dev); +static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + rc = sysfs_emit(buf, "%pUb\n", &p->uuid); + up_read(&cxl_region_rwsem); + + return rc; +} + +static int is_dup(struct device *match, void *data) +{ + struct cxl_region_params *p; + struct cxl_region *cxlr; + uuid_t *uuid = data; + + if (!is_cxl_region(match)) + return 0; + + lockdep_assert_held(&cxl_region_rwsem); + cxlr = to_cxl_region(match); + p = &cxlr->params; + + if (uuid_equal(&p->uuid, uuid)) { + dev_dbg(match, "already has uuid: %pUb\n", uuid); + return -EBUSY; + } + + return 0; +} + +static ssize_t uuid_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + uuid_t temp; + ssize_t rc; + + if (len != UUID_STRING_LEN + 1) + return -EINVAL; + + rc = uuid_parse(buf, &temp); + if (rc) + return rc; + + if (uuid_is_null(&temp)) + return -EINVAL; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + + if (uuid_equal(&p->uuid, &temp)) + goto out; + + rc = -EBUSY; + if (p->state >= CXL_CONFIG_ACTIVE) + goto out; + + rc = bus_for_each_dev(&cxl_bus_type, NULL, &temp, is_dup); + if (rc < 0) + goto out; + + uuid_copy(&p->uuid, &temp); +out: + up_write(&cxl_region_rwsem); + + if (rc) + return rc; + return len; +} +static DEVICE_ATTR_RW(uuid); + +static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a, + int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_region *cxlr = to_cxl_region(dev); + + if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM) + return 0; + return a->mode; +} + +static struct attribute *cxl_region_attrs[] = { + &dev_attr_uuid.attr, + NULL, +}; + +static const struct attribute_group cxl_region_group = { + .attrs = cxl_region_attrs, + .is_visible = cxl_region_visible, +}; + +static const struct attribute_group *region_groups[] = { + &cxl_base_attribute_group, + &cxl_region_group, + NULL, +}; + static void cxl_region_release(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); @@ -32,6 +149,7 @@ static void cxl_region_release(struct device *dev) static const struct device_type cxl_region_type = { .name = "cxl_region", .release = cxl_region_release, + .groups = region_groups }; bool is_cxl_region(struct device *dev) diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index c3696e76306a..cf2ece363015 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -295,18 +295,43 @@ struct cxl_root_decoder { struct cxl_switch_decoder cxlsd; }; +/* + * enum cxl_config_state - State machine for region configuration + * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely + * @CXL_CONFIG_ACTIVE: All targets have been added the region is now + * active + */ +enum cxl_config_state { + CXL_CONFIG_IDLE, + CXL_CONFIG_ACTIVE, +}; + +/** + * struct cxl_region_params - region settings + * @state: allow the driver to lockdown further parameter changes + * @uuid: unique id for persistent regions + * + * State transitions are protected by the cxl_region_rwsem + */ +struct cxl_region_params { + enum cxl_config_state state; + uuid_t uuid; +}; + /** * struct cxl_region - CXL region * @dev: This region's device * 
@id: This region's id. Id is globally unique across all regions * @mode: Endpoint decoder allocation / access mode * @type: Endpoint decoder target type + * @params: active + config params for the region */ struct cxl_region { struct device dev; int id; enum cxl_decoder_mode mode; enum cxl_decoder_type type; + struct cxl_region_params params; }; /** From patchwork Fri Jul 15 00:02:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918601 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92C0D7460 for ; Fri, 15 Jul 2022 00:04:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843473; x=1689379473; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=U+7MkLAHurf3f2Fjf7LARlOGChHtsfQEBW7wJ4w68qo=; b=etOJY1rAPKk8w5SqrzHLiw6aougWj31APBsAGBNv204zA9vEnYlnfPQc xT7MfNpLLfO+0YS5Swuok24UEH/KK0vsrwMXzHqcZbwpgd9zy1DvA6Yoz gEX9RuipXgs9An9f5naksF2Bvop7F/xFy7SIQ8yj1yfLUHe/oIhaxYeNS 0r7vKnVwjqoupfAki0D8Bx5VRJpSiOW9l5AmQwuuyBF/0RieLbtMtg5K+ ogYWsu7M5M3T3dil/RHuby8H3Wu12EGvQ+ZHslm9XG/c3RaYoVa7V13vX MdS/IoAHYejijwF/+iq2Cb8xadVs6FUgl6wZrn+8HAL87j676sYjOimLk A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286401669" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="286401669" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:31 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="663985220" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:30 -0700 Subject: [PATCH v2 19/28] cxl/region: Add interleave geometry attributes From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:30 -0700 Message-ID: <165784335054.1758207.5368288630288141162.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ben Widawsky Add ABI to allow the number of devices that comprise a region to be set as well as the interleave granularity for the region. Signed-off-by: Ben Widawsky [djbw: reword changelog] Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com Signed-off-by: Dan Williams --- Documentation/ABI/testing/sysfs-bus-cxl | 21 +++++ drivers/cxl/core/region.c | 128 +++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 33 ++++++++ 3 files changed, 182 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 0760b8402c23..bfa42bcc8383 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -292,3 +292,24 @@ Description: (RW) Write a unique identifier for the region. 
This field must be set for persistent regions and it must not conflict with the UUID of another region. + + +What: /sys/bus/cxl/devices/regionZ/interleave_granularity +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Set the number of consecutive bytes each device in the + interleave set will claim. The possible interleave granularity + values are determined by the CXL spec and the participating + devices. + + +What: /sys/bus/cxl/devices/regionZ/interleave_ways +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Configure the number of devices participating in the + region by writing this value. Each device will provide + 1/interleave_ways of storage for the region. diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 22dccb4702e5..3289caa5d882 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include "core.h" @@ -21,6 +22,8 @@ * * Region configuration has ordering constraints. UUID may be set at any time * but is only visible for persistent regions. + * 1. Interleave granularity + * 2. Interleave size */ /* @@ -122,8 +125,129 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a, return a->mode; } +static ssize_t interleave_ways_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + rc = sysfs_emit(buf, "%d\n", p->interleave_ways); + up_read(&cxl_region_rwsem); + + return rc; +} + +static ssize_t interleave_ways_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int rc, val; + u8 iw; + + rc = kstrtoint(buf, 0, &val); + if (rc) + return rc; + + rc = ways_to_cxl(val, &iw); + if (rc) + return rc; + + /* + * Even for x3, x9, and x12 interleaves the region interleave must be a + * power of 2 multiple of the host bridge interleave.
+ */ + if (!is_power_of_2(val / cxld->interleave_ways) || + (val % cxld->interleave_ways)) { + dev_dbg(&cxlr->dev, "invalid interleave: %d\n", val); + return -EINVAL; + } + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { + rc = -EBUSY; + goto out; + } + + p->interleave_ways = val; +out: + up_write(&cxl_region_rwsem); + if (rc) + return rc; + return len; +} +static DEVICE_ATTR_RW(interleave_ways); + +static ssize_t interleave_granularity_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + rc = sysfs_emit(buf, "%d\n", p->interleave_granularity); + up_read(&cxl_region_rwsem); + + return rc; +} + +static ssize_t interleave_granularity_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int rc, val; + u16 ig; + + rc = kstrtoint(buf, 0, &val); + if (rc) + return rc; + + rc = granularity_to_cxl(val, &ig); + if (rc) + return rc; + + /* region granularity must be >= root granularity */ + if (val < cxld->interleave_granularity) + return -EINVAL; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { + rc = -EBUSY; + goto out; + } + + p->interleave_granularity = val; +out: + up_write(&cxl_region_rwsem); + if (rc) + return rc; + return len; +} +static DEVICE_ATTR_RW(interleave_granularity); + static struct attribute *cxl_region_attrs[] = { &dev_attr_uuid.attr, + &dev_attr_interleave_ways.attr, + &dev_attr_interleave_granularity.attr, NULL, }; @@ -216,6 +340,8 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, enum cxl_decoder_type type) { struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct cxl_region_params *p; struct cxl_region *cxlr; struct device *dev; int rc; @@ -223,8 +349,10 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, cxlr = cxl_region_alloc(cxlrd, id); if (IS_ERR(cxlr)) return cxlr; + p = &cxlr->params; cxlr->mode = mode; cxlr->type = type; + p->interleave_granularity = cxld->interleave_granularity; dev = &cxlr->dev; rc = dev_set_name(dev, "region%d", id); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index cf2ece363015..a4e65c102bed 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -7,6 +7,7 @@ #include #include #include +#include #include /** @@ -92,6 +93,31 @@ static inline int cxl_to_ways(u8 eniw, unsigned int *val) return 0; } +static inline int granularity_to_cxl(int g, u16 *ig) +{ + if (g > SZ_16K || g < 256 || !is_power_of_2(g)) + return -EINVAL; + *ig = ilog2(g) - 8; + return 0; +} + +static inline int ways_to_cxl(int ways, u8 *iw) +{ + if (ways > 16) + return -EINVAL; + if (is_power_of_2(ways)) { + *iw = ilog2(ways); + return 0; + } + if (ways % 3) + return -EINVAL; + ways /= 3; + if (!is_power_of_2(ways)) + return -EINVAL; + *iw = ilog2(ways) + 8; + return 0; +} + /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */ #define CXLDEV_CAP_ARRAY_OFFSET 0x0 #define CXLDEV_CAP_ARRAY_CAP_ID 0 @@ -298,11 +324,14 @@ struct cxl_root_decoder { /* * 
enum cxl_config_state - State machine for region configuration * @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely + * @CXL_CONFIG_INTERLEAVE_ACTIVE: region size has been set, no more + * changes to interleave_ways or interleave_granularity * @CXL_CONFIG_ACTIVE: All targets have been added the region is now * active */ enum cxl_config_state { CXL_CONFIG_IDLE, + CXL_CONFIG_INTERLEAVE_ACTIVE, CXL_CONFIG_ACTIVE, }; @@ -310,12 +339,16 @@ enum cxl_config_state { * struct cxl_region_params - region settings * @state: allow the driver to lockdown further parameter changes * @uuid: unique id for persistent regions + * @interleave_ways: number of endpoints in the region + * @interleave_granularity: capacity each endpoint contributes to a stripe * * State transitions are protected by the cxl_region_rwsem */ struct cxl_region_params { enum cxl_config_state state; uuid_t uuid; + int interleave_ways; + int interleave_granularity; }; /** From patchwork Fri Jul 15 00:02:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918599 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE15F7460 for ; Fri, 15 Jul 2022 00:04:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843444; x=1689379444; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RcqA2cHy2KOsfTBn7OfvgHq5dSs6tJXyHh0qNEkK6Fg=; b=i3xRtXFehUvfw3+oRUihAkNusRIzFkKRnnIE4riPbW1BwB7bikCH0gAp UF9EE8YmtOXAxQM7S0iKFcAQQtfYUf7mCHWr8yy+pCfi0Yd+eSGTJCu7l 7GSE01LT9DpW6YsfqSuZQC8nveEpPCD2TdEMQx5J1Uxge5DypSbUUPkzE h6GbVmqCLKz5xfyU2Wo5ov/7RFKpPLkHf4smf3c+Gj+iYHBIoVwreg2YL 2r2ONeeXOBK/pUDi+vx2GtgcWehQTBkoxQ0y5W/4RFXK4QtukGrmvjA5M IaMi24Anpq0Xskqo9+K8xXn6ld+U23kkzYj0TOy2v8q/CFg3BXuwNPKq/ Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="265451077" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="265451077" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:37 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="772801623" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:36 -0700 Subject: [PATCH v2 20/28] cxl/region: Allocate HPA capacity to regions From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:36 -0700 Message-ID: <165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 After a region's interleave parameters (ways and granularity) are set, add a way for regions to allocate HPA (host physical address space) from the free capacity in their parent root-decoder. 
The allocator for this capacity reuses the 'struct resource' based allocator used for CONFIG_DEVICE_PRIVATE. Once the tuple of "ways, granularity, and size" is set, the region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE state, which is a precursor to allowing endpoint decoders to be added to a region. Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 29 ++++++ drivers/cxl/Kconfig | 3 + drivers/cxl/core/region.c | 150 +++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 2 4 files changed, 183 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index bfa42bcc8383..0c6c3da4da5a 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -313,3 +313,32 @@ Description: (RW) Configure the number of devices participating in the region by writing this value. Each device will provide 1/interleave_ways of storage for the region. + + +What: /sys/bus/cxl/devices/regionZ/size +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) System physical address space to be consumed by the region. + When written, the driver is triggered to allocate space out of + the parent root decoder's address space. When read, the size of + the address space is reported and should match the span of the + region's resource attribute. Size shall be set after the + interleave configuration parameters. Once set, it cannot be + changed, only freed by writing 0. The kernel makes no guarantees + that data is maintained over an address space freeing event, and + there is no guarantee that a free followed by an allocate + results in the same address being allocated. + + +What: /sys/bus/cxl/devices/regionZ/resource +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) A region is a contiguous partition of a CXL root decoder + address space. Region capacity is allocated by writing to the + size attribute; the resulting physical address space determined + by the driver is reflected here. It is therefore not useful to + read this before writing a value to the size attribute.
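For example, a minimal allocation flow built on these attributes might look as follows (a sketch only: the region name "region0", the sizes, and the ordering assume a region already created via the root decoder's create_pmem_region attribute, and a root decoder whose geometry permits a x2 interleave):

    # set the interleave geometry first, then identify, then size
    echo 2 > /sys/bus/cxl/devices/region0/interleave_ways
    echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
    # persistent regions need a uuid before capacity can be allocated
    uuidgen > /sys/bus/cxl/devices/region0/uuid
    # size must be a multiple of 256MB * interleave_ways
    echo $((512 << 20)) > /sys/bus/cxl/devices/region0/size
    cat /sys/bus/cxl/devices/region0/resource
    # writing 0 frees the allocation again (data is not guaranteed kept)
    echo 0 > /sys/bus/cxl/devices/region0/size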
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index aa2728de419e..74c2cd069d9d 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -105,6 +105,9 @@ config CXL_SUSPEND config CXL_REGION bool default CXL_BUS + # For MAX_PHYSMEM_BITS + depends on SPARSEMEM select MEMREGION + select GET_FREE_REGION endif diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 3289caa5d882..b1e847827c6b 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -244,10 +244,152 @@ static ssize_t interleave_granularity_store(struct device *dev, } static DEVICE_ATTR_RW(interleave_granularity); +static ssize_t resource_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + u64 resource = -1ULL; + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + if (p->res) + resource = p->res->start; + rc = sysfs_emit(buf, "%#llx\n", resource); + up_read(&cxl_region_rwsem); + + return rc; +} +static DEVICE_ATTR_RO(resource); + +static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_region_params *p = &cxlr->params; + struct resource *res; + u32 remainder = 0; + + lockdep_assert_held_write(&cxl_region_rwsem); + + /* Nothing to do... */ + if (p->res && resource_size(p->res) == size) + return 0; + + /* To change size the old size must be freed first */ + if (p->res) + return -EBUSY; + + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) + return -EBUSY; + + /* ways, granularity and uuid (if PMEM) need to be set before HPA */ + if (!p->interleave_ways || !p->interleave_granularity || + (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid))) + return -ENXIO; + + div_u64_rem(size, SZ_256M * p->interleave_ways, &remainder); + if (remainder) + return -EINVAL; + + res = alloc_free_mem_region(cxlrd->res, size, SZ_256M, + dev_name(&cxlr->dev)); + if (IS_ERR(res)) { + dev_dbg(&cxlr->dev, "failed to allocate HPA: %ld\n", + PTR_ERR(res)); + return PTR_ERR(res); + } + + p->res = res; + p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; + + return 0; +} + +static void cxl_region_iomem_release(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + + if (device_is_registered(&cxlr->dev)) + lockdep_assert_held_write(&cxl_region_rwsem); + if (p->res) { + remove_resource(p->res); + kfree(p->res); + p->res = NULL; + } +} + +static int free_hpa(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + + lockdep_assert_held_write(&cxl_region_rwsem); + + if (!p->res) + return 0; + + if (p->state >= CXL_CONFIG_ACTIVE) + return -EBUSY; + + cxl_region_iomem_release(cxlr); + p->state = CXL_CONFIG_IDLE; + return 0; +} + +static ssize_t size_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + u64 val; + int rc; + + rc = kstrtou64(buf, 0, &val); + if (rc) + return rc; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + + if (val) + rc = alloc_hpa(cxlr, val); + else + rc = free_hpa(cxlr); + up_write(&cxl_region_rwsem); + + if (rc) + return rc; + + return len; +} + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + u64 size = 0; + ssize_t rc; + + rc =
down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + if (p->res) + size = resource_size(p->res); + rc = sysfs_emit(buf, "%#llx\n", size); + up_read(&cxl_region_rwsem); + + return rc; +} +static DEVICE_ATTR_RW(size); + static struct attribute *cxl_region_attrs[] = { &dev_attr_uuid.attr, &dev_attr_interleave_ways.attr, &dev_attr_interleave_granularity.attr, + &dev_attr_resource.attr, + &dev_attr_size.attr, NULL, }; @@ -293,7 +435,11 @@ static struct cxl_region *to_cxl_region(struct device *dev) static void unregister_region(void *dev) { - device_unregister(dev); + struct cxl_region *cxlr = to_cxl_region(dev); + + device_del(dev); + cxl_region_iomem_release(cxlr); + put_device(dev); } static struct lock_class_key cxl_region_key; @@ -445,3 +591,5 @@ static ssize_t delete_region_store(struct device *dev, return len; } DEVICE_ATTR_WO(delete_region); + +MODULE_IMPORT_NS(CXL); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index a4e65c102bed..837bfa67f469 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -341,6 +341,7 @@ enum cxl_config_state { * @uuid: unique id for persistent regions * @interleave_ways: number of endpoints in the region * @interleave_granularity: capacity each endpoint contributes to a stripe + * @res: allocated iomem capacity for this region * * State transitions are protected by the cxl_region_rwsem */ @@ -349,6 +350,7 @@ struct cxl_region_params { uuid_t uuid; int interleave_ways; int interleave_granularity; + struct resource *res; }; /** From patchwork Fri Jul 15 00:02:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918600 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 481E07460 for ; Fri, 15 Jul 2022 00:04:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843451; x=1689379451; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=86ZN3Qodg4lER5Hi4rU+jkBv0qRvwnJVICGuPoHFcns=; b=GS5B0ezn3KDkr0jV8UUIl7RcYdEXa62BIQyjMk+1Fb0yOymHwoO5s52/ h2CUrdk5koBJlaRhIzi8OjqdsV1mdhUk+ynYiu2RLooyoc4A4kirCK8ik JZvXQkgPVYMzIFqaZ9pIwP7ent3DY9vXQSfnAg/imranmCPYNSu+rq+u/ 5KfuHW9EGO1uvnOJ2+dXWdSdz4iGt+M80KTRIaZ3QXgN/7by7xCxB0t5+ fsXHmOMeaS2/tvzAp278/HG00GQCyfvg3UNQDlQZFXqC7E0YM+/s+bElQ Pj3uAgt9m/1iEUELzkD8CeOtDTTpp4JJ7ZvWi6yV6pxQ9wDWvVMJLLXyv w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="265451122" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="265451122" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:42 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="623634781" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:42 -0700 Subject: [PATCH v2 21/28] cxl/region: Enable the assignment of endpoint decoders to regions From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:41 -0700 Message-ID: <165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: 
<165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The region provisioning process involves allocating DPA to a set of endpoint decoders, and HPA plus the region geometry to a region device. Then the decoder is assigned to the region. At this point several validation steps can be performed to verify that the decoder is suitable to participate in the region. Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 19 ++ drivers/cxl/core/core.h | 6 + drivers/cxl/core/hdm.c | 13 + drivers/cxl/core/port.c | 9 + drivers/cxl/core/region.c | 282 +++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 11 + 6 files changed, 338 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 0c6c3da4da5a..94e19e24de8d 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -342,3 +342,22 @@ Description: size attribute; the resulting physical address space determined by the driver is reflected here. It is therefore not useful to read this before writing a value to the size attribute. + + +What: /sys/bus/cxl/devices/regionZ/target[0..N] +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Write an endpoint decoder object name to 'targetX' where X + is the intended position of the endpoint device in the region + interleave and N is the 'interleave_ways' setting for the + region. ENXIO is returned if the write results in an + impossible-to-map decode scenario, like the endpoint is + unreachable at that position relative to the root decoder + interleave. EBUSY is returned if the position in the region is + already occupied, or if the region is not in a state to accept + interleave configuration changes. EINVAL is returned if the + object name is not an endpoint decoder. Once all positions have + been successfully written, a final validation for decode + conflicts is performed before activating the region.
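As an illustrative sketch of this flow (the region and decoder names are invented, and each endpoint decoder is assumed to already have a DPA allocation of region-size / interleave_ways):

    # populate a x2 region, one endpoint decoder per position
    echo decoder3.0 > /sys/bus/cxl/devices/region0/target0
    echo decoder4.0 > /sys/bus/cxl/devices/region0/target1
    cat /sys/bus/cxl/devices/region0/target0
    # writing an empty string detaches the decoder at that position
    echo "" > /sys/bus/cxl/devices/region0/target1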
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 29272df7e212..a60ad9f656fd 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -12,9 +12,14 @@ extern struct attribute_group cxl_base_attribute_group; #ifdef CONFIG_CXL_REGION extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_delete_region; +extern struct device_attribute dev_attr_region; +void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled); #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr) #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr), #else +static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) +{ +} #define CXL_REGION_ATTR(x) NULL #define SET_CXL_REGION_ATTR(x) #endif @@ -34,6 +39,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); +extern struct rw_semaphore cxl_dpa_rwsem; int cxl_memdev_init(void); void cxl_memdev_exit(void); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 4a0325b02ca4..81645de1064f 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -17,7 +17,7 @@ * for enumerating these registers and capabilities. */ -static DECLARE_RWSEM(cxl_dpa_rwsem); +DECLARE_RWSEM(cxl_dpa_rwsem); static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, int *target_map) @@ -319,6 +319,11 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) rc = 0; goto out; } + if (cxled->cxld.region) { + dev_dbg(dev, "decoder assigned to: %s\n", + dev_name(&cxled->cxld.region->dev)); + goto out; + } if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); rc = -EBUSY; @@ -395,6 +400,12 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) int rc; down_write(&cxl_dpa_rwsem); + if (cxled->cxld.region) { + dev_dbg(dev, "decoder attached to %s\n", + dev_name(&cxled->cxld.region->dev)); + goto out; + } + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); rc = -EBUSY; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 89672b126b30..bd0673821d28 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -288,6 +288,7 @@ static struct attribute *cxl_decoder_base_attrs[] = { &dev_attr_locked.attr, &dev_attr_interleave_granularity.attr, &dev_attr_interleave_ways.attr, + SET_CXL_REGION_ATTR(region) NULL, }; @@ -1583,6 +1584,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port) if (!cxled) return ERR_PTR(-ENOMEM); + cxled->pos = -1; cxld = &cxled->cxld; rc = cxl_decoder_init(port, cxld); if (rc) { @@ -1687,6 +1689,13 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL); static void cxld_unregister(void *dev) { + struct cxl_endpoint_decoder *cxled; + + if (is_endpoint_decoder(dev)) { + cxled = to_cxl_endpoint_decoder(dev); + cxl_decoder_kill_region(cxled); + } + device_unregister(dev); } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index b1e847827c6b..871bfdbb9bc8 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -24,6 +24,7 @@ * but is only visible for persistent regions. * 1. Interleave granularity * 2. Interleave size + * 3. 
Decoder targets */ /* @@ -141,6 +142,8 @@ static ssize_t interleave_ways_show(struct device *dev, return rc; } +static const struct attribute_group *get_cxl_region_target_group(void); + static ssize_t interleave_ways_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) @@ -149,7 +152,7 @@ static ssize_t interleave_ways_store(struct device *dev, struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - int rc, val; + int rc, val, save; u8 iw; rc = kstrtoint(buf, 0, &val); @@ -178,7 +181,11 @@ static ssize_t interleave_ways_store(struct device *dev, goto out; } + save = p->interleave_ways; p->interleave_ways = val; + rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); + if (rc) + p->interleave_ways = save; out: up_write(&cxl_region_rwsem); if (rc) @@ -398,9 +405,262 @@ static const struct attribute_group cxl_region_group = { .is_visible = cxl_region_visible, }; +static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) +{ + struct cxl_region_params *p = &cxlr->params; + struct cxl_endpoint_decoder *cxled; + int rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + + if (pos >= p->interleave_ways) { + dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, + p->interleave_ways); + rc = -ENXIO; + goto out; + } + + cxled = p->targets[pos]; + if (!cxled) + rc = sysfs_emit(buf, "\n"); + else + rc = sysfs_emit(buf, "%s\n", dev_name(&cxled->cxld.dev)); +out: + up_read(&cxl_region_rwsem); + + return rc; +} + +/* + * - Check that the given endpoint is attached to a host-bridge identified + * in the root interleave. + */ +static int cxl_region_attach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos) +{ + struct cxl_region_params *p = &cxlr->params; + + if (cxled->mode == CXL_DECODER_DEAD) { + dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); + return -ENODEV; + } + + if (pos >= p->interleave_ways) { + dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, + p->interleave_ways); + return -ENXIO; + } + + if (p->targets[pos] == cxled) + return 0; + + if (p->targets[pos]) { + struct cxl_endpoint_decoder *cxled_target = p->targets[pos]; + struct cxl_memdev *cxlmd_target = cxled_to_memdev(cxled_target); + + dev_dbg(&cxlr->dev, "position %d already assigned to %s:%s\n", + pos, dev_name(&cxlmd_target->dev), + dev_name(&cxled_target->cxld.dev)); + return -EBUSY; + } + + p->targets[pos] = cxled; + cxled->pos = pos; + p->nr_targets++; + + return 0; +} + +static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) +{ + struct cxl_region *cxlr = cxled->cxld.region; + struct cxl_region_params *p; + + lockdep_assert_held_write(&cxl_region_rwsem); + + if (!cxlr) + return; + + p = &cxlr->params; + get_device(&cxlr->dev); + + if (cxled->pos < 0 || cxled->pos >= p->interleave_ways || + p->targets[cxled->pos] != cxled) { + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + + dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + cxled->pos); + goto out; + } + + p->targets[cxled->pos] = NULL; + p->nr_targets--; + + /* notify the region driver that one of its targets has deparated */ + up_write(&cxl_region_rwsem); + device_release_driver(&cxlr->dev); + down_write(&cxl_region_rwsem); +out: + put_device(&cxlr->dev); +} + +void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) +{ + down_write(&cxl_region_rwsem); + cxled->mode = 
CXL_DECODER_DEAD; + cxl_region_detach(cxled); + up_write(&cxl_region_rwsem); +} + +static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos) +{ + struct device *dev; + int rc; + + dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder); + if (!dev) + return -ENODEV; + + if (!is_endpoint_decoder(dev)) { + put_device(dev); + return -EINVAL; + } + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + goto out; + down_read(&cxl_dpa_rwsem); + rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos); + up_read(&cxl_dpa_rwsem); + up_write(&cxl_region_rwsem); +out: + put_device(dev); + return rc; +} + +static int detach_target(struct cxl_region *cxlr, int pos) +{ + struct cxl_region_params *p = &cxlr->params; + int rc; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + + if (pos >= p->interleave_ways) { + dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, + p->interleave_ways); + rc = -ENXIO; + goto out; + } + + if (!p->targets[pos]) { + rc = 0; + goto out; + } + + cxl_region_detach(p->targets[pos]); + rc = 0; +out: + up_write(&cxl_region_rwsem); + return rc; +} + +static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos, + size_t len) +{ + int rc; + + if (sysfs_streq(buf, "\n")) + rc = detach_target(cxlr, pos); + else + rc = attach_target(cxlr, buf, pos); + + if (rc < 0) + return rc; + return len; +} + +#define TARGET_ATTR_RW(n) \ +static ssize_t target##n##_show( \ + struct device *dev, struct device_attribute *attr, char *buf) \ +{ \ + return show_targetN(to_cxl_region(dev), buf, (n)); \ +} \ +static ssize_t target##n##_store(struct device *dev, \ + struct device_attribute *attr, \ + const char *buf, size_t len) \ +{ \ + return store_targetN(to_cxl_region(dev), buf, (n), len); \ +} \ +static DEVICE_ATTR_RW(target##n) + +TARGET_ATTR_RW(0); +TARGET_ATTR_RW(1); +TARGET_ATTR_RW(2); +TARGET_ATTR_RW(3); +TARGET_ATTR_RW(4); +TARGET_ATTR_RW(5); +TARGET_ATTR_RW(6); +TARGET_ATTR_RW(7); +TARGET_ATTR_RW(8); +TARGET_ATTR_RW(9); +TARGET_ATTR_RW(10); +TARGET_ATTR_RW(11); +TARGET_ATTR_RW(12); +TARGET_ATTR_RW(13); +TARGET_ATTR_RW(14); +TARGET_ATTR_RW(15); + +static struct attribute *target_attrs[] = { + &dev_attr_target0.attr, + &dev_attr_target1.attr, + &dev_attr_target2.attr, + &dev_attr_target3.attr, + &dev_attr_target4.attr, + &dev_attr_target5.attr, + &dev_attr_target6.attr, + &dev_attr_target7.attr, + &dev_attr_target8.attr, + &dev_attr_target9.attr, + &dev_attr_target10.attr, + &dev_attr_target11.attr, + &dev_attr_target12.attr, + &dev_attr_target13.attr, + &dev_attr_target14.attr, + &dev_attr_target15.attr, + NULL, +}; + +static umode_t cxl_region_target_visible(struct kobject *kobj, + struct attribute *a, int n) +{ + struct device *dev = kobj_to_dev(kobj); + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + + if (n < p->interleave_ways) + return a->mode; + return 0; +} + +static const struct attribute_group cxl_region_target_group = { + .attrs = target_attrs, + .is_visible = cxl_region_target_visible, +}; + +static const struct attribute_group *get_cxl_region_target_group(void) +{ + return &cxl_region_target_group; +} + static const struct attribute_group *region_groups[] = { &cxl_base_attribute_group, &cxl_region_group, + &cxl_region_target_group, NULL, }; @@ -560,6 +820,26 @@ static ssize_t create_pmem_region_store(struct device *dev, } DEVICE_ATTR_RW(create_pmem_region); +static ssize_t region_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + 
struct cxl_decoder *cxld = to_cxl_decoder(dev); + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + + if (cxld->region) + rc = sysfs_emit(buf, "%s\n", dev_name(&cxld->region->dev)); + else + rc = sysfs_emit(buf, "\n"); + up_read(&cxl_region_rwsem); + + return rc; +} +DEVICE_ATTR_RO(region); + static struct cxl_region * cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name) { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 837bfa67f469..95d74cf425a4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -255,6 +255,7 @@ enum cxl_decoder_type { * @interleave_ways: number of cxl_dports in this decode * @interleave_granularity: data stride per dport * @target_type: accelerator vs expander (type2 vs type3) selector + * @region: currently assigned region for this decoder * @flags: memory type capabilities and locking */ struct cxl_decoder { @@ -264,14 +265,20 @@ struct cxl_decoder { int interleave_ways; int interleave_granularity; enum cxl_decoder_type target_type; + struct cxl_region *region; unsigned long flags; }; +/* + * CXL_DECODER_DEAD prevents endpoints from being reattached to regions + * while cxld_unregister() is running + */ enum cxl_decoder_mode { CXL_DECODER_NONE, CXL_DECODER_RAM, CXL_DECODER_PMEM, CXL_DECODER_MIXED, + CXL_DECODER_DEAD, }; /** @@ -280,12 +287,14 @@ enum cxl_decoder_mode { * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps * @mode: which memory type / access-mode-partition this decoder targets + * @pos: interleave position in @cxld.region */ struct cxl_endpoint_decoder { struct cxl_decoder cxld; struct resource *dpa_res; resource_size_t skip; enum cxl_decoder_mode mode; + int pos; }; /** @@ -351,6 +360,8 @@ struct cxl_region_params { int interleave_ways; int interleave_granularity; struct resource *res; + struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE]; + int nr_targets; }; /** From patchwork Fri Jul 15 00:02:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918603 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E83997460 for ; Fri, 15 Jul 2022 00:04:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843498; x=1689379498; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=h94bX3pkhvKPJBqJBcBqsOYdL1YyxVd+P+WmQ5R1eyk=; b=OuLEAWuFV7rs5Z1xb/IyK0Ybd6cK961XnRNSpYVlPA+JT1wWRCN5vdjm Gv/3YA8MDkd75Bj8ZIIoL5oeTIZebETVVsKND5do6TW6GhSqiXMsA5dJx q463Xng75YIILvYMeF9mymOjEPLxqbwJZUoO3ZoZR4G4FE2K878HhpKSi xYpA8W8TRLF+rgjHHSpfXq5FKGKb0oHeemymOPEUmpUw5JfIxSwoLeVdy xz7TOMkLz6CpbG2OplZe9K7Q2O5Vhrgq3jh92JtJ5uh2ICiUH/ZfiumMv fgL9rhQFONfKrVzbULZwA06TQR3ESD6JCIeRAd2mI/Q1JtHCekl38BSyH w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286401819" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="286401819" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:47 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="772801733" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by 
orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:47 -0700 Subject: [PATCH v2 22/28] cxl/acpi: Add a host-bridge index lookup mechanism From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:47 -0700 Message-ID: <165784336732.1758207.3045854545395563239.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple methods to determine which host bridge provides access to a given endpoint relative to that device's position in the interleave. The "Interleave Arithmetic" defines either a "standard modulo" / round-robin algorithm, or "xormap" based algorithm which can be defined as a non-linear transform. Given that there are already more options beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL specific, provide a callback for the region provisioning code to map endpoint positions back to expected host bridge id (cxl_dport target). For now just support the simple modulo math case and save the xormap for a follow-on change. Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-14-dan.j.williams@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/port.c | 16 ++++++++++++++++ drivers/cxl/cxl.h | 2 ++ 2 files changed, 18 insertions(+) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bd0673821d28..2f0b47db53da 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1421,6 +1421,20 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, return rc; } +static struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos) +{ + struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd; + struct cxl_decoder *cxld = &cxlsd->cxld; + int iw; + + iw = cxld->interleave_ways; + if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets, + "misconfigured root decoder\n")) + return NULL; + + return cxlrd->cxlsd.target[pos % iw]; +} + static struct lock_class_key cxl_decoder_key; /** @@ -1510,6 +1524,8 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, return ERR_PTR(rc); } + cxlrd->calc_hb = cxl_hb_modulo; + cxld = &cxlsd->cxld; cxld->dev.type = &cxl_decoder_root_type; /* diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 95d74cf425a4..cd81e642e900 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -322,11 +322,13 @@ struct cxl_switch_decoder { * struct cxl_root_decoder - Static platform CXL address decoder * @res: host / parent resource for region allocations * @region_id: region id for next region provisioning event + * @calc_hb: which host bridge covers the n'th position by granularity * @cxlsd: base cxl switch decoder */ struct cxl_root_decoder { struct resource *res; atomic_t region_id; + struct cxl_dport *(*calc_hb)(struct cxl_root_decoder *cxlrd, int pos); struct cxl_switch_decoder cxlsd; }; From patchwork Fri Jul 15 00:02:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918605 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) (using TLSv1.2
with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C80D17460 for ; Fri, 15 Jul 2022 00:05:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843505; x=1689379505; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AGGVqkzYjbpmuio4U2LN0wVdQD5CLNvX7/u550PTzko=; b=K/S3a8VA3nstSzBJJ3Z1EI8Syl8tA4mkIOkegmnavYh7mKo1H7fPig5l rVyAKz1nhxWk+IMzAxK9/1R6oaGhWEhbKbTzhm1+Hghfm+EbYt6JU/RSv dzycY3H7XcntPbxXvvFTXq33u/uYxmkjRU8pHIlx6auxZLrqYI7ZUS5+u ztcRjnS5xq7Or1CJhUL9t0RMZz4Lbzh8FA4UU5OF0N2CmfBXU1gWRx7b/ DWlSTd46VNxiqlLZ6r8DZMgaSYASTVeM5a7q7MZTwZRxdFvGA85IyfQ7G oG4qDfvVDX/hyDcayOsvO5sLa8/Ckh/sntw4vcn8bPvZD92tDzEUh0pdE w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286401856" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="286401856" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:53 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="571302969" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:52 -0700 Subject: [PATCH v2 23/28] cxl/region: Attach endpoint decoders From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:52 -0700 Message-ID: <165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool, they can be added to a region provided the region interleave properties are set (ways, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many of the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch.
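As an illustrative worked example (an invented topology, not one taken from this patch): consider a x4 region whose root decoder targets two host bridges, where host bridge A reaches mem0 (position 0) and mem2 (position 2) through a single switch dport, and host bridge B has mem1 (position 1) and mem3 (position 3) directly attached. After all four attach events, host bridge A's region reference ends up with nr_eps = 2 but nr_targets = 1, since both endpoints share the same next hop, while host bridge B's reference has nr_eps = 2 and nr_targets = 2, one per directly attached endpoint. Those counts are what the follow-on target-list programming consumes.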
Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/core.h | 7 + drivers/cxl/core/port.c | 10 - drivers/cxl/core/region.c | 364 ++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/cxl.h | 20 ++ drivers/cxl/cxlmem.h | 5 + 5 files changed, 394 insertions(+), 12 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index a60ad9f656fd..6edd8174c2b5 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -41,6 +41,13 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); extern struct rw_semaphore cxl_dpa_rwsem; +bool is_switch_decoder(struct device *dev); +static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, + struct cxl_memdev *cxlmd) +{ + return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev); +} + int cxl_memdev_init(void); void cxl_memdev_exit(void); void cxl_mbox_init(void); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 2f0b47db53da..d234afc47e89 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -448,7 +448,7 @@ bool is_root_decoder(struct device *dev) } EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL); -static bool is_switch_decoder(struct device *dev) +bool is_switch_decoder(struct device *dev) { return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type; } @@ -504,6 +504,7 @@ static void cxl_port_release(struct device *dev) cxl_ep_remove(port, ep); xa_destroy(&port->endpoints); xa_destroy(&port->dports); + xa_destroy(&port->regions); ida_free(&cxl_port_ida, port->id); kfree(port); } @@ -635,6 +636,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, port->hdm_end = -1; xa_init(&port->dports); xa_init(&port->endpoints); + xa_init(&port->regions); device_initialize(dev); lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth); @@ -1109,12 +1111,6 @@ static void reap_dports(struct cxl_port *port) } } -static struct cxl_ep *cxl_ep_load(struct cxl_port *port, - struct cxl_memdev *cxlmd) -{ - return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev); -} - int devm_cxl_add_endpoint(struct cxl_memdev *cxlmd, struct cxl_dport *parent_dport) { diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 871bfdbb9bc8..8ac0c557f6aa 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -433,21 +433,294 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) return rc; } -/* - * - Check that the given endpoint is attached to a host-bridge identified - * in the root interleave. +static int match_free_decoder(struct device *dev, void *data) +{ + struct cxl_decoder *cxld; + int *id = data; + + if (!is_switch_decoder(dev)) + return 0; + + cxld = to_cxl_decoder(dev); + + /* enforce ordered allocation */ + if (cxld->id != *id) + return 0; + + if (!cxld->region) + return 1; + + (*id)++; + + return 0; +} + +static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port, + struct cxl_region *cxlr) +{ + struct device *dev; + int id = 0; + + dev = device_find_child(&port->dev, &id, match_free_decoder); + if (!dev) + return NULL; + /* + * This decoder is pinned registered as long as the endpoint decoder is + * registered, and endpoint decoder unregistration holds the + * cxl_region_rwsem over unregister events, so no need to hold on to + * this extra reference. 
+ */ + put_device(dev); + return to_cxl_decoder(dev); +} + +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, + struct cxl_region *cxlr) +{ + struct cxl_region_ref *cxl_rr; + int rc; + + cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL); + if (!cxl_rr) + return NULL; + cxl_rr->port = port; + cxl_rr->region = cxlr; + xa_init(&cxl_rr->endpoints); + + rc = xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr, GFP_KERNEL); + if (rc) { + dev_dbg(&cxlr->dev, + "%s: failed to track region reference: %d\n", + dev_name(&port->dev), rc); + kfree(cxl_rr); + return NULL; + } + + return cxl_rr; +} + +static void free_region_ref(struct cxl_region_ref *cxl_rr) +{ + struct cxl_port *port = cxl_rr->port; + struct cxl_region *cxlr = cxl_rr->region; + struct cxl_decoder *cxld = cxl_rr->decoder; + + dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n"); + if (cxld->region == cxlr) { + cxld->region = NULL; + put_device(&cxlr->dev); + } + + xa_erase(&port->regions, (unsigned long)cxlr); + xa_destroy(&cxl_rr->endpoints); + kfree(cxl_rr); +} + +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr, + struct cxl_endpoint_decoder *cxled) +{ + int rc; + struct cxl_port *port = cxl_rr->port; + struct cxl_region *cxlr = cxl_rr->region; + struct cxl_decoder *cxld = cxl_rr->decoder; + struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled)); + + rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep, + GFP_KERNEL); + if (rc) + return rc; + cxl_rr->nr_eps++; + + if (!cxld->region) { + cxld->region = cxlr; + get_device(&cxlr->dev); + } + + return 0; +} + +/** + * cxl_port_attach_region() - track a region's interest in a port by endpoint + * @port: port to add a new region reference 'struct cxl_region_ref' + * @cxlr: region to attach to @port + * @cxled: endpoint decoder used to create or further pin a region reference + * @pos: interleave position of @cxled in @cxlr + * + * The attach event is an opportunity to validate CXL decode setup + * constraints and record metadata needed for programming HDM decoders, + * in particular decoder target lists. + * + * The steps are: + * - validate that there are no other regions with a higher HPA already + * associated with @port + * - establish a region reference if one is not already present + * - additionally allocate a decoder instance that will host @cxlr on + * @port + * - pin the region reference by the endpoint + * - account for how many entries in @port's target list are needed to + * cover all of the added endpoints. 
*/ +static int cxl_port_attach_region(struct cxl_port *port, + struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_ep *ep = cxl_ep_load(port, cxlmd); + struct cxl_region_ref *cxl_rr = NULL, *iter; + struct cxl_region_params *p = &cxlr->params; + struct cxl_decoder *cxld = NULL; + unsigned long index; + int rc = -EBUSY; + + lockdep_assert_held_write(&cxl_region_rwsem); + + xa_for_each(&port->regions, index, iter) { + struct cxl_region_params *ip = &iter->region->params; + + if (iter->region == cxlr) + cxl_rr = iter; + if (ip->res->start > p->res->start) { + dev_dbg(&cxlr->dev, + "%s: HPA order violation %s:%pr vs %pr\n", + dev_name(&port->dev), + dev_name(&iter->region->dev), ip->res, p->res); + return -EBUSY; + } + } + + if (cxl_rr) { + struct cxl_ep *ep_iter; + int found = 0; + + cxld = cxl_rr->decoder; + xa_for_each(&cxl_rr->endpoints, index, ep_iter) { + if (ep_iter == ep) + continue; + if (ep_iter->next == ep->next) { + found++; + break; + } + } + + /* + * If this is a new target or if this port is direct connected + * to this endpoint then add to the target count. + */ + if (!found || !ep->next) + cxl_rr->nr_targets++; + } else { + cxl_rr = alloc_region_ref(port, cxlr); + if (!cxl_rr) { + dev_dbg(&cxlr->dev, + "%s: failed to allocate region reference\n", + dev_name(&port->dev)); + return -ENOMEM; + } + } + + if (!cxld) { + if (port == cxled_to_port(cxled)) + cxld = &cxled->cxld; + else + cxld = cxl_region_find_decoder(port, cxlr); + if (!cxld) { + dev_dbg(&cxlr->dev, "%s: no decoder available\n", + dev_name(&port->dev)); + goto out_erase; + } + + if (cxld->region) { + dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n", + dev_name(&port->dev), dev_name(&cxld->dev), + dev_name(&cxld->region->dev)); + rc = -EBUSY; + goto out_erase; + } + + cxl_rr->decoder = cxld; + } + + rc = cxl_rr_ep_add(cxl_rr, cxled); + if (rc) { + dev_dbg(&cxlr->dev, + "%s: failed to track endpoint %s:%s reference\n", + dev_name(&port->dev), dev_name(&cxlmd->dev), + dev_name(&cxld->dev)); + goto out_erase; + } + + return 0; +out_erase: + if (cxl_rr->nr_eps == 0) + free_region_ref(cxl_rr); + return rc; +} + +static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port, + struct cxl_region *cxlr) +{ + return xa_load(&port->regions, (unsigned long)cxlr); +} + +static void cxl_port_detach_region(struct cxl_port *port, + struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled) +{ + struct cxl_region_ref *cxl_rr; + struct cxl_ep *ep; + + lockdep_assert_held_write(&cxl_region_rwsem); + + cxl_rr = cxl_rr_load(port, cxlr); + if (!cxl_rr) + return; + + ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled); + if (ep) { + struct cxl_ep *ep_iter; + unsigned long index; + int found = 0; + + cxl_rr->nr_eps--; + xa_for_each(&cxl_rr->endpoints, index, ep_iter) { + if (ep_iter->next == ep->next) { + found++; + break; + } + } + if (!found) + cxl_rr->nr_targets--; + } + + if (cxl_rr->nr_eps == 0) + free_region_ref(cxl_rr); +} + static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *ep_port, *root_port, *iter; struct cxl_region_params *p = &cxlr->params; + struct cxl_dport *dport; + int i, rc = -ENXIO; if (cxled->mode == CXL_DECODER_DEAD) { dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); return -ENODEV; } - if (pos >= 
p->interleave_ways) { + /* all full of members, or interleave config not established? */ + if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) { + dev_dbg(&cxlr->dev, "region already active\n"); + return -EBUSY; + } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) { + dev_dbg(&cxlr->dev, "interleave config missing\n"); + return -ENXIO; + } + + if (pos < 0 || pos >= p->interleave_ways) { dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, p->interleave_ways); return -ENXIO; @@ -466,15 +739,90 @@ static int cxl_region_attach(struct cxl_region *cxlr, return -EBUSY; } + for (i = 0; i < p->interleave_ways; i++) { + struct cxl_endpoint_decoder *cxled_target; + struct cxl_memdev *cxlmd_target; + + cxled_target = p->targets[pos]; + if (!cxled_target) + continue; + + cxlmd_target = cxled_to_memdev(cxled_target); + if (cxlmd_target == cxlmd) { + dev_dbg(&cxlr->dev, + "%s already specified at position %d via: %s\n", + dev_name(&cxlmd->dev), pos, + dev_name(&cxled_target->cxld.dev)); + return -EBUSY; + } + } + + ep_port = cxled_to_port(cxled); + root_port = cxlrd_to_port(cxlrd); + dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge); + if (!dport) { + dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + dev_name(cxlr->dev.parent)); + return -ENXIO; + } + + if (cxlrd->calc_hb(cxlrd, pos) != dport) { + dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + dev_name(&cxlrd->cxlsd.cxld.dev)); + return -ENXIO; + } + + if (cxled->cxld.target_type != cxlr->type) { + dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + cxled->cxld.target_type, cxlr->type); + return -ENXIO; + } + + if (!cxled->dpa_res) { + dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev)); + return -ENXIO; + } + + if (resource_size(cxled->dpa_res) * p->interleave_ways != + resource_size(p->res)) { + dev_dbg(&cxlr->dev, + "%s:%s: decoder-size-%#llx * ways-%d != region-size-%#llx\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + (u64)resource_size(cxled->dpa_res), p->interleave_ways, + (u64)resource_size(p->res)); + return -EINVAL; + } + + for (iter = ep_port; !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) { + rc = cxl_port_attach_region(iter, cxlr, cxled, pos); + if (rc) + goto err; + } + p->targets[pos] = cxled; cxled->pos = pos; p->nr_targets++; + if (p->nr_targets == p->interleave_ways) + p->state = CXL_CONFIG_ACTIVE; + return 0; + +err: + for (iter = ep_port; !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) + cxl_port_detach_region(iter, cxlr, cxled); + return rc; } static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) { + struct cxl_port *iter, *ep_port = cxled_to_port(cxled); struct cxl_region *cxlr = cxled->cxld.region; struct cxl_region_params *p; @@ -486,6 +834,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) p = &cxlr->params; get_device(&cxlr->dev); + for (iter = ep_port; !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) + cxl_port_detach_region(iter, cxlr, cxled); + if (cxled->pos < 0 || cxled->pos >= p->interleave_ways || p->targets[cxled->pos] != cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); @@ -496,10 +848,12 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) goto out; } + if (p->state == CXL_CONFIG_ACTIVE) + p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; p->targets[cxled->pos] = NULL; 
p->nr_targets--; - /* notify the region driver that one of its targets has deparated */ + /* notify the region driver that one of its targets has departed */ up_write(&cxl_region_rwsem); device_release_driver(&cxlr->dev); down_write(&cxl_region_rwsem); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index cd81e642e900..637768609a75 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -421,6 +421,7 @@ struct cxl_nvdimm { * @id: id for port device-name * @dports: cxl_dport instances referenced by decoders * @endpoints: cxl_ep instances, endpoints that are a descendant of this port + * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids * @hdm_end: track last allocated HDM decoder instance for allocation ordering @@ -435,6 +436,7 @@ struct cxl_port { int id; struct xarray dports; struct xarray endpoints; + struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; int hdm_end; @@ -476,6 +478,24 @@ struct cxl_ep { struct cxl_port *next; }; +/** + * struct cxl_region_ref - track a region's interest in a port + * @port: point in topology to install this reference + * @decoder: decoder assigned for @region in @port + * @region: region for this reference + * @endpoints: cxl_ep references for region members beneath @port + * @nr_eps: number of endpoints beneath @port + * @nr_targets: number of distinct targets needed to reach @nr_eps + */ +struct cxl_region_ref { + struct cxl_port *port; + struct cxl_decoder *decoder; + struct cxl_region *region; + struct xarray endpoints; + int nr_eps; + int nr_targets; +}; + /* * The platform firmware device hosting the root is also the top of the * CXL port topology. All other CXL ports have another CXL port as their diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index eee96016c3c7..a83bb6782d23 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -55,6 +55,11 @@ static inline struct cxl_port *cxled_to_port(struct cxl_endpoint_decoder *cxled) return to_cxl_port(cxled->cxld.dev.parent); } +static inline struct cxl_port *cxlrd_to_port(struct cxl_root_decoder *cxlrd) +{ + return to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); +} + static inline struct cxl_memdev * cxled_to_memdev(struct cxl_endpoint_decoder *cxled) { From patchwork Fri Jul 15 00:02:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918604 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FE357460 for ; Fri, 15 Jul 2022 00:05:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843501; x=1689379501; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9b5PpGGK2UxDyTUnFeAvPu4Z8fwgVuv06qsL9vaoOeg=; b=M7R+c8mQmjW/WTFdpN1/zh7CpbZ/4nxrMyERz8MsWUyjgYtrE5e2oo6T ZyN/97kgmUeGSc9Vq5NNP4nzNL8Ks2LAAGh4EWtDqVhWUVN+zndipopVW OlO/99KeYhngyjj2sMdxZHzJPhPx0+9OtoJI+LHK4SlHkS+8oywtYcWr4 RQDS6YR3WjmazlHCualZhssKaTYBMpS7GvLcWgzSXFE17FAA2KmciXmtR bk+IxM7DM1NDX68+4AOe15zUi6yq09D9XAzW6s+CXaZOewVDJVlAeIMe4 Sde348ZDmA1yezopo7AK5U+7ox/W992WR1bJqAOOlk579VM3+0art4Zdg A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="265451299" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; 
d="scan'208";a="265451299" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:59 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="600310911" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:02:58 -0700 Subject: [PATCH v2 24/28] cxl/region: Program target lists From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:02:58 -0700 Message-ID: <165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Once the region's interleave geometry (ways, granularity, size) is established and all the endpoint decoder targets are assigned, the next phase is to program all the intermediate decoders. Specifically, each CXL switch in the path between the endpoint and its CXL host-bridge (including the logical switch internal to the host-bridge) needs to have its decoders programmed and the target list order assigned. The difficulty in this implementation lies in determining which endpoint decoder ordering combinations are valid. Consider the cxl_test case of 2 host bridges, each of those host-bridges attached to 2 switches, and each of those switches attached to 2 endpoints for a potential 8-way interleave. The x2 interleave at the host-bridge level requires that all even numbered endpoint decoder positions be located on the "left" hand side of the topology tree, and the odd numbered positions on the other. The endpoints that are peers on the same switch need to have a position that can be routed with a dedicated address bit per-endpoint. See check_last_peer() for the details. 
Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/core/core.h | 4 + drivers/cxl/core/port.c | 4 - drivers/cxl/core/region.c | 262 +++++++++++++++++++++++++++++++++++++++++++-- drivers/cxl/cxl.h | 2 4 files changed, 260 insertions(+), 12 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 6edd8174c2b5..fcf14b8a3c87 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -42,9 +42,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); extern struct rw_semaphore cxl_dpa_rwsem; bool is_switch_decoder(struct device *dev); +struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, struct cxl_memdev *cxlmd) { + if (!port) + return NULL; + return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev); } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index d234afc47e89..215ce5e16986 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -146,8 +146,6 @@ static ssize_t emit_target_list(struct cxl_switch_decoder *cxlsd, char *buf) return offset; } -static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); - static ssize_t target_list_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -472,7 +470,7 @@ struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev) } EXPORT_SYMBOL_NS_GPL(to_cxl_endpoint_decoder, CXL); -static struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev) +struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev) { if (dev_WARN_ONCE(dev, !is_switch_decoder(dev), "not a cxl_switch_decoder device\n")) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 8ac0c557f6aa..225340529fc3 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -485,6 +485,7 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, return NULL; cxl_rr->port = port; cxl_rr->region = cxlr; + cxl_rr->nr_targets = 1; xa_init(&cxl_rr->endpoints); rc = xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr, GFP_KERNEL); @@ -525,10 +526,12 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr, struct cxl_decoder *cxld = cxl_rr->decoder; struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled)); - rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep, - GFP_KERNEL); - if (rc) - return rc; + if (ep) { + rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep, + GFP_KERNEL); + if (rc) + return rc; + } cxl_rr->nr_eps++; if (!cxld->region) { @@ -565,7 +568,7 @@ static int cxl_port_attach_region(struct cxl_port *port, struct cxl_endpoint_decoder *cxled, int pos) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct cxl_ep *ep = cxl_ep_load(port, cxlmd); + const struct cxl_ep *ep = cxl_ep_load(port, cxlmd); struct cxl_region_ref *cxl_rr = NULL, *iter; struct cxl_region_params *p = &cxlr->params; struct cxl_decoder *cxld = NULL; @@ -649,6 +652,16 @@ static int cxl_port_attach_region(struct cxl_port *port, goto out_erase; } + dev_dbg(&cxlr->dev, + "%s:%s %s add: %s:%s @ %d next: %s nr_eps: %d nr_targets: %d\n", + dev_name(port->uport), dev_name(&port->dev), + dev_name(&cxld->dev), dev_name(&cxlmd->dev), + dev_name(&cxled->cxld.dev), pos, + ep ? ep->next ? 
dev_name(ep->next->uport) : + dev_name(&cxlmd->dev) : + "none", + cxl_rr->nr_eps, cxl_rr->nr_targets); + return 0; out_erase: if (cxl_rr->nr_eps == 0) @@ -667,7 +680,7 @@ static void cxl_port_detach_region(struct cxl_port *port, struct cxl_endpoint_decoder *cxled) { struct cxl_region_ref *cxl_rr; - struct cxl_ep *ep; + struct cxl_ep *ep = NULL; lockdep_assert_held_write(&cxl_region_rwsem); @@ -675,7 +688,14 @@ static void cxl_port_detach_region(struct cxl_port *port, if (!cxl_rr) return; - ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled); + /* + * Endpoint ports do not carry cxl_ep references, and they + * never target more than one endpoint by definition + */ + if (cxl_rr->decoder == &cxled->cxld) + cxl_rr->nr_eps--; + else + ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled); if (ep) { struct cxl_ep *ep_iter; unsigned long index; @@ -696,6 +716,224 @@ static void cxl_port_detach_region(struct cxl_port *port, free_region_ref(cxl_rr); } +static int check_last_peer(struct cxl_endpoint_decoder *cxled, + struct cxl_ep *ep, struct cxl_region_ref *cxl_rr, + int distance) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_region *cxlr = cxl_rr->region; + struct cxl_region_params *p = &cxlr->params; + struct cxl_endpoint_decoder *cxled_peer; + struct cxl_port *port = cxl_rr->port; + struct cxl_memdev *cxlmd_peer; + struct cxl_ep *ep_peer; + int pos = cxled->pos; + + /* + * If this position wants to share a dport with the last endpoint mapped + * then that endpoint, at index 'position - distance', must also be + * mapped by this dport. + */ + if (pos < distance) { + dev_dbg(&cxlr->dev, "%s:%s: cannot host %s:%s at %d\n", + dev_name(port->uport), dev_name(&port->dev), + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos); + return -ENXIO; + } + cxled_peer = p->targets[pos - distance]; + cxlmd_peer = cxled_to_memdev(cxled_peer); + ep_peer = cxl_ep_load(port, cxlmd_peer); + if (ep->dport != ep_peer->dport) { + dev_dbg(&cxlr->dev, + "%s:%s: %s:%s pos %d mismatched peer %s:%s\n", + dev_name(port->uport), dev_name(&port->dev), + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos, + dev_name(&cxlmd_peer->dev), + dev_name(&cxled_peer->cxld.dev)); + return -ENXIO; + } + + return 0; +} + +static int cxl_port_setup_targets(struct cxl_port *port, + struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + int parent_iw, parent_ig, ig, iw, rc, inc = 0, pos = cxled->pos; + struct cxl_port *parent_port = to_cxl_port(port->dev.parent); + struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr); + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_ep *ep = cxl_ep_load(port, cxlmd); + struct cxl_region_params *p = &cxlr->params; + struct cxl_decoder *cxld = cxl_rr->decoder; + struct cxl_switch_decoder *cxlsd; + u16 eig, peig; + u8 eiw, peiw; + + /* + * While root level decoders support x3, x6, x12, switch level + * decoders only support powers of 2 up to x16. 
+ */ + if (!is_power_of_2(cxl_rr->nr_targets)) { + dev_dbg(&cxlr->dev, "%s:%s: invalid target count %d\n", + dev_name(port->uport), dev_name(&port->dev), + cxl_rr->nr_targets); + return -EINVAL; + } + + cxlsd = to_cxl_switch_decoder(&cxld->dev); + if (cxl_rr->nr_targets_set) { + int i, distance; + + distance = p->nr_targets / cxl_rr->nr_targets; + for (i = 0; i < cxl_rr->nr_targets_set; i++) + if (ep->dport == cxlsd->target[i]) { + rc = check_last_peer(cxled, ep, cxl_rr, + distance); + if (rc) + return rc; + goto out_target_set; + } + goto add_target; + } + + if (is_cxl_root(parent_port)) { + parent_ig = cxlrd->cxlsd.cxld.interleave_granularity; + parent_iw = cxlrd->cxlsd.cxld.interleave_ways; + /* + * For purposes of address bit routing, use power-of-2 math for + * switch ports. + */ + if (!is_power_of_2(parent_iw)) + parent_iw /= 3; + } else { + struct cxl_region_ref *parent_rr; + struct cxl_decoder *parent_cxld; + + parent_rr = cxl_rr_load(parent_port, cxlr); + parent_cxld = parent_rr->decoder; + parent_ig = parent_cxld->interleave_granularity; + parent_iw = parent_cxld->interleave_ways; + } + + granularity_to_cxl(parent_ig, &peig); + ways_to_cxl(parent_iw, &peiw); + + iw = cxl_rr->nr_targets; + ways_to_cxl(iw, &eiw); + if (cxl_rr->nr_targets > 1) { + u32 address_bit = max(peig + peiw, eiw + peig); + + eig = address_bit - eiw + 1; + } else { + eiw = peiw; + eig = peig; + } + + rc = cxl_to_granularity(eig, &ig); + if (rc) { + dev_dbg(&cxlr->dev, "%s:%s: invalid interleave: %d\n", + dev_name(port->uport), dev_name(&port->dev), + 256 << eig); + return rc; + } + + cxld->interleave_ways = iw; + cxld->interleave_granularity = ig; + dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport), + dev_name(&port->dev), iw, ig); +add_target: + if (cxl_rr->nr_targets_set == cxl_rr->nr_targets) { + dev_dbg(&cxlr->dev, + "%s:%s: targets full trying to add %s:%s at %d\n", + dev_name(port->uport), dev_name(&port->dev), + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos); + return -ENXIO; + } + cxlsd->target[cxl_rr->nr_targets_set] = ep->dport; + inc = 1; +out_target_set: + cxl_rr->nr_targets_set += inc; + dev_dbg(&cxlr->dev, "%s:%s target[%d] = %s for %s:%s @ %d\n", + dev_name(port->uport), dev_name(&port->dev), + cxl_rr->nr_targets_set - 1, dev_name(ep->dport->dport), + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos); + + return 0; +} + +static void cxl_port_reset_targets(struct cxl_port *port, + struct cxl_region *cxlr) +{ + struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr); + + /* + * After the last endpoint has been detached the entire cxl_rr may now + * be gone. 
+ */ + if (cxl_rr) + cxl_rr->nr_targets_set = 0; +} + +static void cxl_region_teardown_targets(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + struct cxl_endpoint_decoder *cxled; + struct cxl_memdev *cxlmd; + struct cxl_port *iter; + struct cxl_ep *ep; + int i; + + for (i = 0; i < p->nr_targets; i++) { + cxled = p->targets[i]; + cxlmd = cxled_to_memdev(cxled); + + iter = cxled_to_port(cxled); + while (!is_cxl_root(to_cxl_port(iter->dev.parent))) + iter = to_cxl_port(iter->dev.parent); + + for (ep = cxl_ep_load(iter, cxlmd); iter; + iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) + cxl_port_reset_targets(iter, cxlr); + } +} + +static int cxl_region_setup_targets(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + struct cxl_endpoint_decoder *cxled; + struct cxl_memdev *cxlmd; + struct cxl_port *iter; + struct cxl_ep *ep; + int i, rc; + + for (i = 0; i < p->nr_targets; i++) { + cxled = p->targets[i]; + cxlmd = cxled_to_memdev(cxled); + + iter = cxled_to_port(cxled); + while (!is_cxl_root(to_cxl_port(iter->dev.parent))) + iter = to_cxl_port(iter->dev.parent); + + /* + * Descend the topology tree programming targets while + * looking for conflicts. + */ + for (ep = cxl_ep_load(iter, cxlmd); iter; + iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) { + rc = cxl_port_setup_targets(iter, cxlr, cxled); + if (rc) { + cxl_region_teardown_targets(cxlr); + return rc; + } + } + } + + return 0; +} + static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { @@ -808,8 +1046,12 @@ static int cxl_region_attach(struct cxl_region *cxlr, cxled->pos = pos; p->nr_targets++; - if (p->nr_targets == p->interleave_ways) + if (p->nr_targets == p->interleave_ways) { + rc = cxl_region_setup_targets(cxlr); + if (rc) + goto err; p->state = CXL_CONFIG_ACTIVE; + } return 0; @@ -848,8 +1090,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) goto out; } - if (p->state == CXL_CONFIG_ACTIVE) + if (p->state == CXL_CONFIG_ACTIVE) { p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; + cxl_region_teardown_targets(cxlr); + } p->targets[cxled->pos] = NULL; p->nr_targets--; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 637768609a75..70862141209b 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -484,6 +484,7 @@ struct cxl_ep { * @decoder: decoder assigned for @region in @port * @region: region for this reference * @endpoints: cxl_ep references for region members beneath @port + * @nr_targets_set: track how many targets have been programmed during setup * @nr_eps: number of endpoints beneath @port * @nr_targets: number of distinct targets needed to reach @nr_eps */ @@ -492,6 +493,7 @@ struct cxl_region_ref { struct cxl_decoder *decoder; struct cxl_region *region; struct xarray endpoints; + int nr_targets_set; int nr_eps; int nr_targets; }; From patchwork Fri Jul 15 00:03:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918602 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F1A77460 for ; Fri, 15 Jul 2022 00:04:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843493; x=1689379493; h=subject:from:to:cc:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=DGEzL9znCuaboJ+RyfY9Q0LRbffLG45ke94gDuGE16E=; b=lEoGrsfE3Kq3Mfu15p1NzYpekpuvJo/UZ5iLt/GhqhH31RTTN3pp32Ip bcJ+KRggkcTgsSGFRo1s7lQ/SKRrHOiKyTKwB2IdNe2eHXKghFG0rqPQb aCXV2lZEE5eSn4UInv+sS1J1CQy79GAe1AQ9a6rB8/vevKNslTl4GpEzO vxB7ylqTVAX3IhJrGOwl3CUTmmfV+uuOdv8cV+zxNR9zgN8uaYLTKMSa1 OYpon85VnWflrqXab3u92nqO7SQnwpoZ2ZbLiLDGQ97iuBUp9G6mPm7rt 6UZptyaJljeIE427b+8KXQiNwNrxewGqYlNN/wUWWtYzWi7r34aPY3seW A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="349627696" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="349627696" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:04 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="628897286" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:04 -0700 Subject: [PATCH v2 25/28] cxl/hdm: Commit decoder state to hardware From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:03:04 -0700 Message-ID: <165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 After all the soft validation of the region has completed, convey the region configuration to hardware while being careful to commit decoders in specification mandated order. In addition to programming the endpoint decoder base-address, interleave ways and granularity, the switch decoder target lists are also established. While the kernel can enforce spec-mandated commit order, it can not enforce spec-mandated reset order. For example, the kernel can't stop someone from removing an endpoint device that is occupying decoderN in a switch decoder where decoderN+1 is also committed. To reset decoderN, decoderN+1 must be torn down first. That "tear down the world" implementation is saved for a follow-on patch. Callback operations are provided for the 'commit' and 'reset' operations. While those callbacks may prove useful for CXL accelerators (Type-2 devices with memory) the primary motivation is to enable a simple way for cxl_test to intercept those operations. Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- Documentation/ABI/testing/sysfs-bus-cxl | 16 ++ drivers/cxl/core/hdm.c | 218 +++++++++++++++++++++++++++++++ drivers/cxl/core/port.c | 1 drivers/cxl/core/region.c | 194 ++++++++++++++++++++++++++-- drivers/cxl/cxl.h | 13 ++ tools/testing/cxl/test/cxl.c | 46 +++++++ 6 files changed, 477 insertions(+), 11 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 94e19e24de8d..2c42888a3df0 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -361,3 +361,19 @@ Description: not an endpoint decoder. Once all positions have been successfully written a final validation for decode conflicts is performed before activating the region. 
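The "spec mandated order" that the new commit attribute below enforces boils down to two comparisons against a per-port cursor. A hedged distillation (these predicate helpers are invented here; cxl_decoder_commit() and cxl_decoder_reset() in this patch open-code the same checks against port->commit_end and return -EBUSY when they fail):

	/*
	 * commit_end is the id of the highest currently committed decoder
	 * on a port, or -1 when none are committed. Decoders commit in
	 * ascending id order and reset in descending order.
	 */
	static bool may_commit(int commit_end, int id)
	{
		return id == commit_end + 1;	/* only the next free decoder */
	}

	static bool may_reset(int commit_end, int id)
	{
		return id == commit_end;	/* only the top of the committed stack */
	}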
+ + +What: /sys/bus/cxl/devices/regionZ/commit +Date: May, 2022 +KernelVersion: v5.20 +Contact: linux-cxl@vger.kernel.org +Description: + (RW) Write a boolean 'true' string value to this attribute to + trigger the region to transition from the software programmed + state to the actively decoding in hardware state. The commit + operation in addition to validating that the region is in proper + configured state, validates that the decoders are being + committed in spec mandated order (last committed decoder id + + 1), and checks that the hardware accepts the commit request. + Reading this value indicates whether the region is committed or + not. diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 81645de1064f..88edb8391fbd 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -129,6 +129,8 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port) return ERR_PTR(-ENXIO); } + dev_set_drvdata(dev, cxlhdm); + return cxlhdm; } EXPORT_SYMBOL_NS_GPL(devm_cxl_setup_hdm, CXL); @@ -462,6 +464,213 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); } +static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl) +{ + u16 eig; + u8 eiw; + + ways_to_cxl(cxld->interleave_ways, &eiw); + granularity_to_cxl(cxld->interleave_granularity, &eig); + + u32p_replace_bits(ctrl, eig, CXL_HDM_DECODER0_CTRL_IG_MASK); + u32p_replace_bits(ctrl, eiw, CXL_HDM_DECODER0_CTRL_IW_MASK); + *ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT; +} + +static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl) +{ + u32p_replace_bits(ctrl, !!(cxld->target_type == 3), + CXL_HDM_DECODER0_CTRL_TYPE); +} + +static void cxld_set_hpa(struct cxl_decoder *cxld, u64 *base, u64 *size) +{ + struct cxl_region *cxlr = cxld->region; + struct cxl_region_params *p = &cxlr->params; + + cxld->hpa_range = (struct range) { + .start = p->res->start, + .end = p->res->end, + }; + + *base = p->res->start; + *size = resource_size(p->res); +} + +static void cxld_clear_hpa(struct cxl_decoder *cxld) +{ + cxld->hpa_range = (struct range) { + .start = 0, + .end = -1, + }; +} + +static int cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt) +{ + struct cxl_dport **t = &cxlsd->target[0]; + int ways = cxlsd->cxld.interleave_ways; + + if (dev_WARN_ONCE(&cxlsd->cxld.dev, + ways > 8 || ways > cxlsd->nr_targets, + "ways: %d overflows targets: %d\n", ways, + cxlsd->nr_targets)) + return -ENXIO; + + *tgt = FIELD_PREP(GENMASK(7, 0), t[0]->port_id); + if (ways > 1) + *tgt |= FIELD_PREP(GENMASK(15, 8), t[1]->port_id); + if (ways > 2) + *tgt |= FIELD_PREP(GENMASK(23, 16), t[2]->port_id); + if (ways > 3) + *tgt |= FIELD_PREP(GENMASK(31, 24), t[3]->port_id); + if (ways > 4) + *tgt |= FIELD_PREP(GENMASK_ULL(39, 32), t[4]->port_id); + if (ways > 5) + *tgt |= FIELD_PREP(GENMASK_ULL(47, 40), t[5]->port_id); + if (ways > 6) + *tgt |= FIELD_PREP(GENMASK_ULL(55, 48), t[6]->port_id); + if (ways > 7) + *tgt |= FIELD_PREP(GENMASK_ULL(63, 56), t[7]->port_id); + + return 0; +} + +/* + * Per CXL 2.0 8.2.5.12.20 Committing Decoder Programming, hardware must set + * committed or error within 10ms, but just be generous with 20ms to account for + * clock skew and other marginal behavior + */ +#define COMMIT_TIMEOUT_MS 20 +static int cxld_await_commit(void __iomem *hdm, int id) +{ + u32 ctrl; + int i; + + for (i = 0; i < COMMIT_TIMEOUT_MS; i++) { + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); + if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMIT_ERROR, ctrl)) 
{ + ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT; + writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); + return -EIO; + } + if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl)) + return 0; + fsleep(1000); + } + + return -ETIMEDOUT; +} + +static int cxl_decoder_commit(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); + void __iomem *hdm = cxlhdm->regs.hdm_decoder; + int id = cxld->id, rc; + u64 base, size; + u32 ctrl; + + if (cxld->flags & CXL_DECODER_F_ENABLE) + return 0; + + if (port->commit_end + 1 != id) { + dev_dbg(&port->dev, + "%s: out of order commit, expected decoder%d.%d\n", + dev_name(&cxld->dev), port->id, port->commit_end + 1); + return -EBUSY; + } + + down_read(&cxl_dpa_rwsem); + /* common decoder settings */ + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id)); + cxld_set_interleave(cxld, &ctrl); + cxld_set_type(cxld, &ctrl); + cxld_set_hpa(cxld, &base, &size); + + writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id)); + writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); + writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id)); + writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id)); + + if (is_switch_decoder(&cxld->dev)) { + struct cxl_switch_decoder *cxlsd = + to_cxl_switch_decoder(&cxld->dev); + void __iomem *tl_hi = hdm + CXL_HDM_DECODER0_TL_HIGH(id); + void __iomem *tl_lo = hdm + CXL_HDM_DECODER0_TL_LOW(id); + u64 targets; + + rc = cxlsd_set_targets(cxlsd, &targets); + if (rc) { + dev_dbg(&port->dev, "%s: target configuration error\n", + dev_name(&cxld->dev)); + goto err; + } + + writel(upper_32_bits(targets), tl_hi); + writel(lower_32_bits(targets), tl_lo); + } else { + struct cxl_endpoint_decoder *cxled = + to_cxl_endpoint_decoder(&cxld->dev); + void __iomem *sk_hi = hdm + CXL_HDM_DECODER0_SKIP_HIGH(id); + void __iomem *sk_lo = hdm + CXL_HDM_DECODER0_SKIP_LOW(id); + + writel(upper_32_bits(cxled->skip), sk_hi); + writel(lower_32_bits(cxled->skip), sk_lo); + } + + writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); + up_read(&cxl_dpa_rwsem); + + port->commit_end++; + rc = cxld_await_commit(hdm, cxld->id); +err: + if (rc) { + dev_dbg(&port->dev, "%s: error %d committing decoder\n", + dev_name(&cxld->dev), rc); + cxld->reset(cxld); + return rc; + } + cxld->flags |= CXL_DECODER_F_ENABLE; + + return 0; +} + +static int cxl_decoder_reset(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); + void __iomem *hdm = cxlhdm->regs.hdm_decoder; + int id = cxld->id; + u32 ctrl; + + if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) + return 0; + + if (port->commit_end != id) { + dev_dbg(&port->dev, + "%s: out of order reset, expected decoder%d.%d\n", + dev_name(&cxld->dev), port->id, port->commit_end); + return -EBUSY; + } + + down_read(&cxl_dpa_rwsem); + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); + ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT; + writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); + + cxld_clear_hpa(cxld); + writel(0, hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id)); + writel(0, hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id)); + writel(0, hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id)); + writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); + up_read(&cxl_dpa_rwsem); + + port->commit_end--; + cxld->flags &= ~CXL_DECODER_F_ENABLE; + + return 0; +} + static int init_hdm_decoder(struct cxl_port *port, struct 
cxl_decoder *cxld, int *target_map, void __iomem *hdm, int which, u64 *dpa_base) @@ -484,6 +693,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, base = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(which)); size = ioread64_hi_lo(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(which)); committed = !!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED); + cxld->commit = cxl_decoder_commit; + cxld->reset = cxl_decoder_reset; if (!committed) size = 0; @@ -507,6 +718,13 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, cxld->target_type = CXL_DECODER_EXPANDER; else cxld->target_type = CXL_DECODER_ACCELERATOR; + if (cxld->id != port->commit_end + 1) { + dev_warn(&port->dev, + "decoder%d.%d: Committed out of order\n", + port->id, cxld->id); + return -ENXIO; + } + port->commit_end = cxld->id; } else { /* unless / until type-2 drivers arrive, assume type-3 */ if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0) { diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 215ce5e16986..7ab9a98c5d4f 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -632,6 +632,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport, port->component_reg_phys = component_reg_phys; ida_init(&port->decoder_ida); port->hdm_end = -1; + port->commit_end = -1; xa_init(&port->dports); xa_init(&port->endpoints); xa_init(&port->regions); diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 225340529fc3..de794344d964 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -115,6 +115,173 @@ static ssize_t uuid_store(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_RW(uuid); +static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port, + struct cxl_region *cxlr) +{ + return xa_load(&port->regions, (unsigned long)cxlr); +} + +static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) +{ + struct cxl_region_params *p = &cxlr->params; + int i; + + for (i = count - 1; i >= 0; i--) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *iter = cxled_to_port(cxled); + struct cxl_ep *ep; + int rc; + + while (!is_cxl_root(to_cxl_port(iter->dev.parent))) + iter = to_cxl_port(iter->dev.parent); + + for (ep = cxl_ep_load(iter, cxlmd); iter; + iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) { + struct cxl_region_ref *cxl_rr; + struct cxl_decoder *cxld; + + cxl_rr = cxl_rr_load(iter, cxlr); + cxld = cxl_rr->decoder; + rc = cxld->reset(cxld); + if (rc) + return rc; + } + + rc = cxled->cxld.reset(&cxled->cxld); + if (rc) + return rc; + } + + return 0; +} + +static int cxl_region_decode_commit(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + int i, rc; + + for (i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_region_ref *cxl_rr; + struct cxl_decoder *cxld; + struct cxl_port *iter; + struct cxl_ep *ep; + + /* commit bottom up */ + for (iter = cxled_to_port(cxled); !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) { + cxl_rr = cxl_rr_load(iter, cxlr); + cxld = cxl_rr->decoder; + rc = cxld->commit(cxld); + if (rc) + break; + } + + /* success, all decoders up to the root are programmed */ + if (is_cxl_root(iter)) + continue; + + /* programming @iter failed, teardown */ + for (ep = cxl_ep_load(iter, cxlmd); ep && iter; + iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) { + 
cxl_rr = cxl_rr_load(iter, cxlr); + cxld = cxl_rr->decoder; + cxld->reset(cxld); + } + + cxled->cxld.reset(&cxled->cxld); + if (i == 0) + return rc; + break; + } + + if (i >= p->nr_targets) + return 0; + + /* undo the targets that were successfully committed */ + cxl_region_decode_reset(cxlr, i); + return rc; +} + +static ssize_t commit_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + bool commit; + ssize_t rc; + + rc = kstrtobool(buf, &commit); + if (rc) + return rc; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + + /* Already in the requested state? */ + if (commit && p->state >= CXL_CONFIG_COMMIT) + goto out; + if (!commit && p->state < CXL_CONFIG_COMMIT) + goto out; + + /* Not ready to commit? */ + if (commit && p->state < CXL_CONFIG_ACTIVE) { + rc = -ENXIO; + goto out; + } + + if (commit) + rc = cxl_region_decode_commit(cxlr); + else { + p->state = CXL_CONFIG_RESET_PENDING; + up_write(&cxl_region_rwsem); + device_release_driver(&cxlr->dev); + down_write(&cxl_region_rwsem); + + /* + * The lock was dropped, so need to revalidate that the reset is + * still pending. + */ + if (p->state == CXL_CONFIG_RESET_PENDING) + rc = cxl_region_decode_reset(cxlr, p->interleave_ways); + } + + if (rc) + goto out; + + if (commit) + p->state = CXL_CONFIG_COMMIT; + else if (p->state == CXL_CONFIG_RESET_PENDING) + p->state = CXL_CONFIG_ACTIVE; + +out: + up_write(&cxl_region_rwsem); + + if (rc) + return rc; + return len; +} + +static ssize_t commit_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + ssize_t rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) + return rc; + rc = sysfs_emit(buf, "%d\n", p->state >= CXL_CONFIG_COMMIT); + up_read(&cxl_region_rwsem); + + return rc; +} +static DEVICE_ATTR_RW(commit); + static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a, int n) { @@ -393,6 +560,7 @@ static DEVICE_ATTR_RW(size); static struct attribute *cxl_region_attrs[] = { &dev_attr_uuid.attr, + &dev_attr_commit.attr, &dev_attr_interleave_ways.attr, &dev_attr_interleave_granularity.attr, &dev_attr_resource.attr, @@ -669,12 +837,6 @@ static int cxl_port_attach_region(struct cxl_port *port, return rc; } -static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port, - struct cxl_region *cxlr) -{ - return xa_load(&port->regions, (unsigned long)cxlr); -} - static void cxl_port_detach_region(struct cxl_port *port, struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled) @@ -1062,20 +1224,32 @@ static int cxl_region_attach(struct cxl_region *cxlr, return rc; } -static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) +static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) { struct cxl_port *iter, *ep_port = cxled_to_port(cxled); struct cxl_region *cxlr = cxled->cxld.region; struct cxl_region_params *p; + int rc = 0; lockdep_assert_held_write(&cxl_region_rwsem); if (!cxlr) - return; + return 0; p = &cxlr->params; get_device(&cxlr->dev); + if (p->state > CXL_CONFIG_ACTIVE) { + /* + * TODO: tear down all impacted regions if a device is + * removed out of order + */ + rc = cxl_region_decode_reset(cxlr, p->interleave_ways); + if (rc) + goto out; + p->state = CXL_CONFIG_ACTIVE; + } + for (iter = ep_port; !is_cxl_root(iter); iter = to_cxl_port(iter->dev.parent)) 
cxl_port_detach_region(iter, cxlr, cxled); @@ -1103,6 +1277,7 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) down_write(&cxl_region_rwsem); out: put_device(&cxlr->dev); + return rc; } void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) @@ -1160,8 +1335,7 @@ static int detach_target(struct cxl_region *cxlr, int pos) goto out; } - cxl_region_detach(p->targets[pos]); - rc = 0; + rc = cxl_region_detach(p->targets[pos]); out: up_write(&cxl_region_rwsem); return rc; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 70862141209b..a51709613c43 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -54,6 +54,7 @@ #define CXL_HDM_DECODER0_CTRL_LOCK BIT(8) #define CXL_HDM_DECODER0_CTRL_COMMIT BIT(9) #define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10) +#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11) #define CXL_HDM_DECODER0_CTRL_TYPE BIT(12) #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24) #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28) @@ -257,7 +258,9 @@ enum cxl_decoder_type { * @target_type: accelerator vs expander (type2 vs type3) selector * @region: currently assigned region for this decoder * @flags: memory type capabilities and locking - */ + * @commit: device/decoder-type specific callback to commit settings to hw + * @reset: device/decoder-type specific callback to reset hw settings +*/ struct cxl_decoder { struct device dev; int id; @@ -267,6 +270,8 @@ struct cxl_decoder { enum cxl_decoder_type target_type; struct cxl_region *region; unsigned long flags; + int (*commit)(struct cxl_decoder *cxld); + int (*reset)(struct cxl_decoder *cxld); }; /* @@ -339,11 +344,15 @@ struct cxl_root_decoder { * changes to interleave_ways or interleave_granularity * @CXL_CONFIG_ACTIVE: All targets have been added the region is now * active + * @CXL_CONFIG_RESET_PENDING: see commit_store() + * @CXL_CONFIG_COMMIT: Soft-config has been committed to hardware */ enum cxl_config_state { CXL_CONFIG_IDLE, CXL_CONFIG_INTERLEAVE_ACTIVE, CXL_CONFIG_ACTIVE, + CXL_CONFIG_RESET_PENDING, + CXL_CONFIG_COMMIT, }; /** @@ -425,6 +434,7 @@ struct cxl_nvdimm { * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids * @hdm_end: track last allocated HDM decoder instance for allocation ordering + * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) * @dead: last ep has been removed, force port re-creation * @depth: How deep this port is relative to the root. depth 0 is the root. 
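Taken together, the new config states and the @commit_end cursor give the region a small, explicit state machine. An illustrative transition helper (not a kernel API; commit_store() open-codes this along with the locking, driver release, and error handling):

	static enum cxl_config_state commit_transition(enum cxl_config_state s,
						       bool commit)
	{
		if (commit && s == CXL_CONFIG_ACTIVE)
			return CXL_CONFIG_COMMIT;	 /* decoders live in hardware */
		if (!commit && s == CXL_CONFIG_COMMIT)
			return CXL_CONFIG_RESET_PENDING; /* reset after driver release */
		return s;			/* other writes are no-ops or errors */
	}

A successful reset then moves CXL_CONFIG_RESET_PENDING back to CXL_CONFIG_ACTIVE, which is why commit_store() revalidates the state after reacquiring cxl_region_rwsem.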
@@ -440,6 +450,7 @@ struct cxl_port { struct cxl_dport *parent_dport; struct ida decoder_ida; int hdm_end; + int commit_end; resource_size_t component_reg_phys; bool dead; unsigned int depth; diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 4dad0fa7ac4c..a072b2d3e726 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -429,6 +429,50 @@ static int map_targets(struct device *dev, void *data) return 0; } +static int mock_decoder_commit(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + int id = cxld->id; + + if (cxld->flags & CXL_DECODER_F_ENABLE) + return 0; + + dev_dbg(&port->dev, "%s commit\n", dev_name(&cxld->dev)); + if (port->commit_end + 1 != id) { + dev_dbg(&port->dev, + "%s: out of order commit, expected decoder%d.%d\n", + dev_name(&cxld->dev), port->id, port->commit_end + 1); + return -EBUSY; + } + + port->commit_end++; + cxld->flags |= CXL_DECODER_F_ENABLE; + + return 0; +} + +static int mock_decoder_reset(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + int id = cxld->id; + + if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) + return 0; + + dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); + if (port->commit_end != id) { + dev_dbg(&port->dev, + "%s: out of order reset, expected decoder%d.%d\n", + dev_name(&cxld->dev), port->id, port->commit_end); + return -EBUSY; + } + + port->commit_end--; + cxld->flags &= ~CXL_DECODER_F_ENABLE; + + return 0; +} + static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) { struct cxl_port *port = cxlhdm->port; @@ -482,6 +526,8 @@ static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) cxld->interleave_ways = min_not_zero(target_count, 1); cxld->interleave_granularity = SZ_4K; cxld->target_type = CXL_DECODER_EXPANDER; + cxld->commit = mock_decoder_commit; + cxld->reset = mock_decoder_reset; if (target_count) { rc = device_for_each_child(port->uport, &ctx, From patchwork Fri Jul 15 00:03:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918606 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6970B7460 for ; Fri, 15 Jul 2022 00:05:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843517; x=1689379517; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pDhRf3qBnQEd9BweVVOXaxp4SJT1O6WGOvb5VpLfORo=; b=SDGkp6URqdb5kbBieH89YbrRDimsb5D67eoAaUGNviLdltIq07X/wlM0 tnodyeHWG8KDMoS0kT2IitlXYUk4nrXy41A5PTr62fHC53sBZfVC8v5hf U0tL+dIPeObH9Htom7QtaIWRMAi4/ioK6Sj/nXk/BfhWIAKNrawXif3QD vTFfPy93Bg+XNTemybElX/MjfQrYMzCy4QHx1oGTkuvX/pCkuIKmomQCG vR1tqGe8QPee9iNA0+diCWaMLUagoGor2BUJd62JKzHyhxhk18LryFoaY /SqoLTJJbQO4cRnP0HhiDRsMx+aCFAkkp3e1KhSJslMPHs3X3RAUGex6t Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="268688416" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="268688416" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:10 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="685766925" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by 
fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:10 -0700 Subject: [PATCH v2 26/28] cxl/region: Add region driver boiler plate From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , Jonathan Cameron , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:03:09 -0700 Message-ID: <165784338963.1758207.3908994719897882778.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The CXL region driver is responsible for routing fully formed CXL regions to one of libnvdimm, for persistent memory regions, device-dax for volatile memory regions, or just act as an enumeration placeholder if the region was setup and configuration locked by platform firmware. In the platform-firmware-setup case the expectation is that region is already accounted in the system memory map, i.e. already enabled as "System RAM". For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and take no further action. Given this driver is just a small / simple router, include it in the core rather than its own module. Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/core.h | 12 ++++++++++++ drivers/cxl/core/port.c | 9 +++++++++ drivers/cxl/core/region.c | 45 ++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/cxl.h | 1 + 4 files changed, 66 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index fcf14b8a3c87..391aadf9e7fa 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -13,14 +13,26 @@ extern struct attribute_group cxl_base_attribute_group; extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_delete_region; extern struct device_attribute dev_attr_region; +extern const struct device_type cxl_region_type; void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled); #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr) +#define CXL_REGION_TYPE(x) (&cxl_region_type) #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr), +int cxl_region_init(void); +void cxl_region_exit(void); #else static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) { } +static inline int cxl_region_init(void) +{ + return 0; +} +static inline void cxl_region_exit(void) +{ +} #define CXL_REGION_ATTR(x) NULL +#define CXL_REGION_TYPE(x) NULL #define SET_CXL_REGION_ATTR(x) #endif diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 7ab9a98c5d4f..194003525397 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -51,6 +51,8 @@ static int cxl_device_id(struct device *dev) } if (is_cxl_memdev(dev)) return CXL_DEVICE_MEMORY_EXPANDER; + if (dev->type == CXL_REGION_TYPE()) + return CXL_DEVICE_REGION; return 0; } @@ -1864,8 +1866,14 @@ static __init int cxl_core_init(void) if (rc) goto err_bus; + rc = cxl_region_init(); + if (rc) + goto err_region; + return 0; +err_region: + bus_unregister(&cxl_bus_type); err_bus: destroy_workqueue(cxl_bus_wq); err_wq: @@ -1875,6 +1883,7 @@ static __init int cxl_core_init(void) static void cxl_core_exit(void) { + 
cxl_region_exit(); bus_unregister(&cxl_bus_type); destroy_workqueue(cxl_bus_wq); cxl_memdev_exit(); diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index de794344d964..20871bdb6858 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1444,7 +1444,7 @@ static void cxl_region_release(struct device *dev) kfree(cxlr); } -static const struct device_type cxl_region_type = { +const struct device_type cxl_region_type = { .name = "cxl_region", .release = cxl_region_release, .groups = region_groups @@ -1644,4 +1644,47 @@ static ssize_t delete_region_store(struct device *dev, } DEVICE_ATTR_WO(delete_region); +static int cxl_region_probe(struct device *dev) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int rc; + + rc = down_read_interruptible(&cxl_region_rwsem); + if (rc) { + dev_dbg(&cxlr->dev, "probe interrupted\n"); + return rc; + } + + if (p->state < CXL_CONFIG_COMMIT) { + dev_dbg(&cxlr->dev, "config state: %d\n", p->state); + rc = -ENXIO; + } + + /* + * From this point on any path that changes the region's state away from + * CXL_CONFIG_COMMIT is also responsible for releasing the driver. + */ + up_read(&cxl_region_rwsem); + + return rc; +} + +static struct cxl_driver cxl_region_driver = { + .name = "cxl_region", + .probe = cxl_region_probe, + .id = CXL_DEVICE_REGION, +}; + +int cxl_region_init(void) +{ + return cxl_driver_register(&cxl_region_driver); +} + +void cxl_region_exit(void) +{ + cxl_driver_unregister(&cxl_region_driver); +} + MODULE_IMPORT_NS(CXL); +MODULE_ALIAS_CXL(CXL_DEVICE_REGION); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index a51709613c43..9aedd471193a 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -592,6 +592,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv); #define CXL_DEVICE_PORT 3 #define CXL_DEVICE_ROOT 4 #define CXL_DEVICE_MEMORY_EXPANDER 5 +#define CXL_DEVICE_REGION 6 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*") #define CXL_MODALIAS_FMT "cxl:t%d" From patchwork Fri Jul 15 00:03:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918608 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E57227460 for ; Fri, 15 Jul 2022 00:05:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843537; x=1689379537; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TgQ3wfpkvJiu1CZKehyIA3pvixwKsN+GwHLtVvvU30k=; b=KsxSUKyl8y9comsYSNKa3xedaH2AoCtujomZQnfX/rf0VeTZBKWQ6rqk 4SDqMtu1decR/RCn7KEG59QB9pRW78v1VBRVms+MlOgFqfM4pYCoVckXA sKk3jYN2TFvX6j7mD9dDJciE7bj5jrUD1JWBK1J5LJA+i+h1VDklntfP6 w596btWB87r9wgWVSOA3h43/S1lh1E/BX9uf2p2lguCDPYAwjlbqnas/Z Xim+ZJrVyV6f31D7YpN+XKoSD6IvZdXZEzuQ1uQIt1cCsOi0xWnTmUegu tFjFFQVOjJA0m6ZCr9nT/tSZiGoVZDiuOdJYh+wtPxuYYPjMzT+gnPam/ A==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286402098" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="286402098" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:16 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="571303156" Received: from 
jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:15 -0700 Subject: [PATCH v2 27/28] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge From: Dan Williams To: linux-cxl@vger.kernel.org Cc: hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:03:15 -0700 Message-ID: <165784339569.1758207.1557084545278004577.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Be careful to only disable cxl_pmem objects related to a given cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL domains and disables more than is expected. Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices") Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron --- drivers/cxl/cxl.h | 1 + drivers/cxl/pmem.c | 21 +++++++++++++++++---- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 9aedd471193a..a32093602df9 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -418,6 +418,7 @@ struct cxl_nvdimm_bridge { struct cxl_nvdimm { struct device dev; struct cxl_memdev *cxlmd; + struct cxl_nvdimm_bridge *bridge; }; /** diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index 0aaa70b4e0f7..b271f6e90b91 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -26,7 +26,10 @@ static void clear_exclusive(void *cxlds) static void unregister_nvdimm(void *nvdimm) { + struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm); + nvdimm_delete(nvdimm); + cxl_nvd->bridge = NULL; } static int cxl_nvdimm_probe(struct device *dev) @@ -66,6 +69,7 @@ static int cxl_nvdimm_probe(struct device *dev) } dev_set_drvdata(dev, nvdimm); + cxl_nvd->bridge = cxl_nvb; rc = devm_add_action_or_reset(dev, unregister_nvdimm, nvdimm); out: device_unlock(&cxl_nvb->dev); @@ -204,15 +208,23 @@ static bool online_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb) return cxl_nvb->nvdimm_bus != NULL; } -static int cxl_nvdimm_release_driver(struct device *dev, void *data) +static int cxl_nvdimm_release_driver(struct device *dev, void *cxl_nvb) { + struct cxl_nvdimm *cxl_nvd; + if (!is_cxl_nvdimm(dev)) return 0; + + cxl_nvd = to_cxl_nvdimm(dev); + if (cxl_nvd->bridge != cxl_nvb) + return 0; + device_release_driver(dev); return 0; } -static void offline_nvdimm_bus(struct nvdimm_bus *nvdimm_bus) +static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb, + struct nvdimm_bus *nvdimm_bus) { if (!nvdimm_bus) return; @@ -222,7 +234,8 @@ static void offline_nvdimm_bus(struct nvdimm_bus *nvdimm_bus) * nvdimm_bus_unregister() rips the nvdimm objects out from * underneath them. 
*/ - bus_for_each_dev(&cxl_bus_type, NULL, NULL, cxl_nvdimm_release_driver); + bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb, + cxl_nvdimm_release_driver); nvdimm_bus_unregister(nvdimm_bus); } @@ -260,7 +273,7 @@ static void cxl_nvb_update_state(struct work_struct *work) dev_dbg(&cxl_nvb->dev, "rescan: %d\n", rc); } - offline_nvdimm_bus(victim_bus); + offline_nvdimm_bus(cxl_nvb, victim_bus); put_device(&cxl_nvb->dev); } From patchwork Fri Jul 15 00:03:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 12918607 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21FFF7460 for ; Fri, 15 Jul 2022 00:05:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657843524; x=1689379524; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=smKpDNwDgoEnff0IcZhkKXyTjumqxb1p/y234UsFkk0=; b=N/5H6a0P+3gC0FJaQRnevY/2JnM1uEmS1qpvwJpTw7k7VASD2GpwR8oN 8XGphWToXyBMtXd9BuBH0hRqT6cnv1lbl6HukoLm6tCSGZHroQdoE1zCT FhBfEsdvI0rr9c/xoHivTwgWAQGme4kCLbMasrY0Rzxyt4v2k0yZVnYlL /heNin+K4OPYJJLnDnBcTIGXzc34FfX2xFCuT03hIX75gRqX+h2uaqpLz ZAx832CCTqzTMHxY6nZp1LEKrOb52aaZGpPL8JpqcWoAPGLwEI5PDWz6Z AIkidXrlsE/qSvnQkdzFgWi0XuFoKfHVx/HXsXM3ARKlqOieC6efD/G/k w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="265451462" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="265451462" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:21 -0700 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="571303179" Received: from jlcone-mobl1.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.2.90]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 17:03:21 -0700 Subject: [PATCH v2 28/28] cxl/region: Introduce cxl_pmem_region objects From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Ben Widawsky , hch@lst.de, nvdimm@lists.linux.dev, linux-pci@vger.kernel.org Date: Thu, 14 Jul 2022 17:03:21 -0700 Message-ID: <165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> References: <165784324066.1758207.15025479284039479071.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. 
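The bridge object carries just enough of the region's geometry for LIBNVDIMM to build an nd_region on top of it. A paraphrase of its shape, inferred from cxl_pmem_region_alloc() below (field names approximate; see the cxl.h hunk in the full diff for the authoritative definition):

	struct cxl_pmem_region_sketch {
		struct device dev;		/* device on the cxl bus */
		struct cxl_region *cxlr;	/* backing CXL region */
		struct range hpa_range;		/* host physical address span */
		int nr_mappings;
		struct {
			struct cxl_memdev *cxlmd; /* pinned via get_device() */
			u64 start;		/* DPA start of this contribution */
			u64 size;		/* DPA length */
			int position;		/* interleave position */
		} mapping[];			/* one entry per endpoint target */
	};

Each mapping is what lets LIBNVDIMM treat the region's endpoints as DIMMs, and in turn partition the region with legacy namespace labels as described above.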
Co-developed-by: Ben Widawsky
Signed-off-by: Ben Widawsky
Signed-off-by: Dan Williams
Reviewed-by: Jonathan Cameron
---
 drivers/cxl/core/core.h      |   3 +
 drivers/cxl/core/pmem.c      |   4 -
 drivers/cxl/core/port.c      |   2 
 drivers/cxl/core/region.c    | 142 +++++++++++++++++++++++++
 drivers/cxl/cxl.h            |  36 ++++++
 drivers/cxl/pmem.c           | 238 ++++++++++++++++++++++++++++++++++++++++++
 drivers/nvdimm/region_devs.c |  28 ++++-
 include/linux/libnvdimm.h    |   5 +
 8 files changed, 446 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 391aadf9e7fa..1d8f87be283f 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -13,11 +13,13 @@ extern struct attribute_group cxl_base_attribute_group;
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
+extern const struct device_type cxl_pmem_region_type;
 extern const struct device_type cxl_region_type;
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
 #define CXL_REGION_TYPE(x) (&cxl_region_type)
 #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr),
+#define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
 int cxl_region_init(void);
 void cxl_region_exit(void);
 #else
@@ -34,6 +36,7 @@ static inline void cxl_region_exit(void)
 #define CXL_REGION_ATTR(x) NULL
 #define CXL_REGION_TYPE(x) NULL
 #define SET_CXL_REGION_ATTR(x)
+#define CXL_PMEM_REGION_TYPE(x) NULL
 #endif
 
 struct cxl_send_command;
diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
index bec7cfb54ebf..1d12a8206444 100644
--- a/drivers/cxl/core/pmem.c
+++ b/drivers/cxl/core/pmem.c
@@ -62,9 +62,9 @@ static int match_nvdimm_bridge(struct device *dev, void *data)
 	return is_cxl_nvdimm_bridge(dev);
 }
 
-struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd)
+struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *start)
 {
-	struct cxl_port *port = find_cxl_root(&cxl_nvd->dev);
+	struct cxl_port *port = find_cxl_root(start);
 	struct device *dev;
 
 	if (!port)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 194003525397..af65491d878b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -44,6 +44,8 @@ static int cxl_device_id(struct device *dev)
 		return CXL_DEVICE_NVDIMM_BRIDGE;
 	if (dev->type == &cxl_nvdimm_type)
 		return CXL_DEVICE_NVDIMM;
+	if (dev->type == CXL_PMEM_REGION_TYPE())
+		return CXL_DEVICE_PMEM_REGION;
 	if (is_cxl_port(dev)) {
 		if (is_cxl_root(to_cxl_port(dev)))
 			return CXL_DEVICE_ROOT;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 20871bdb6858..23d1a34077be 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1644,6 +1644,139 @@ static ssize_t delete_region_store(struct device *dev,
 }
 DEVICE_ATTR_WO(delete_region);
 
+static void cxl_pmem_region_release(struct device *dev)
+{
+	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
+	int i;
+
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_memdev *cxlmd = cxlr_pmem->mapping[i].cxlmd;
+
+		put_device(&cxlmd->dev);
+	}
+
+	kfree(cxlr_pmem);
+}
+
+static const struct attribute_group *cxl_pmem_region_attribute_groups[] = {
+	&cxl_base_attribute_group,
+	NULL,
+};
+
+const struct device_type cxl_pmem_region_type = {
+	.name = "cxl_pmem_region",
+	.release = cxl_pmem_region_release,
+	.groups = cxl_pmem_region_attribute_groups,
+};
+
+bool is_cxl_pmem_region(struct device *dev)
+{
+	return dev->type == &cxl_pmem_region_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_pmem_region, CXL);
+
+struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
+{
+	if (dev_WARN_ONCE(dev, !is_cxl_pmem_region(dev),
+			  "not a cxl_pmem_region device\n"))
+		return NULL;
+	return container_of(dev, struct cxl_pmem_region, dev);
+}
+EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, CXL);
+
+static struct lock_class_key cxl_pmem_region_key;
+
+static struct cxl_pmem_region *cxl_pmem_region_alloc(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_pmem_region *cxlr_pmem;
+	struct device *dev;
+	int i;
+
+	down_read(&cxl_region_rwsem);
+	if (p->state != CXL_CONFIG_COMMIT) {
+		cxlr_pmem = ERR_PTR(-ENXIO);
+		goto out;
+	}
+
+	cxlr_pmem = kzalloc(struct_size(cxlr_pmem, mapping, p->nr_targets),
+			    GFP_KERNEL);
+	if (!cxlr_pmem) {
+		cxlr_pmem = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	cxlr_pmem->hpa_range.start = p->res->start;
+	cxlr_pmem->hpa_range.end = p->res->end;
+
+	/* Snapshot the region configuration underneath the cxl_region_rwsem */
+	cxlr_pmem->nr_mappings = p->nr_targets;
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+
+		m->cxlmd = cxlmd;
+		get_device(&cxlmd->dev);
+		m->start = cxled->dpa_res->start;
+		m->size = resource_size(cxled->dpa_res);
+		m->position = i;
+	}
+
+	dev = &cxlr_pmem->dev;
+	cxlr_pmem->cxlr = cxlr;
+	device_initialize(dev);
+	lockdep_set_class(&dev->mutex, &cxl_pmem_region_key);
+	device_set_pm_not_required(dev);
+	dev->parent = &cxlr->dev;
+	dev->bus = &cxl_bus_type;
+	dev->type = &cxl_pmem_region_type;
+out:
+	up_read(&cxl_region_rwsem);
+
+	return cxlr_pmem;
+}
+
+static void cxlr_pmem_unregister(void *dev)
+{
+	device_unregister(dev);
+}
+
+/**
+ * devm_cxl_add_pmem_region() - add a cxl_region-to-nd_region bridge
+ * @cxlr: parent CXL region for this pmem region bridge device
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
+{
+	struct cxl_pmem_region *cxlr_pmem;
+	struct device *dev;
+	int rc;
+
+	cxlr_pmem = cxl_pmem_region_alloc(cxlr);
+	if (IS_ERR(cxlr_pmem))
+		return PTR_ERR(cxlr_pmem);
+
+	dev = &cxlr_pmem->dev;
+	rc = dev_set_name(dev, "pmem_region%d", cxlr->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
+		dev_name(dev));
+
+	return devm_add_action_or_reset(&cxlr->dev, cxlr_pmem_unregister, dev);
+
+err:
+	put_device(dev);
+	return rc;
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -1667,7 +1800,14 @@ static int cxl_region_probe(struct device *dev)
 	 */
 	up_read(&cxl_region_rwsem);
 
-	return rc;
+	switch (cxlr->mode) {
+	case CXL_DECODER_PMEM:
+		return devm_cxl_add_pmem_region(cxlr);
+	default:
+		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
+			cxlr->mode);
+		return -ENXIO;
+	}
 }
 
 static struct cxl_driver cxl_region_driver = {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a32093602df9..8a484acbb32d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -419,6 +419,25 @@ struct cxl_nvdimm {
 	struct device dev;
 	struct cxl_memdev *cxlmd;
 	struct cxl_nvdimm_bridge *bridge;
+	struct cxl_pmem_region *region;
+};
+
+struct cxl_pmem_region_mapping {
+	struct cxl_memdev *cxlmd;
+	struct cxl_nvdimm *cxl_nvd;
+	u64 start;
+	u64 size;
+	int position;
+};
+
+struct cxl_pmem_region {
+	struct device dev;
+	struct cxl_region *cxlr;
+	struct nd_region *nd_region;
+	struct cxl_nvdimm_bridge *bridge;
+	struct range hpa_range;
+	int nr_mappings;
+	struct cxl_pmem_region_mapping mapping[];
 };
 
 /**
@@ -594,6 +613,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 #define CXL_DEVICE_ROOT			4
 #define CXL_DEVICE_MEMORY_EXPANDER	5
 #define CXL_DEVICE_REGION		6
+#define CXL_DEVICE_PMEM_REGION		7
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
@@ -605,7 +625,21 @@ struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
 bool is_cxl_nvdimm(struct device *dev);
 bool is_cxl_nvdimm_bridge(struct device *dev);
 int devm_cxl_add_nvdimm(struct device *host, struct cxl_memdev *cxlmd);
-struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_nvdimm *cxl_nvd);
+struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
+
+#ifdef CONFIG_CXL_REGION
+bool is_cxl_pmem_region(struct device *dev);
+struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
+#else
+static inline bool is_cxl_pmem_region(struct device *dev)
+{
+	return false;
+}
+static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
+{
+	return NULL;
+}
+#endif
 
 /*
  * Unit test builds overrides this to __weak, find the 'strong' version
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index b271f6e90b91..e69f99a0747d 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include "cxlmem.h"
 #include "cxl.h"
@@ -27,6 +28,19 @@ static void clear_exclusive(void *cxlds)
 
 static void unregister_nvdimm(void *nvdimm)
 {
 	struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
+	struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge;
+	struct cxl_pmem_region *cxlr_pmem;
+
+	device_lock(&cxl_nvb->dev);
+	cxlr_pmem = cxl_nvd->region;
+	dev_set_drvdata(&cxl_nvd->dev, NULL);
+	cxl_nvd->region = NULL;
+	device_unlock(&cxl_nvb->dev);
+
+	if (cxlr_pmem) {
+		device_release_driver(&cxlr_pmem->dev);
+		put_device(&cxlr_pmem->dev);
+	}
 
 	nvdimm_delete(nvdimm);
 	cxl_nvd->bridge = NULL;
@@ -42,7 +56,7 @@ static int cxl_nvdimm_probe(struct device *dev)
 	struct nvdimm *nvdimm;
 	int rc;
 
-	cxl_nvb = cxl_find_nvdimm_bridge(cxl_nvd);
+	cxl_nvb = cxl_find_nvdimm_bridge(dev);
 	if (!cxl_nvb)
 		return -ENXIO;
 
@@ -223,6 +237,21 @@ static int cxl_nvdimm_release_driver(struct device *dev, void *cxl_nvb)
 	return 0;
 }
 
+static int cxl_pmem_region_release_driver(struct device *dev, void *cxl_nvb)
+{
+	struct cxl_pmem_region *cxlr_pmem;
+
+	if (!is_cxl_pmem_region(dev))
+		return 0;
+
+	cxlr_pmem = to_cxl_pmem_region(dev);
+	if (cxlr_pmem->bridge != cxl_nvb)
+		return 0;
+
+	device_release_driver(dev);
+	return 0;
+}
+
 static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb,
 			       struct nvdimm_bus *nvdimm_bus)
 {
@@ -234,6 +263,8 @@ static void offline_nvdimm_bus(struct cxl_nvdimm_bridge *cxl_nvb,
 	 * nvdimm_bus_unregister() rips the nvdimm objects out from
 	 * underneath them.
 	 */
+	bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb,
+			 cxl_pmem_region_release_driver);
 	bus_for_each_dev(&cxl_bus_type, NULL, cxl_nvb,
 			 cxl_nvdimm_release_driver);
 	nvdimm_bus_unregister(nvdimm_bus);
@@ -328,6 +359,203 @@ static struct cxl_driver cxl_nvdimm_bridge_driver = {
 	.id = CXL_DEVICE_NVDIMM_BRIDGE,
 };
 
+static int match_cxl_nvdimm(struct device *dev, void *data)
+{
+	return is_cxl_nvdimm(dev);
+}
+
+static void unregister_nvdimm_region(void *nd_region)
+{
+	struct cxl_nvdimm_bridge *cxl_nvb;
+	struct cxl_pmem_region *cxlr_pmem;
+	int i;
+
+	cxlr_pmem = nd_region_provider_data(nd_region);
+	cxl_nvb = cxlr_pmem->bridge;
+	device_lock(&cxl_nvb->dev);
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+		struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
+
+		if (cxl_nvd->region) {
+			put_device(&cxlr_pmem->dev);
+			cxl_nvd->region = NULL;
+		}
+	}
+	device_unlock(&cxl_nvb->dev);
+
+	nvdimm_region_delete(nd_region);
+}
+
+static void cxlr_pmem_remove_resource(void *res)
+{
+	remove_resource(res);
+}
+
+struct cxl_pmem_region_info {
+	u64 offset;
+	u64 serial;
+};
+
+static int cxl_pmem_region_probe(struct device *dev)
+{
+	struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
+	struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
+	struct cxl_region *cxlr = cxlr_pmem->cxlr;
+	struct cxl_pmem_region_info *info = NULL;
+	struct cxl_nvdimm_bridge *cxl_nvb;
+	struct nd_interleave_set *nd_set;
+	struct nd_region_desc ndr_desc;
+	struct cxl_nvdimm *cxl_nvd;
+	struct nvdimm *nvdimm;
+	struct resource *res;
+	int rc, i = 0;
+
+	cxl_nvb = cxl_find_nvdimm_bridge(&cxlr_pmem->mapping[0].cxlmd->dev);
+	if (!cxl_nvb) {
+		dev_dbg(dev, "bridge not found\n");
+		return -ENXIO;
+	}
+	cxlr_pmem->bridge = cxl_nvb;
+
+	device_lock(&cxl_nvb->dev);
+	if (!cxl_nvb->nvdimm_bus) {
+		dev_dbg(dev, "nvdimm bus not found\n");
+		rc = -ENXIO;
+		goto err;
+	}
+
+	memset(&mappings, 0, sizeof(mappings));
+	memset(&ndr_desc, 0, sizeof(ndr_desc));
+
+	res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL);
+	if (!res) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	res->name = "Persistent Memory";
+	res->start = cxlr_pmem->hpa_range.start;
+	res->end = cxlr_pmem->hpa_range.end;
+	res->flags = IORESOURCE_MEM;
+	res->desc = IORES_DESC_PERSISTENT_MEMORY;
+
+	rc = insert_resource(&iomem_resource, res);
+	if (rc)
+		goto err;
+
+	rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res);
+	if (rc)
+		goto err;
+
+	ndr_desc.res = res;
+	ndr_desc.provider_data = cxlr_pmem;
+
+	ndr_desc.numa_node = memory_add_physaddr_to_nid(res->start);
+	ndr_desc.target_node = phys_to_target_node(res->start);
+	if (ndr_desc.target_node == NUMA_NO_NODE) {
+		ndr_desc.target_node = ndr_desc.numa_node;
+		dev_dbg(&cxlr->dev, "changing target node from %d to %d",
+			NUMA_NO_NODE, ndr_desc.target_node);
+	}
+
+	nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL);
+	if (!nd_set) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	ndr_desc.memregion = cxlr->id;
+	set_bit(ND_REGION_CXL, &ndr_desc.flags);
+	set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+
+	info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
+		struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];
+		struct cxl_memdev *cxlmd = m->cxlmd;
+		struct cxl_dev_state *cxlds = cxlmd->cxlds;
+		struct device *d;
+
+		d = device_find_child(&cxlmd->dev, NULL, match_cxl_nvdimm);
+		if (!d) {
+			dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i,
+				dev_name(&cxlmd->dev));
+			rc = -ENODEV;
+			goto err;
+		}
+
+		/* safe to drop ref now with bridge lock held */
+		put_device(d);
+
+		cxl_nvd = to_cxl_nvdimm(d);
+		nvdimm = dev_get_drvdata(&cxl_nvd->dev);
+		if (!nvdimm) {
+			dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i,
				dev_name(&cxlmd->dev));
+			rc = -ENODEV;
+			goto err;
+		}
+		cxl_nvd->region = cxlr_pmem;
+		get_device(&cxlr_pmem->dev);
+		m->cxl_nvd = cxl_nvd;
+		mappings[i] = (struct nd_mapping_desc) {
+			.nvdimm = nvdimm,
+			.start = m->start,
+			.size = m->size,
+			.position = i,
+		};
+		info[i].offset = m->start;
+		info[i].serial = cxlds->serial;
+	}
+	ndr_desc.num_mappings = cxlr_pmem->nr_mappings;
+	ndr_desc.mapping = mappings;
+
+	/*
+	 * TODO enable CXL labels which skip the need for 'interleave-set cookie'
+	 */
+	nd_set->cookie1 =
+		nd_fletcher64(info, sizeof(*info) * cxlr_pmem->nr_mappings, 0);
+	nd_set->cookie2 = nd_set->cookie1;
+	ndr_desc.nd_set = nd_set;
+
+	cxlr_pmem->nd_region =
+		nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc);
+	if (IS_ERR(cxlr_pmem->nd_region)) {
+		rc = PTR_ERR(cxlr_pmem->nd_region);
+		goto err;
+	}
+
+	rc = devm_add_action_or_reset(dev, unregister_nvdimm_region,
+				      cxlr_pmem->nd_region);
+out:
+	kfree(info);
+	device_unlock(&cxl_nvb->dev);
+	put_device(&cxl_nvb->dev);
+
+	return rc;
+
+err:
+	dev_dbg(dev, "failed to create nvdimm region\n");
+	for (i--; i >= 0; i--) {
+		nvdimm = mappings[i].nvdimm;
+		cxl_nvd = nvdimm_provider_data(nvdimm);
+		put_device(&cxl_nvd->region->dev);
+		cxl_nvd->region = NULL;
+	}
+	goto out;
+}
+
+static struct cxl_driver cxl_pmem_region_driver = {
+	.name = "cxl_pmem_region",
+	.probe = cxl_pmem_region_probe,
+	.id = CXL_DEVICE_PMEM_REGION,
+};
+
 /*
  * Return all bridges to the CXL_NVB_NEW state to invalidate any
  * ->state_work referring to the now destroyed cxl_pmem_wq.
@@ -372,8 +600,14 @@ static __init int cxl_pmem_init(void)
 	if (rc)
 		goto err_nvdimm;
 
+	rc = cxl_driver_register(&cxl_pmem_region_driver);
+	if (rc)
+		goto err_region;
+
 	return 0;
 
+err_region:
+	cxl_driver_unregister(&cxl_nvdimm_driver);
 err_nvdimm:
 	cxl_driver_unregister(&cxl_nvdimm_bridge_driver);
 err_bridge:
@@ -383,6 +617,7 @@ static __init int cxl_pmem_init(void)
 
 static __exit void cxl_pmem_exit(void)
 {
+	cxl_driver_unregister(&cxl_pmem_region_driver);
 	cxl_driver_unregister(&cxl_nvdimm_driver);
 	cxl_driver_unregister(&cxl_nvdimm_bridge_driver);
 	destroy_cxl_pmem_wq();
@@ -394,3 +629,4 @@ module_exit(cxl_pmem_exit);
 MODULE_IMPORT_NS(CXL);
 MODULE_ALIAS_CXL(CXL_DEVICE_NVDIMM_BRIDGE);
 MODULE_ALIAS_CXL(CXL_DEVICE_NVDIMM);
+MODULE_ALIAS_CXL(CXL_DEVICE_PMEM_REGION);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index d976260eca7a..473a71bbd9c9 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -133,7 +133,8 @@ static void nd_region_release(struct device *dev)
 		put_device(&nvdimm->dev);
 	}
 	free_percpu(nd_region->lane);
-	memregion_free(nd_region->id);
+	if (!test_bit(ND_REGION_CXL, &nd_region->flags))
+		memregion_free(nd_region->id);
 	kfree(nd_region);
 }
 
@@ -982,9 +983,14 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	if (!nd_region)
 		return NULL;
 
-	nd_region->id = memregion_alloc(GFP_KERNEL);
-	if (nd_region->id < 0)
-		goto err_id;
+	/* CXL pre-assigns memregion ids before creating nvdimm regions */
+	if (test_bit(ND_REGION_CXL, &ndr_desc->flags)) {
+		nd_region->id = ndr_desc->memregion;
+	} else {
+		nd_region->id = memregion_alloc(GFP_KERNEL);
+		if (nd_region->id < 0)
+			goto err_id;
+	}
 
 	nd_region->lane = alloc_percpu(struct nd_percpu_lane);
 	if (!nd_region->lane)
@@ -1043,9 +1049,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 	return nd_region;
 
- err_percpu:
-	memregion_free(nd_region->id);
- err_id:
+err_percpu:
+	if (!test_bit(ND_REGION_CXL, &ndr_desc->flags))
+		memregion_free(nd_region->id);
+err_id:
 	kfree(nd_region);
 	return NULL;
 }
@@ -1068,6 +1075,13 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+void nvdimm_region_delete(struct nd_region *nd_region)
+{
+	if (nd_region)
+		nd_device_unregister(&nd_region->dev, ND_SYNC);
+}
+EXPORT_SYMBOL_GPL(nvdimm_region_delete);
+
 int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
 {
 	int rc = 0;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0d61e07b6827..c74acfa1a3fe 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -59,6 +59,9 @@ enum {
 	/* Platform provides asynchronous flush mechanism */
 	ND_REGION_ASYNC = 3,
 
+	/* Region was created by CXL subsystem */
+	ND_REGION_CXL = 4,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -122,6 +125,7 @@ struct nd_region_desc {
 	int numa_node;
 	int target_node;
 	unsigned long flags;
+	int memregion;
 	struct device_node *of_node;
 	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
@@ -259,6 +263,7 @@ static inline struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus,
 			     cmd_mask, num_flush, flush_wpq, NULL, NULL, NULL);
 }
 void nvdimm_delete(struct nvdimm *nvdimm);
+void nvdimm_region_delete(struct nd_region *nd_region);
 const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd);
 const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd);
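Taken together, the libnvdimm interface extended above can be exercised
as in the following sketch; the example_* wrappers are hypothetical,
only the libnvdimm symbols come from the patch:

/*
 * Illustrative caller-side sketch (not part of the patch): a subsystem
 * that already owns a memregion id, such as the CXL region driver
 * above, hands it to libnvdimm via the new nd_region_desc fields.
 */
static struct nd_region *
example_create_cxl_backed_region(struct nvdimm_bus *bus,
				 struct nd_region_desc *ndr_desc,
				 int region_id)
{
	/* Reuse the caller's id; nd_region_create() skips memregion_alloc() */
	ndr_desc->memregion = region_id;
	set_bit(ND_REGION_CXL, &ndr_desc->flags);

	return nvdimm_pmem_region_create(bus, ndr_desc);
}

/* Teardown mirrors creation: the region is unregistered synchronously */
static void example_delete_cxl_backed_region(struct nd_region *nd_region)
{
	nvdimm_region_delete(nd_region);
	/* ND_REGION_CXL regions keep their memregion id; the caller frees it */
}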