diff mbox series

[RFC] cxl/region: Allow out of order assembly of autodiscovered regions

Message ID 20240113050421.1622533-1-alison.schofield@intel.com
State Superseded
Headers show
Series [RFC] cxl/region: Allow out of order assembly of autodiscovered regions | expand

Commit Message

Alison Schofield Jan. 13, 2024, 5:04 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Wonjae Lee,

Here is the RFC Patch I mentioned in this thread:
https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/

This passes the cxl-test suite, so hoping no regression, but it needs
to be tested with the required config: 2 memdevs connected to the same
port, each memdev belongs to a different auto-region. Repeated attempts
should hit the test case and emit this debug message upon success:
	dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");

Failure will be the HPA order violation message.

All reviewers & testers welcomed.

Begin commit log:
Autodiscovered regions can fail to assemble if they are not discovered
in HPA decode order. The user will see failure messages like:

[] cxl region0: endpoint5: HPA order violation region1
[] cxl region0: endpoint5: failed to allocate region reference

The check that is causing the failure helps the CXL driver enforce
a CXL spec mandate that decoders be committed in HPA order. The
check is needless for autodiscovered regions since their decoders
are already programmed. Trying to enforce order in the assembly of
these regions is useless because they are assembled once all their
member endpoints arrive, and there is no guarantee on the order in
which endpoints are discovered during probe.

Keep the existing check, but for autodiscovered regions, allow the
out of order assembly after a sanity check that the lowered numbered
decoder has the lower HPA starting address.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/region.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)


base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3

Comments

Dan Williams Jan. 18, 2024, 7:46 p.m. UTC | #1
A small quibble with the Subject: line, this looks like an "RFT"
(request-for-test) not an "RFC", because the code looks likely correct.
Given that testers sometimes disappear it would be nice to have a unit
test reproducer for this and not require a re-test on hardware.

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Wonjae Lee,
> 
> Here is the RFC Patch I mentioned in this thread:
> https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> 
> This passes the cxl-test suite, so hoping no regression, but it needs
> to be tested with the required config: 2 memdevs connected to the same
> port, each memdev belongs to a different auto-region. Repeated attempts
> should hit the test case and emit this debug message upon success:
> 	dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");

So one of the largest roadblocks for creating arbitrary region assembly
scenarios with cxl_test is the inability to save and restore decoder
settings.

The patch below adds support for recording decoder settings and skipping
the reset of those values when unloading the driver. Then, when the
driver is reloaded, it simulates the case of BIOS created CXL regions
prior to OS boot.

We can go after even finer grained cases once the uunit effort settles,
but in the meantime cxl_test can add auto-assembly regression tests with
the current topology.

With the below you can simply trigger "cxl {en,dis}able-memdev" in the
proper order to cause the violation.

-- >8 --
From 5839392a3dbb64dbbac0630d930b1f48a8ef7ea6 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Wed, 17 Jan 2024 20:56:20 -0800
Subject: [PATCH] tools/testing/cxl: Add decoder save/restore support

Record decoder values at init and mock_decoder_commit() time, and
restore them at the next invocation of mock_init_hdm_decoder(). Add 2
attributes to the cxl_test "cxl_acpi" device to optionally flush the
cache of topology decoder values, or disable updating the decoder at
mock_decoder_reset() time.

This enables replaying a saved decoder configuration when re-triggering
a topology scan by re-binding the cxl_acpi driver to "cxl_acpi.0" (the
cxl_test emulation of an ACPI0017 instance).

    # modprobe cxl_test
    # cxl list -RB -b cxl_test -u
    {
      "bus":"root3",
      "provider":"cxl_test",
      "regions:root3":[
        {
          "region":"region5",
          "resource":"0xf010000000",
          "size":"512.00 MiB (536.87 MB)",
          "type":"ram",
          "interleave_ways":2,
          "interleave_granularity":4096,
          "decode_state":"commit"
        }
      ]
    }
    # echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_registry_reset_disable
    # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
    # cxl list -RB -b cxl_test -u
    # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind
    # cxl list -RB -b cxl_test -u
    {
      "bus":"root3",
      "provider":"cxl_test",
      "regions:root3":[
        {
          "region":"region5",
          "resource":"0xf010000000",
          "size":"512.00 MiB (536.87 MB)",
          "type":"ram",
          "interleave_ways":2,
          "interleave_granularity":4096,
          "decode_state":"commit"
        }
      ]
    }

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/cxl.c | 268 +++++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)

diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index f4e517a0c774..3f333d6a57be 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -44,6 +44,9 @@ struct platform_device *cxl_mem_single[NR_MEM_SINGLE];
 static struct platform_device *cxl_rch[NR_CXL_RCH];
 static struct platform_device *cxl_rcd[NR_CXL_RCH];
 
+static DEFINE_XARRAY(decoder_registry);
+static bool decoder_registry_reset_disable;
+
 static inline bool is_multi_bridge(struct device *dev)
 {
 	int i;
@@ -660,6 +663,153 @@ static int map_targets(struct device *dev, void *data)
 	return 0;
 }
 
+static unsigned long cxld_registry_index(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+	/*
+	 * Upper nibble of a kernel pointer is 0xff, chop that to make
+	 * space for a cxl_decoder id which should be less than 128
+	 * given decoder count is a 4-bit field.
+	 *
+	 * While @port is reallocated each enumeration, @port->uport_dev
+	 * is stable.
+	 */
+	dev_WARN_ONCE(&port->dev, cxld->id >= 128,
+		      "decoder id:%d out of range\n", cxld->id);
+	return (((unsigned long) port->uport_dev) << 4) | cxld->id;
+}
+
+struct cxl_test_decoder {
+	union {
+		struct cxl_switch_decoder cxlsd;
+		struct cxl_endpoint_decoder cxled;
+	};
+	union {
+		struct cxl_dport *targets[CXL_DECODER_MAX_INTERLEAVE];
+		struct range dpa_range;
+	};
+};
+
+static struct cxl_test_decoder *cxld_registry_find(struct cxl_decoder *cxld)
+{
+	return xa_load(&decoder_registry, cxld_registry_index(cxld));
+}
+
+#define dbg_cxld(port, msg, cxld)                                                       \
+	do {                                                                            \
+		struct cxl_decoder *___d = (cxld);                                      \
+		dev_dbg((port)->uport_dev,                                              \
+			"decoder%d: %s range: %#llx-%#llx iw: %d ig: %d flags: %#lx\n", \
+			___d->id, msg, ___d->hpa_range.start,                           \
+			___d->hpa_range.end + 1, ___d->interleave_ways,                 \
+			___d->interleave_granularity, ___d->flags);                     \
+	} while (0)
+
+static int mock_decoder_commit(struct cxl_decoder *cxld);
+static int mock_decoder_reset(struct cxl_decoder *cxld);
+
+static void cxld_copy(struct cxl_decoder *a, struct cxl_decoder *b)
+{
+	a->id = b->id;
+	a->hpa_range = b->hpa_range;
+	a->interleave_ways = b->interleave_ways;
+	a->interleave_granularity = b->interleave_granularity;
+	a->target_type = b->target_type;
+	a->flags = b->flags;
+	a->commit = mock_decoder_commit;
+	a->reset = mock_decoder_reset;
+}
+
+static void cxld_registry_restore(struct cxl_decoder *cxld, struct cxl_test_decoder *td)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+	if (is_switch_decoder(&cxld->dev)) {
+		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(&cxld->dev);
+
+		dbg_cxld(port, "restore", &td->cxlsd.cxld);
+		cxld_copy(cxld, &td->cxlsd.cxld);
+		WARN_ON(cxlsd->nr_targets != td->cxlsd.nr_targets);
+
+		/* convert saved dport devs to dports */
+		for (int i = 0; i < cxlsd->nr_targets; i++) {
+			struct cxl_dport *dport;
+
+			if (!td->cxlsd.target[i])
+				continue;
+			dport = cxl_find_dport_by_dev(
+				port, (struct device *)td->cxlsd.target[i]);
+			WARN_ON(!dport);
+			cxlsd->target[i] = dport;
+		}
+	} else {
+		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+		dbg_cxld(port, "restore", &td->cxled.cxld);
+		cxld_copy(cxld, &td->cxled.cxld);
+		cxled->state = td->cxled.state;
+		cxled->skip = td->cxled.skip;
+		if (range_len(&td->dpa_range))
+			devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
+					     range_len(&td->dpa_range),
+					     td->cxled.skip);
+		if (cxld->flags & CXL_DECODER_F_ENABLE)
+			port->commit_end = cxld->id;
+	}
+}
+
+static void __cxld_registry_save(struct cxl_test_decoder *td,
+				 struct cxl_decoder *cxld)
+{
+	if (is_switch_decoder(&cxld->dev)) {
+		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(&cxld->dev);
+
+		cxld_copy(&td->cxlsd.cxld, cxld);
+		td->cxlsd.nr_targets = cxlsd->nr_targets;
+
+		/* save dport devs as a stable placeholder for dports */
+		for (int i = 0; i < cxlsd->nr_targets; i++) {
+			if (!cxlsd->target[i])
+				continue;
+			td->cxlsd.target[i] =
+				(struct cxl_dport *)cxlsd->target[i]->dport_dev;
+		}
+	} else {
+		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+
+		cxld_copy(&td->cxled.cxld, cxld);
+		td->cxled.state = cxled->state;
+		td->cxled.skip = cxled->skip;
+		if (cxled->dpa_res) {
+			td->dpa_range.start = cxled->dpa_res->start;
+			td->dpa_range.end = cxled->dpa_res->end;
+		} else {
+			td->dpa_range.start = 0;
+			td->dpa_range.end = -1;
+		}
+	}
+}
+
+static void cxld_registry_save(struct cxl_test_decoder *td, struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+	dbg_cxld(port, "save", cxld);
+	__cxld_registry_save(td, cxld);
+}
+
+static void cxld_registry_update(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+	struct cxl_test_decoder *td = cxld_registry_find(cxld);
+
+	dev_WARN_ONCE(port->uport_dev, !td, "%s failed\n", __func__);
+
+	dbg_cxld(port, "update", cxld);
+	__cxld_registry_save(td, cxld);
+}
+
 static int mock_decoder_commit(struct cxl_decoder *cxld)
 {
 	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
@@ -679,6 +829,7 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
 
 	port->commit_end++;
 	cxld->flags |= CXL_DECODER_F_ENABLE;
+	cxld_registry_update(cxld);
 
 	return 0;
 }
@@ -701,10 +852,43 @@ static int mock_decoder_reset(struct cxl_decoder *cxld)
 
 	port->commit_end--;
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+	if (decoder_registry_reset_disable)
+		dev_dbg(port->uport_dev, "decoder%d: skip registry update\n",
+			cxld->id);
+	else
+		cxld_registry_update(cxld);
 
 	return 0;
 }
 
+static void cxld_registry_invalidate(void)
+{
+	unsigned long index;
+	void *entry;
+
+	xa_for_each(&decoder_registry, index, entry) {
+		xa_erase(&decoder_registry, index);
+		kfree(entry);
+	}
+}
+
+static struct cxl_test_decoder *cxld_registry_new(struct cxl_decoder *cxld)
+{
+	struct cxl_test_decoder *td __free(kfree) = kzalloc(sizeof(*td), GFP_KERNEL);
+
+	if (!td)
+		return NULL;
+
+	if (xa_insert(&decoder_registry, cxld_registry_index(cxld), td,
+		      GFP_KERNEL)) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	cxld_registry_save(td, cxld);
+	return no_free_ptr(td);
+}
+
 static void default_mock_decoder(struct cxl_decoder *cxld)
 {
 	cxld->hpa_range = (struct range){
@@ -717,6 +901,9 @@ static void default_mock_decoder(struct cxl_decoder *cxld)
 	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
 	cxld->commit = mock_decoder_commit;
 	cxld->reset = mock_decoder_reset;
+
+	if (!cxld_registry_new(cxld))
+		dev_dbg(&cxld->dev, "failed to add to registry\n");
 }
 
 static int first_decoder(struct device *dev, void *data)
@@ -738,6 +925,7 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
 	struct cxl_endpoint_decoder *cxled;
 	struct cxl_switch_decoder *cxlsd;
 	struct cxl_port *port, *iter;
+	struct cxl_test_decoder *td;
 	const int size = SZ_512M;
 	struct cxl_memdev *cxlmd;
 	struct cxl_dport *dport;
@@ -767,6 +955,12 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
 		port = cxled_to_port(cxled);
 	}
 
+	td = cxld_registry_find(cxld);
+	if (td) {
+		cxld_registry_restore(cxld, td);
+		return;
+	}
+
 	/*
 	 * The first decoder on the first 2 devices on the first switch
 	 * attached to host-bridge0 mock a fake / static RAM region. All
@@ -795,6 +989,8 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
 	devm_cxl_dpa_reserve(cxled, 0, size / cxld->interleave_ways, 0);
 	cxld->commit = mock_decoder_commit;
 	cxld->reset = mock_decoder_reset;
+	if (!cxld_registry_new(cxld))
+		dev_dbg(&cxld->dev, "failed to add to registry\n");
 
 	/*
 	 * Now that endpoint decoder is set up, walk up the hierarchy
@@ -837,6 +1033,7 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
 			.start = base,
 			.end = base + size - 1,
 		};
+		cxld_registry_update(cxld);
 		put_device(dev);
 	}
 }
@@ -1233,6 +1430,73 @@ static void cxl_single_exit(void)
 	}
 }
 
+static ssize_t decoder_registry_invalidate_show(struct device *dev,
+						struct device_attribute *attr,
+						char *buf)
+{
+	unsigned long index;
+	bool empty = true;
+	void *entry;
+
+	xa_for_each(&decoder_registry, index, entry) {
+		empty = false;
+		break;
+	}
+
+	return sysfs_emit(buf, "%d\n", !empty);
+}
+
+static ssize_t decoder_registry_invalidate_store(struct device *dev,
+						 struct device_attribute *attr,
+						 const char *buf, size_t count)
+{
+	bool invalidate;
+	int rc;
+
+	rc = kstrtobool(buf, &invalidate);
+	if (rc)
+		return rc;
+
+	guard(device)(dev);
+
+	if (dev->driver)
+		return -EBUSY;
+
+	cxld_registry_invalidate();
+	return count;
+}
+
+static DEVICE_ATTR_RW(decoder_registry_invalidate);
+
+static ssize_t
+decoder_registry_reset_disable_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", decoder_registry_reset_disable);
+}
+
+static ssize_t
+decoder_registry_reset_disable_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	int rc;
+
+	rc = kstrtobool(buf, &decoder_registry_reset_disable);
+	if (rc)
+		return rc;
+	return count;
+}
+
+static DEVICE_ATTR_RW(decoder_registry_reset_disable);
+
+static struct attribute *cxl_acpi_attrs[] = {
+	&dev_attr_decoder_registry_invalidate.attr,
+	&dev_attr_decoder_registry_reset_disable.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(cxl_acpi);
+
 static __init int cxl_test_init(void)
 {
 	int rc, i;
@@ -1377,6 +1641,7 @@ static __init int cxl_test_init(void)
 
 	mock_companion(&acpi0017_mock, &cxl_acpi->dev);
 	acpi0017_mock.dev.bus = &platform_bus_type;
+	cxl_acpi->dev.groups = cxl_acpi_groups;
 
 	rc = platform_device_add(cxl_acpi);
 	if (rc)
@@ -1446,6 +1711,9 @@ static __exit void cxl_test_exit(void)
 	depopulate_all_mock_resources();
 	gen_pool_destroy(cxl_mock_pool);
 	unregister_cxl_mock_ops(&cxl_mock_ops);
+
+	cxld_registry_invalidate();
+	xa_destroy(&decoder_registry);
 }
 
 module_param(interleave_arithmetic, int, 0444);
Wonjae Lee Jan. 26, 2024, 8:51 a.m. UTC | #2
Thank you for your quick response and sending the patch!

I'm very sorry for the delay in replying to your mail. It took some time to
reproduce test and check the issue.

On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Wonjae Lee,
>
> Here is the RFC Patch I mentioned in this thread:
> https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
>
> This passes the cxl-test suite, so hoping no regression, but it needs
> to be tested with the required config: 2 memdevs connected to the same
> port, each memdev belongs to a different auto-region. Repeated attempts
> should hit the test case and emit this debug message upon success:
>  dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
>
> Failure will be the HPA order violation message.
>
> All reviewers & testers welcomed.

After applying your patch to base-commit, I ran the same test I mentioned in
the previous thread, and during the boot process, a null pointer dereference
occurs as shown below:

thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
dmesg log:
[] cxl region0: region sort successful
[] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
[] BUG: kernel NULL pointer dereference, address: 00000000000002e8
...
[] Call Trace:
...
[]  ? auto_order_ok+0x5f/0xb0 [cxl_core]
[]  cxl_region_attach_position+0x379/0x860 [cxl_core]
[]  cxl_region_attach+0x466/0x8b0 [cxl_core]
 
 
To be specific, cxld_a in auto_order_ok() is NULL:
    cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
 
I've noticed that if I pass cxled as a function argument appropriately, and get
cxld_a as shown below, the region initialization succeeds with the log "cxl
region0: allow out of order region ref alloc" as you mention.
 
@@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
        return to_cxl_decoder(dev);
 }
-bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
-                  struct cxl_region *cxlr_b)
+bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
+                  struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
 {
        struct cxl_region_ref *cxl_rr;
        struct cxl_decoder *cxld_a, *cxld_b;
@@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
        if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
                return false;

-       cxld_a = cxl_region_find_decoder(port, cxlr_a);
+       if (port == cxled_to_port(cxled))
+               cxld_a = &cxled->cxld;
+       else
+               cxld_a = cxl_region_find_decoder(port, cxlr_a);

I hope this helps.

Thanks,
Wonjae

> Begin commit log:
> Autodiscovered regions can fail to assemble if they are not discovered
> in HPA decode order. The user will see failure messages like:
>
> [] cxl region0: endpoint5: HPA order violation region1
> [] cxl region0: endpoint5: failed to allocate region reference
>
> The check that is causing the failure helps the CXL driver enforce
> a CXL spec mandate that decoders be committed in HPA order. The
> check is needless for autodiscovered regions since their decoders
> are already programmed. Trying to enforce order in the assembly of
> these regions is useless because they are assembled once all their
> member endpoints arrive, and there is no guarantee on the order in
> which endpoints are discovered during probe.
>
> Keep the existing check, but for autodiscovered regions, allow the
> out of order assembly after a sanity check that the lowered numbered
> decoder has the lower HPA starting address.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 0f05692bfec3..8770ebcae05d 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
>  return to_cxl_decoder(dev);
>  }
>
> +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> +        struct cxl_region *cxlr_b)
> +{
> + struct cxl_region_ref *cxl_rr;
> + struct cxl_decoder *cxld_a, *cxld_b;
> +
> + /*
> +  * Allow the out of order assembly of auto-discovered regions as
> +  * long as correct decoder programming order can be verified.
> +  *
> +  * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> +  * software must commit decoders in HPA order. Therefore it is
> +  * sufficient to sanity check that the lowered number decoder
> +  * has the lower HPA starting address.
> +  */
> + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> +    return false;
> +
> + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> + cxl_rr = cxl_rr_load(port, cxlr_b);
> + cxld_b = cxl_rr->decoder;
> +
> + if (cxld_b->id > cxld_a->id) {
> +    dev_dbg(&cxlr_a->dev,
> +        "allow out of order region ref alloc\n");
> +    return true;
> + }
> +
> + return false;
> +}
> +
>  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
>                          struct cxl_region *cxlr)
>  {
> @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
>      if (!ip->res)
>          continue;
>
> -    if (ip->res->start > p->res->start) {
> +    if (ip->res->start > p->res->start &&
> +        (!auto_order_ok(port, cxlr, iter->region))) {
>          dev_dbg(&cxlr->dev,
>              "%s: HPA order violation %s:%pr vs %pr\n",
>              dev_name(&port->dev),
>
> base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> --
> 2.37.3
>
Alison Schofield Jan. 30, 2024, 4:37 a.m. UTC | #3
On Fri, Jan 26, 2024 at 05:51:08PM +0900, Wonjae Lee wrote:
> Thank you for your quick response and sending the patch!
> 
> I'm very sorry for the delay in replying to your mail. It took some time to
> reproduce test and check the issue.
> 
> On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Wonjae Lee,
> >
> > Here is the RFC Patch I mentioned in this thread:
> > https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> >
> > This passes the cxl-test suite, so hoping no regression, but it needs
> > to be tested with the required config: 2 memdevs connected to the same
> > port, each memdev belongs to a different auto-region. Repeated attempts
> > should hit the test case and emit this debug message upon success:
> >  dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
> >
> > Failure will be the HPA order violation message.
> >
> > All reviewers & testers welcomed.
> 
> After applying your patch to base-commit, I ran the same test I mentioned in
> the previous thread, and during the boot process, a null pointer dereference
> occurs as shown below:
> 
> thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
> base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> dmesg log:
> [] cxl region0: region sort successful
> [] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> [] BUG: kernel NULL pointer dereference, address: 00000000000002e8
> ...
> [] Call Trace:
> ...
> []  ? auto_order_ok+0x5f/0xb0 [cxl_core]
> []  cxl_region_attach_position+0x379/0x860 [cxl_core]
> []  cxl_region_attach+0x466/0x8b0 [cxl_core]
>  
>  
> To be specific, cxld_a in auto_order_ok() is NULL:
>     cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
>  
> I've noticed that if I pass cxled as a function argument appropriately, and get
> cxld_a as shown below, the region initialization succeeds with the log "cxl
> region0: allow out of order region ref alloc" as you mention.
>  
> @@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
>         return to_cxl_decoder(dev);
>  }
> -bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> -                  struct cxl_region *cxlr_b)
> +bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> +                  struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
>  {
>         struct cxl_region_ref *cxl_rr;
>         struct cxl_decoder *cxld_a, *cxld_b;
> @@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
>         if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
>                 return false;
> 
> -       cxld_a = cxl_region_find_decoder(port, cxlr_a);
> +       if (port == cxled_to_port(cxled))
> +               cxld_a = &cxled->cxld;
> +       else
> +               cxld_a = cxl_region_find_decoder(port, cxlr_a);
> 
> I hope this helps.

Yes, this is very helpful! Glad to have you trying it out.

I think at this point we can call alloc_region_ref() with a known cxled
of this port and simply pass it on. I'm going to skip this if...else pattern,
and simply get cxld_a = &cxled->cxld;

I think there may be opportunity for refinements in cxl_region_find_decoder()
and it's descendants, but it's not directly tied to this patch.

Thanks,
Alison

> 
> Thanks,
> Wonjae
> 
> > Begin commit log:
> > Autodiscovered regions can fail to assemble if they are not discovered
> > in HPA decode order. The user will see failure messages like:
> >
> > [] cxl region0: endpoint5: HPA order violation region1
> > [] cxl region0: endpoint5: failed to allocate region reference
> >
> > The check that is causing the failure helps the CXL driver enforce
> > a CXL spec mandate that decoders be committed in HPA order. The
> > check is needless for autodiscovered regions since their decoders
> > are already programmed. Trying to enforce order in the assembly of
> > these regions is useless because they are assembled once all their
> > member endpoints arrive, and there is no guarantee on the order in
> > which endpoints are discovered during probe.
> >
> > Keep the existing check, but for autodiscovered regions, allow the
> > out of order assembly after a sanity check that the lowered numbered
> > decoder has the lower HPA starting address.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 0f05692bfec3..8770ebcae05d 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> >  return to_cxl_decoder(dev);
> >  }
> >
> > +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > +        struct cxl_region *cxlr_b)
> > +{
> > + struct cxl_region_ref *cxl_rr;
> > + struct cxl_decoder *cxld_a, *cxld_b;
> > +
> > + /*
> > +  * Allow the out of order assembly of auto-discovered regions as
> > +  * long as correct decoder programming order can be verified.
> > +  *
> > +  * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> > +  * software must commit decoders in HPA order. Therefore it is
> > +  * sufficient to sanity check that the lowered number decoder
> > +  * has the lower HPA starting address.
> > +  */
> > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > +    return false;
> > +
> > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > + cxl_rr = cxl_rr_load(port, cxlr_b);
> > + cxld_b = cxl_rr->decoder;
> > +
> > + if (cxld_b->id > cxld_a->id) {
> > +    dev_dbg(&cxlr_a->dev,
> > +        "allow out of order region ref alloc\n");
> > +    return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> >  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> >                          struct cxl_region *cxlr)
> >  {
> > @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> >      if (!ip->res)
> >          continue;
> >
> > -    if (ip->res->start > p->res->start) {
> > +    if (ip->res->start > p->res->start &&
> > +        (!auto_order_ok(port, cxlr, iter->region))) {
> >          dev_dbg(&cxlr->dev,
> >              "%s: HPA order violation %s:%pr vs %pr\n",
> >              dev_name(&port->dev),
> >
> > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > --
> > 2.37.3
> >
Wonjae Lee Jan. 31, 2024, 1:02 a.m. UTC | #4
On Mon, Jan 29, 2024 at 08:37:24PM -0800, Alison Schofield wrote:
> On Fri, Jan 26, 2024 at 05:51:08PM +0900, Wonjae Lee wrote:
> > Thank you for your quick response and sending the patch!
> >
> > I'm very sorry for the delay in replying to your mail. It took some time to
> > reproduce test and check the issue.
> >
> > On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > Wonjae Lee,
> > >
> > > Here is the RFC Patch I mentioned in this thread:
> > > https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> > >
> > > This passes the cxl-test suite, so hoping no regression, but it needs
> > > to be tested with the required config: 2 memdevs connected to the same
> > > port, each memdev belongs to a different auto-region. Repeated attempts
> > > should hit the test case and emit this debug message upon success:
> > >  dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
> > >
> > > Failure will be the HPA order violation message.
> > >
> > > All reviewers & testers welcomed.
> >
> > After applying your patch to base-commit, I ran the same test I mentioned in
> > the previous thread, and during the boot process, a null pointer dereference
> > occurs as shown below:
> >
> > thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
> > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > dmesg log:
> > [] cxl region0: region sort successful
> > [] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> > [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> > [] BUG: kernel NULL pointer dereference, address: 00000000000002e8
> > ...
> > [] Call Trace:
> > ...
> > []  ? auto_order_ok+0x5f/0xb0 [cxl_core]
> > []  cxl_region_attach_position+0x379/0x860 [cxl_core]
> > []  cxl_region_attach+0x466/0x8b0 [cxl_core]
> >
> >
> > To be specific, cxld_a in auto_order_ok() is NULL:
> >     cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
> >
> > I've noticed that if I pass cxled as a function argument appropriately, and get
> > cxld_a as shown below, the region initialization succeeds with the log "cxl
> > region0: allow out of order region ref alloc" as you mention.
> >
> > @@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> >         return to_cxl_decoder(dev);
> >  }
> > -bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > -                  struct cxl_region *cxlr_b)
> > +bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> > +                  struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
> >  {
> >         struct cxl_region_ref *cxl_rr;
> >         struct cxl_decoder *cxld_a, *cxld_b;
> > @@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> >         if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> >                 return false;
> >
> > -       cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > +       if (port == cxled_to_port(cxled))
> > +               cxld_a = &cxled->cxld;
> > +       else
> > +               cxld_a = cxl_region_find_decoder(port, cxlr_a);
> >
> > I hope this helps.
>
> Yes, this is very helpful! Glad to have you trying it out.
>
> I think at this point we can call alloc_region_ref() with a known cxled
> of this port and simply pass it on. I'm going to skip this if...else pattern,
> and simply get cxld_a = &cxled->cxld;
>
> I think there may be opportunity for refinements in cxl_region_find_decoder()
> and it's descendants, but it's not directly tied to this patch.
>
> Thanks,
> Alison

I'm glad it was useful. I agree to change it in the direction you said.
Looking forward to the patch :)

Thanks,
Wonjae

>
> >
> > Thanks,
> > Wonjae
> >
> > > Begin commit log:
> > > Autodiscovered regions can fail to assemble if they are not discovered
> > > in HPA decode order. The user will see failure messages like:
> > >
> > > [] cxl region0: endpoint5: HPA order violation region1
> > > [] cxl region0: endpoint5: failed to allocate region reference
> > >
> > > The check that is causing the failure helps the CXL driver enforce
> > > a CXL spec mandate that decoders be committed in HPA order. The
> > > check is needless for autodiscovered regions since their decoders
> > > are already programmed. Trying to enforce order in the assembly of
> > > these regions is useless because they are assembled once all their
> > > member endpoints arrive, and there is no guarantee on the order in
> > > which endpoints are discovered during probe.
> > >
> > > Keep the existing check, but for autodiscovered regions, allow the
> > > out of order assembly after a sanity check that the lowered numbered
> > > decoder has the lower HPA starting address.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
> > >  1 file changed, 33 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index 0f05692bfec3..8770ebcae05d 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > >  return to_cxl_decoder(dev);
> > >  }
> > >
> > > +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > > +        struct cxl_region *cxlr_b)
> > > +{
> > > + struct cxl_region_ref *cxl_rr;
> > > + struct cxl_decoder *cxld_a, *cxld_b;
> > > +
> > > + /*
> > > +  * Allow the out of order assembly of auto-discovered regions as
> > > +  * long as correct decoder programming order can be verified.
> > > +  *
> > > +  * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> > > +  * software must commit decoders in HPA order. Therefore it is
> > > +  * sufficient to sanity check that the lowered number decoder
> > > +  * has the lower HPA starting address.
> > > +  */
> > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > > +    return false;
> > > +
> > > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > > + cxl_rr = cxl_rr_load(port, cxlr_b);
> > > + cxld_b = cxl_rr->decoder;
> > > +
> > > + if (cxld_b->id > cxld_a->id) {
> > > +    dev_dbg(&cxlr_a->dev,
> > > +        "allow out of order region ref alloc\n");
> > > +    return true;
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > >  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > >                          struct cxl_region *cxlr)
> > >  {
> > > @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > >      if (!ip->res)
> > >          continue;
> > >
> > > -    if (ip->res->start > p->res->start) {
> > > +    if (ip->res->start > p->res->start &&
> > > +        (!auto_order_ok(port, cxlr, iter->region))) {
> > >          dev_dbg(&cxlr->dev,
> > >              "%s: HPA order violation %s:%pr vs %pr\n",
> > >              dev_name(&port->dev),
> > >
> > > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > > --
> > > 2.37.3
> > >
>
Alison Schofield Feb. 8, 2024, 8:52 p.m. UTC | #5
On Thu, Jan 18, 2024 at 11:46:44AM -0800, Dan Williams wrote:

- snip to decoder replay patch -

> 
> So one of the largest roadblocks for creating arbitrary region assembly
> scenarios with cxl_test is the inability to save and restore decoder
> settings.
> 
> The patch below adds support for recording decoder settings and skipping
> the reset of those values when unloading the driver. Then, when the
> driver is reloaded, it simulates the case of BIOS created CXL regions
> prior to OS boot.
> 
> We can go after even finer grained cases once the uunit effort settles,
> but in the meantime cxl_test can add auto-assembly regression tests with
> the current topology.
> 
> With the below you can simply trigger "cxl {en,dis}able-memdev" in the
> proper order to cause the violation.
> 
> -- >8 --
> From 5839392a3dbb64dbbac0630d930b1f48a8ef7ea6 Mon Sep 17 00:00:00 2001
> From: Dan Williams <dan.j.williams@intel.com>
> Date: Wed, 17 Jan 2024 20:56:20 -0800
> Subject: [PATCH] tools/testing/cxl: Add decoder save/restore support
> 
> Record decoder values at init and mock_decoder_commit() time, and
> restore them at the next invocation of mock_init_hdm_decoder(). Add 2
> attributes to the cxl_test "cxl_acpi" device to optionally flush the
> cache of topology decoder values, or disable updating the decoder at
> mock_decoder_reset() time.
> 
> This enables replaying a saved decoder configuration when re-triggering
> a topology scan by re-binding the cxl_acpi driver to "cxl_acpi.0" (the
> cxl_test emulation of an ACPI0017 instance).

Hi Dan,

Sorry it's taken a while to come back on this. I did find it useful
in testing the auto-assembly order issue as you suggested. 

I didn't use this one: &dev_attr_decoder_registry_invalidate.attr,
I just reloaded the cxl-test module to do same.

This I used:  &dev_attr_decoder_registry_reset_disable.attr,
with your decoders state fixup to set CXL_DECODER_STATE_AUTO,
and a work-around to avoid pmem_probe failures on pmem region
auto create. 

More generally, I'm wondering about the implementation of the
'registry_save'. Here it continuously updates during all
cxl-test usage. Did you consider only creating the registry upon
user request and then at the next mock_init_hmd_decoder() look
for and use that registry if it exists.

Usage would be: 
# echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_registry_create
# echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
# echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind

And then maybe, for folks who like to acpi/unbind,bind, rather
than reload module, we could offer a decoder_registry_remove attr.

Am I missing something regarding the need to keep it updated
on the fly? 

Alison

> 
>     # modprobe cxl_test
>     # cxl list -RB -b cxl_test -u
>     {
>       "bus":"root3",
>       "provider":"cxl_test",
>       "regions:root3":[
>         {
>           "region":"region5",
>           "resource":"0xf010000000",
>           "size":"512.00 MiB (536.87 MB)",
>           "type":"ram",
>           "interleave_ways":2,
>           "interleave_granularity":4096,
>           "decode_state":"commit"
>         }
>       ]
>     }
>     # echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_registry_reset_disable
>     # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
>     # cxl list -RB -b cxl_test -u
>     # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind
>     # cxl list -RB -b cxl_test -u
>     {
>       "bus":"root3",
>       "provider":"cxl_test",
>       "regions:root3":[
>         {
>           "region":"region5",
>           "resource":"0xf010000000",
>           "size":"512.00 MiB (536.87 MB)",
>           "type":"ram",
>           "interleave_ways":2,
>           "interleave_granularity":4096,
>           "decode_state":"commit"
>         }
>       ]
>     }
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  tools/testing/cxl/test/cxl.c | 268 +++++++++++++++++++++++++++++++++++
>  1 file changed, 268 insertions(+)
> 
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index f4e517a0c774..3f333d6a57be 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -44,6 +44,9 @@ struct platform_device *cxl_mem_single[NR_MEM_SINGLE];
>  static struct platform_device *cxl_rch[NR_CXL_RCH];
>  static struct platform_device *cxl_rcd[NR_CXL_RCH];
>  
> +static DEFINE_XARRAY(decoder_registry);
> +static bool decoder_registry_reset_disable;
> +
>  static inline bool is_multi_bridge(struct device *dev)
>  {
>  	int i;
> @@ -660,6 +663,153 @@ static int map_targets(struct device *dev, void *data)
>  	return 0;
>  }
>  
> +static unsigned long cxld_registry_index(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +
> +	/*
> +	 * Upper nibble of a kernel pointer is 0xff, chop that to make
> +	 * space for a cxl_decoder id which should be less than 128
> +	 * given decoder count is a 4-bit field.
> +	 *
> +	 * While @port is reallocated each enumeration, @port->uport_dev
> +	 * is stable.
> +	 */
> +	dev_WARN_ONCE(&port->dev, cxld->id >= 128,
> +		      "decoder id:%d out of range\n", cxld->id);
> +	return (((unsigned long) port->uport_dev) << 4) | cxld->id;
> +}
> +
> +struct cxl_test_decoder {
> +	union {
> +		struct cxl_switch_decoder cxlsd;
> +		struct cxl_endpoint_decoder cxled;
> +	};
> +	union {
> +		struct cxl_dport *targets[CXL_DECODER_MAX_INTERLEAVE];
> +		struct range dpa_range;
> +	};
> +};
> +
> +static struct cxl_test_decoder *cxld_registry_find(struct cxl_decoder *cxld)
> +{
> +	return xa_load(&decoder_registry, cxld_registry_index(cxld));
> +}
> +
> +#define dbg_cxld(port, msg, cxld)                                                       \
> +	do {                                                                            \
> +		struct cxl_decoder *___d = (cxld);                                      \
> +		dev_dbg((port)->uport_dev,                                              \
> +			"decoder%d: %s range: %#llx-%#llx iw: %d ig: %d flags: %#lx\n", \
> +			___d->id, msg, ___d->hpa_range.start,                           \
> +			___d->hpa_range.end + 1, ___d->interleave_ways,                 \
> +			___d->interleave_granularity, ___d->flags);                     \
> +	} while (0)
> +
> +static int mock_decoder_commit(struct cxl_decoder *cxld);
> +static int mock_decoder_reset(struct cxl_decoder *cxld);
> +
> +static void cxld_copy(struct cxl_decoder *a, struct cxl_decoder *b)
> +{
> +	a->id = b->id;
> +	a->hpa_range = b->hpa_range;
> +	a->interleave_ways = b->interleave_ways;
> +	a->interleave_granularity = b->interleave_granularity;
> +	a->target_type = b->target_type;
> +	a->flags = b->flags;
> +	a->commit = mock_decoder_commit;
> +	a->reset = mock_decoder_reset;
> +}
> +
> +static void cxld_registry_restore(struct cxl_decoder *cxld, struct cxl_test_decoder *td)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +
> +	if (is_switch_decoder(&cxld->dev)) {
> +		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(&cxld->dev);
> +
> +		dbg_cxld(port, "restore", &td->cxlsd.cxld);
> +		cxld_copy(cxld, &td->cxlsd.cxld);
> +		WARN_ON(cxlsd->nr_targets != td->cxlsd.nr_targets);
> +
> +		/* convert saved dport devs to dports */
> +		for (int i = 0; i < cxlsd->nr_targets; i++) {
> +			struct cxl_dport *dport;
> +
> +			if (!td->cxlsd.target[i])
> +				continue;
> +			dport = cxl_find_dport_by_dev(
> +				port, (struct device *)td->cxlsd.target[i]);
> +			WARN_ON(!dport);
> +			cxlsd->target[i] = dport;
> +		}
> +	} else {
> +		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> +		dbg_cxld(port, "restore", &td->cxled.cxld);
> +		cxld_copy(cxld, &td->cxled.cxld);
> +		cxled->state = td->cxled.state;
> +		cxled->skip = td->cxled.skip;
> +		if (range_len(&td->dpa_range))
> +			devm_cxl_dpa_reserve(cxled, td->dpa_range.start,
> +					     range_len(&td->dpa_range),
> +					     td->cxled.skip);
> +		if (cxld->flags & CXL_DECODER_F_ENABLE)
> +			port->commit_end = cxld->id;
> +	}
> +}
> +
> +static void __cxld_registry_save(struct cxl_test_decoder *td,
> +				 struct cxl_decoder *cxld)
> +{
> +	if (is_switch_decoder(&cxld->dev)) {
> +		struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(&cxld->dev);
> +
> +		cxld_copy(&td->cxlsd.cxld, cxld);
> +		td->cxlsd.nr_targets = cxlsd->nr_targets;
> +
> +		/* save dport devs as a stable placeholder for dports */
> +		for (int i = 0; i < cxlsd->nr_targets; i++) {
> +			if (!cxlsd->target[i])
> +				continue;
> +			td->cxlsd.target[i] =
> +				(struct cxl_dport *)cxlsd->target[i]->dport_dev;
> +		}
> +	} else {
> +		struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> +
> +		cxld_copy(&td->cxled.cxld, cxld);
> +		td->cxled.state = cxled->state;
> +		td->cxled.skip = cxled->skip;
> +		if (cxled->dpa_res) {
> +			td->dpa_range.start = cxled->dpa_res->start;
> +			td->dpa_range.end = cxled->dpa_res->end;
> +		} else {
> +			td->dpa_range.start = 0;
> +			td->dpa_range.end = -1;
> +		}
> +	}
> +}
> +
> +static void cxld_registry_save(struct cxl_test_decoder *td, struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +
> +	dbg_cxld(port, "save", cxld);
> +	__cxld_registry_save(td, cxld);
> +}
> +
> +static void cxld_registry_update(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +	struct cxl_test_decoder *td = cxld_registry_find(cxld);
> +
> +	dev_WARN_ONCE(port->uport_dev, !td, "%s failed\n", __func__);
> +
> +	dbg_cxld(port, "update", cxld);
> +	__cxld_registry_save(td, cxld);
> +}
> +
>  static int mock_decoder_commit(struct cxl_decoder *cxld)
>  {
>  	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> @@ -679,6 +829,7 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
>  
>  	port->commit_end++;
>  	cxld->flags |= CXL_DECODER_F_ENABLE;
> +	cxld_registry_update(cxld);
>  
>  	return 0;
>  }
> @@ -701,10 +852,43 @@ static int mock_decoder_reset(struct cxl_decoder *cxld)
>  
>  	port->commit_end--;
>  	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> +	if (decoder_registry_reset_disable)
> +		dev_dbg(port->uport_dev, "decoder%d: skip registry update\n",
> +			cxld->id);
> +	else
> +		cxld_registry_update(cxld);
>  
>  	return 0;
>  }
>  
> +static void cxld_registry_invalidate(void)
> +{
> +	unsigned long index;
> +	void *entry;
> +
> +	xa_for_each(&decoder_registry, index, entry) {
> +		xa_erase(&decoder_registry, index);
> +		kfree(entry);
> +	}
> +}
> +
> +static struct cxl_test_decoder *cxld_registry_new(struct cxl_decoder *cxld)
> +{
> +	struct cxl_test_decoder *td __free(kfree) = kzalloc(sizeof(*td), GFP_KERNEL);
> +
> +	if (!td)
> +		return NULL;
> +
> +	if (xa_insert(&decoder_registry, cxld_registry_index(cxld), td,
> +		      GFP_KERNEL)) {
> +		WARN_ON(1);
> +		return NULL;
> +	}
> +
> +	cxld_registry_save(td, cxld);
> +	return no_free_ptr(td);
> +}
> +
>  static void default_mock_decoder(struct cxl_decoder *cxld)
>  {
>  	cxld->hpa_range = (struct range){
> @@ -717,6 +901,9 @@ static void default_mock_decoder(struct cxl_decoder *cxld)
>  	cxld->target_type = CXL_DECODER_HOSTONLYMEM;
>  	cxld->commit = mock_decoder_commit;
>  	cxld->reset = mock_decoder_reset;
> +
> +	if (!cxld_registry_new(cxld))
> +		dev_dbg(&cxld->dev, "failed to add to registry\n");
>  }
>  
>  static int first_decoder(struct device *dev, void *data)
> @@ -738,6 +925,7 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
>  	struct cxl_endpoint_decoder *cxled;
>  	struct cxl_switch_decoder *cxlsd;
>  	struct cxl_port *port, *iter;
> +	struct cxl_test_decoder *td;
>  	const int size = SZ_512M;
>  	struct cxl_memdev *cxlmd;
>  	struct cxl_dport *dport;
> @@ -767,6 +955,12 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
>  		port = cxled_to_port(cxled);
>  	}
>  
> +	td = cxld_registry_find(cxld);
> +	if (td) {
> +		cxld_registry_restore(cxld, td);
> +		return;
> +	}
> +
>  	/*
>  	 * The first decoder on the first 2 devices on the first switch
>  	 * attached to host-bridge0 mock a fake / static RAM region. All
> @@ -795,6 +989,8 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
>  	devm_cxl_dpa_reserve(cxled, 0, size / cxld->interleave_ways, 0);
>  	cxld->commit = mock_decoder_commit;
>  	cxld->reset = mock_decoder_reset;
> +	if (!cxld_registry_new(cxld))
> +		dev_dbg(&cxld->dev, "failed to add to registry\n");
>  
>  	/*
>  	 * Now that endpoint decoder is set up, walk up the hierarchy
> @@ -837,6 +1033,7 @@ static void mock_init_hdm_decoder(struct cxl_decoder *cxld)
>  			.start = base,
>  			.end = base + size - 1,
>  		};
> +		cxld_registry_update(cxld);
>  		put_device(dev);
>  	}
>  }
> @@ -1233,6 +1430,73 @@ static void cxl_single_exit(void)
>  	}
>  }
>  
> +static ssize_t decoder_registry_invalidate_show(struct device *dev,
> +						struct device_attribute *attr,
> +						char *buf)
> +{
> +	unsigned long index;
> +	bool empty = true;
> +	void *entry;
> +
> +	xa_for_each(&decoder_registry, index, entry) {
> +		empty = false;
> +		break;
> +	}
> +
> +	return sysfs_emit(buf, "%d\n", !empty);
> +}
> +
> +static ssize_t decoder_registry_invalidate_store(struct device *dev,
> +						 struct device_attribute *attr,
> +						 const char *buf, size_t count)
> +{
> +	bool invalidate;
> +	int rc;
> +
> +	rc = kstrtobool(buf, &invalidate);
> +	if (rc)
> +		return rc;
> +
> +	guard(device)(dev);
> +
> +	if (dev->driver)
> +		return -EBUSY;
> +
> +	cxld_registry_invalidate();
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(decoder_registry_invalidate);
> +
> +static ssize_t
> +decoder_registry_reset_disable_show(struct device *dev,
> +				    struct device_attribute *attr, char *buf)
> +{
> +	return sysfs_emit(buf, "%d\n", decoder_registry_reset_disable);
> +}
> +
> +static ssize_t
> +decoder_registry_reset_disable_store(struct device *dev,
> +				     struct device_attribute *attr,
> +				     const char *buf, size_t count)
> +{
> +	int rc;
> +
> +	rc = kstrtobool(buf, &decoder_registry_reset_disable);
> +	if (rc)
> +		return rc;
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(decoder_registry_reset_disable);
> +
> +static struct attribute *cxl_acpi_attrs[] = {
> +	&dev_attr_decoder_registry_invalidate.attr,
> +	&dev_attr_decoder_registry_reset_disable.attr,
> +	NULL
> +};
> +ATTRIBUTE_GROUPS(cxl_acpi);
> +
>  static __init int cxl_test_init(void)
>  {
>  	int rc, i;
> @@ -1377,6 +1641,7 @@ static __init int cxl_test_init(void)
>  
>  	mock_companion(&acpi0017_mock, &cxl_acpi->dev);
>  	acpi0017_mock.dev.bus = &platform_bus_type;
> +	cxl_acpi->dev.groups = cxl_acpi_groups;
>  
>  	rc = platform_device_add(cxl_acpi);
>  	if (rc)
> @@ -1446,6 +1711,9 @@ static __exit void cxl_test_exit(void)
>  	depopulate_all_mock_resources();
>  	gen_pool_destroy(cxl_mock_pool);
>  	unregister_cxl_mock_ops(&cxl_mock_ops);
> +
> +	cxld_registry_invalidate();
> +	xa_destroy(&decoder_registry);
>  }
>  
>  module_param(interleave_arithmetic, int, 0444);
> -- 
> 2.43.0
Dan Williams Feb. 8, 2024, 10:57 p.m. UTC | #6
Alison Schofield wrote:
> 
> On Thu, Jan 18, 2024 at 11:46:44AM -0800, Dan Williams wrote:
> 
> - snip to decoder replay patch -
> 
> > 
> > So one of the largest roadblocks for creating arbitrary region assembly
> > scenarios with cxl_test is the inability to save and restore decoder
> > settings.
> > 
> > The patch below adds support for recording decoder settings and skipping
> > the reset of those values when unloading the driver. Then, when the
> > driver is reloaded, it simulates the case of BIOS created CXL regions
> > prior to OS boot.
> > 
> > We can go after even finer grained cases once the uunit effort settles,
> > but in the meantime cxl_test can add auto-assembly regression tests with
> > the current topology.
> > 
> > With the below you can simply trigger "cxl {en,dis}able-memdev" in the
> > proper order to cause the violation.
> > 
> > -- >8 --
> > From 5839392a3dbb64dbbac0630d930b1f48a8ef7ea6 Mon Sep 17 00:00:00 2001
> > From: Dan Williams <dan.j.williams@intel.com>
> > Date: Wed, 17 Jan 2024 20:56:20 -0800
> > Subject: [PATCH] tools/testing/cxl: Add decoder save/restore support
> > 
> > Record decoder values at init and mock_decoder_commit() time, and
> > restore them at the next invocation of mock_init_hdm_decoder(). Add 2
> > attributes to the cxl_test "cxl_acpi" device to optionally flush the
> > cache of topology decoder values, or disable updating the decoder at
> > mock_decoder_reset() time.
> > 
> > This enables replaying a saved decoder configuration when re-triggering
> > a topology scan by re-binding the cxl_acpi driver to "cxl_acpi.0" (the
> > cxl_test emulation of an ACPI0017 instance).
> 
> Hi Dan,
> 
> Sorry it's taken a while to come back on this. I did find it useful
> in testing the auto-assembly order issue as you suggested. 
> 
> I didn't use this one: &dev_attr_decoder_registry_invalidate.attr,
> I just reloaded the cxl-test module to do same.

Makes sense, that can go.

> This I used:  &dev_attr_decoder_registry_reset_disable.attr,
> with your decoders state fixup to set CXL_DECODER_STATE_AUTO,
> and a work-around to avoid pmem_probe failures on pmem region
> auto create. 
> 
> More generally, I'm wondering about the implementation of the
> 'registry_save'. Here it continuously updates during all
> cxl-test usage. Did you consider only creating the registry upon
> user request and then at the next mock_init_hmd_decoder() look
> for and use that registry if it exists.
> 
> Usage would be: 
> # echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_registry_create

You mean wait to snapshot decoder state when triggered?

> # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
> # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind
> 
> And then maybe, for folks who like to acpi/unbind,bind, rather
> than reload module, we could offer a decoder_registry_remove attr.
> 
> Am I missing something regarding the need to keep it updated
> on the fly? 

I can see it being an alternate way, but not sure how to weigh one
approach vs the other. Is the dynamic update getting in the way of a
test case you are thinking about? Otherwise it seemed easy to reason
that the registry is always on, but only takes effect when
registry_reset_disable is set, and cxl_acpi rebinds the test device.
Alison Schofield Feb. 9, 2024, 12:23 a.m. UTC | #7
On Thu, Feb 08, 2024 at 02:57:44PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> > 
> > On Thu, Jan 18, 2024 at 11:46:44AM -0800, Dan Williams wrote:
> > 
> > - snip to decoder replay patch -
> > 
> > > 
> > > So one of the largest roadblocks for creating arbitrary region assembly
> > > scenarios with cxl_test is the inability to save and restore decoder
> > > settings.
> > > 
> > > The patch below adds support for recording decoder settings and skipping
> > > the reset of those values when unloading the driver. Then, when the
> > > driver is reloaded, it simulates the case of BIOS created CXL regions
> > > prior to OS boot.
> > > 
> > > We can go after even finer grained cases once the uunit effort settles,
> > > but in the meantime cxl_test can add auto-assembly regression tests with
> > > the current topology.
> > > 
> > > With the below you can simply trigger "cxl {en,dis}able-memdev" in the
> > > proper order to cause the violation.
> > > 
> > > -- >8 --
> > > From 5839392a3dbb64dbbac0630d930b1f48a8ef7ea6 Mon Sep 17 00:00:00 2001
> > > From: Dan Williams <dan.j.williams@intel.com>
> > > Date: Wed, 17 Jan 2024 20:56:20 -0800
> > > Subject: [PATCH] tools/testing/cxl: Add decoder save/restore support
> > > 
> > > Record decoder values at init and mock_decoder_commit() time, and
> > > restore them at the next invocation of mock_init_hdm_decoder(). Add 2
> > > attributes to the cxl_test "cxl_acpi" device to optionally flush the
> > > cache of topology decoder values, or disable updating the decoder at
> > > mock_decoder_reset() time.
> > > 
> > > This enables replaying a saved decoder configuration when re-triggering
> > > a topology scan by re-binding the cxl_acpi driver to "cxl_acpi.0" (the
> > > cxl_test emulation of an ACPI0017 instance).
> > 
> > Hi Dan,
> > 
> > Sorry it's taken a while to come back on this. I did find it useful
> > in testing the auto-assembly order issue as you suggested. 
> > 
> > I didn't use this one: &dev_attr_decoder_registry_invalidate.attr,
> > I just reloaded the cxl-test module to do same.
> 
> Makes sense, that can go.
> 
> > This I used:  &dev_attr_decoder_registry_reset_disable.attr,
> > with your decoders state fixup to set CXL_DECODER_STATE_AUTO,
> > and a work-around to avoid pmem_probe failures on pmem region
> > auto create. 
> > 
> > More generally, I'm wondering about the implementation of the
> > 'registry_save'. Here it continuously updates during all
> > cxl-test usage. Did you consider only creating the registry upon
> > user request and then at the next mock_init_hmd_decoder() look
> > for and use that registry if it exists.
> > 
> > Usage would be: 
> > # echo 1 > /sys/bus/platform/devices/cxl_acpi.0/decoder_registry_create
> 
> You mean wait to snapshot decoder state when triggered?
Yes.

> 
> > # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/unbind
> > # echo cxl_acpi.0 > /sys/bus/platform/drivers/cxl_acpi/bind
> > 
> > And then maybe, for folks who like to acpi/unbind,bind, rather
> > than reload module, we could offer a decoder_registry_remove attr.
> > 
> > Am I missing something regarding the need to keep it updated
> > on the fly? 
> 
> I can see it being an alternate way, but not sure how to weigh one
> approach vs the other. Is the dynamic update getting in the way of a
> test case you are thinking about? Otherwise it seemed easy to reason
> that the registry is always on, but only takes effect when
> registry_reset_disable is set, and cxl_acpi rebinds the test device.

The constant work of updating the registry caught my attention, but
no impact that I know of. The dynamic approach is more intrusive and
impacts the normal path needlessly. A snapshot approach limits much
of the impact to users of the new feature.
Dan Williams Feb. 9, 2024, 5:25 a.m. UTC | #8
Alison Schofield wrote:
> > I can see it being an alternate way, but not sure how to weigh one
> > approach vs the other. Is the dynamic update getting in the way of a
> > test case you are thinking about? Otherwise it seemed easy to reason
> > that the registry is always on, but only takes effect when
> > registry_reset_disable is set, and cxl_acpi rebinds the test device.
> 
> The constant work of updating the registry caught my attention, but
> no impact that I know of. The dynamic approach is more intrusive and
> impacts the normal path needlessly. A snapshot approach limits much
> of the impact to users of the new feature.
> 

It does mean more infrastructure to walk the entire topology and save
off all of the decoders. However, this is straightforward because
decoders are devices on the CXL bus.

I am not worried about the intrusiveness as much as how easy it is to
conceptualize new tests. I do think it is easier to conceptualize that
the flow is

"get the decoders the way you want them, snapshot them, ..., rebind",

...rather than the current:

"get the decoders the way you want them, disable-reset, rebind"

Notice the "..." in the first flow where it is a bit more forgiving if
you do some decoder operations between the snapshot step and the rebind
step. Where the latter flow requires that you know the side effects of
how rebind tries to reset the decoder state.

It still ends up with the all same complexity on the read side, but
should be easier to craft new static configurations without needing to
worry about the write side of the registry.
diff mbox series

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 0f05692bfec3..8770ebcae05d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -753,6 +753,37 @@  static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
 	return to_cxl_decoder(dev);
 }
 
+bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
+		   struct cxl_region *cxlr_b)
+{
+	struct cxl_region_ref *cxl_rr;
+	struct cxl_decoder *cxld_a, *cxld_b;
+
+	/*
+	 * Allow the out of order assembly of auto-discovered regions as
+	 * long as correct decoder programming order can be verified.
+	 *
+	 * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
+	 * software must commit decoders in HPA order. Therefore it is
+	 * sufficient to sanity check that the lowered number decoder
+	 * has the lower HPA starting address.
+	 */
+	if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
+		return false;
+
+	cxld_a = cxl_region_find_decoder(port, cxlr_a);
+	cxl_rr = cxl_rr_load(port, cxlr_b);
+	cxld_b = cxl_rr->decoder;
+
+	if (cxld_b->id > cxld_a->id) {
+		dev_dbg(&cxlr_a->dev,
+			"allow out of order region ref alloc\n");
+		return true;
+	}
+
+	return false;
+}
+
 static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
 					       struct cxl_region *cxlr)
 {
@@ -767,7 +798,8 @@  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
 		if (!ip->res)
 			continue;
 
-		if (ip->res->start > p->res->start) {
+		if (ip->res->start > p->res->start &&
+		    (!auto_order_ok(port, cxlr, iter->region))) {
 			dev_dbg(&cxlr->dev,
 				"%s: HPA order violation %s:%pr vs %pr\n",
 				dev_name(&port->dev),