diff mbox series

[v12,01/20] cxl/port: Fix release of RCD endpoints

Message ID 20231018171713.1883517-2-rrichter@amd.com
State New, archived
Headers show
Series cxl/pci: Add support for RCH RAS error handling | expand

Commit Message

Robert Richter Oct. 18, 2023, 5:16 p.m. UTC
Binding and unbindind RCD endpoints (e.g. mem0 device) caused the
corresponding endpoint not being released. Reason for that is the
wrong port discovered for RCD endpoints. See cxl_mem_probe() for
proper endpoint parent detection. Fix delete_endpoint() respectively.

Fixes: 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/port.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Dan Williams Oct. 27, 2023, 3:46 a.m. UTC | #1
Robert Richter wrote:
> Binding and unbindind RCD endpoints (e.g. mem0 device) caused the
> corresponding endpoint not being released. Reason for that is the
> wrong port discovered for RCD endpoints. See cxl_mem_probe() for
> proper endpoint parent detection. Fix delete_endpoint() respectively.
> 
> Fixes: 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

This patch causes cxl-topology.sh to crash with use-after-free
signatures. I notice that this path should just be using
endpoint->dev.parent and is missing reference counting on the parent.

I will replace this path with the following:

-- >8 --
Subject: cxl/port: Fix delete_endpoint() vs parent unregistration race

From: Dan Williams <dan.j.williams@intel.com>

The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.

Given that setup it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event. The hierarchy below the removed port, or the
entire hierarchy if the memdev is removed needs to come down.

The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
teardown the memdev because an ancestor port is going through
->remove().

That care needs to take the device_lock() of the endpoint's parent.
Which requires 2 bugs to be fixed:

1/ A reference on the parent is needed to prevent use-after-free
   scenarios like this signature:

    BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Workqueue: cxl_port detach_memdev [cxl_core]
    RIP: 0010:spin_bug+0x65/0xa0
    Call Trace:
      do_raw_spin_lock+0x69/0xa0
     __mutex_lock+0x695/0xb80
     delete_endpoint+0xad/0x150 [cxl_core]
     devres_release_all+0xb8/0x110
     device_unbind_cleanup+0xe/0x70
     device_release_driver_internal+0x1d2/0x210
     detach_memdev+0x15/0x20 [cxl_core]
     process_one_work+0x1e3/0x4c0
     worker_thread+0x1dd/0x3d0

2/ In the case of RCH topologies, the parent device that needs to be
   locked is not always @port->dev as returned by cxl_mem_find_port(), use
   endpoint->dev.parent instead.

Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@vger.kernel.org>
Reported-by: Robert Richter <rrichter@amd.com>
Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/port.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 5ba606c6e03f..6bbaca5ac60d 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1227,13 +1227,7 @@ static void delete_endpoint(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
 	struct cxl_port *endpoint = cxlmd->endpoint;
-	struct cxl_port *parent_port;
-	struct device *parent;
-
-	parent_port = cxl_mem_find_port(cxlmd, NULL);
-	if (!parent_port)
-		goto out;
-	parent = &parent_port->dev;
+	struct device *parent = endpoint->dev.parent;
 
 	device_lock(parent);
 	if (parent->driver && !endpoint->dead) {
@@ -1243,15 +1237,16 @@ static void delete_endpoint(void *data)
 	}
 	cxlmd->endpoint = NULL;
 	device_unlock(parent);
-	put_device(parent);
-out:
 	put_device(&endpoint->dev);
+	put_device(parent);
 }
 
 int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
 {
+	struct device *parent = endpoint->dev.parent;
 	struct device *dev = &cxlmd->dev;
 
+	get_device(parent);
 	get_device(&endpoint->dev);
 	cxlmd->endpoint = endpoint;
 	cxlmd->depth = endpoint->depth;
Robert Richter Oct. 27, 2023, 10:55 p.m. UTC | #2
Dan,

On 26.10.23 20:46:33, Dan Williams wrote:
> Robert Richter wrote:
> > Binding and unbindind RCD endpoints (e.g. mem0 device) caused the
> > corresponding endpoint not being released. Reason for that is the
> > wrong port discovered for RCD endpoints. See cxl_mem_probe() for
> > proper endpoint parent detection. Fix delete_endpoint() respectively.
> > 
> > Fixes: 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> This patch causes cxl-topology.sh to crash with use-after-free
> signatures. I notice that this path should just be using
> endpoint->dev.parent and is missing reference counting on the parent.
> 
> I will replace this path with the following:
> 
> -- >8 --
> Subject: cxl/port: Fix delete_endpoint() vs parent unregistration race
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
> ports (struct cxl_port objects) between an endpoint and the root of a
> CXL topology. Each port including the endpoint port is attached to the
> cxl_port driver.
> 
> Given that setup it follows that when either any port in that lineage
> goes through a cxl_port ->remove() event, or the memdev goes through a
> cxl_mem ->remove() event. The hierarchy below the removed port, or the
> entire hierarchy if the memdev is removed needs to come down.
> 
> The delete_endpoint() callback is careful to check whether it is being
> called to tear down the hierarchy, or if it is only being called to
> teardown the memdev because an ancestor port is going through
> ->remove().
> 
> That care needs to take the device_lock() of the endpoint's parent.
> Which requires 2 bugs to be fixed:
> 
> 1/ A reference on the parent is needed to prevent use-after-free
>    scenarios like this signature:
> 
>     BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
>     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
>     Workqueue: cxl_port detach_memdev [cxl_core]
>     RIP: 0010:spin_bug+0x65/0xa0
>     Call Trace:
>       do_raw_spin_lock+0x69/0xa0
>      __mutex_lock+0x695/0xb80
>      delete_endpoint+0xad/0x150 [cxl_core]
>      devres_release_all+0xb8/0x110
>      device_unbind_cleanup+0xe/0x70
>      device_release_driver_internal+0x1d2/0x210
>      detach_memdev+0x15/0x20 [cxl_core]
>      process_one_work+0x1e3/0x4c0
>      worker_thread+0x1dd/0x3d0
> 
> 2/ In the case of RCH topologies, the parent device that needs to be
>    locked is not always @port->dev as returned by cxl_mem_find_port(), use
>    endpoint->dev.parent instead.
> 
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Cc: <stable@vger.kernel.org>
> Reported-by: Robert Richter <rrichter@amd.com>
> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I have tested your patch.

Still the endpoints are not properly released after unbinding/binding
mem0:

root@onyx7628host:~# find /sys/devices/ -name uport -ls
   258978      0 lrwxrwxrwx   1 root     root            0 Oct 27 21:27 /sys/devices/platform/ACPI0017:00/root0/endpoint1/uport -> ../../../../pci0000:7f/0000:7f:00.0/mem0
   258553      0 lrwxrwxrwx   1 root     root            0 Oct 27 21:27 /sys/devices/platform/ACPI0017:00/root0/uport -> ../../ACPI0017:00
   259363      0 lrwxrwxrwx   1 root     root            0 Oct 27 21:27 /sys/devices/platform/ACPI0017:00/root0/endpoint2/uport -> ../../../../pci0000:7f/0000:7f:00.0/mem0

Here the DEVRES logs:

[   47.756076] cxl_acpi ACPI0017:00: DEVRES ADD 00000000b864173a cxl_acpi_lock_reset_class (16 bytes)
[   47.776194] cxl_acpi ACPI0017:00: DEVRES ADD 00000000f2889cb8 devm_kzalloc_release (64 bytes)
[   47.795361] cxl_acpi ACPI0017:00: DEVRES ADD 00000000456b3844 unregister_port (16 bytes)
[   47.813519] cxl_acpi ACPI0017:00: DEVRES ADD 00000000c531ceef cxl_unlink_uport (16 bytes)
[   47.868008] cxl_acpi ACPI0017:00: DEVRES ADD 0000000034c233ab devm_kzalloc_release (160 bytes)
[   47.979283] cxl_acpi ACPI0017:00: DEVRES ADD 0000000028cfab8c cxl_dport_remove (16 bytes)
[   47.997641] cxl_acpi ACPI0017:00: DEVRES ADD 00000000fce74ad0 cxl_dport_unlink (16 bytes)
[   48.026608] cxl_acpi ACPI0017:00: DEVRES ADD 0000000087096c98 remove_cxl_resources (16 bytes)
[   48.467055] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000439965a5 pcim_release (8 bytes)
[   48.492875] cxl_pci 0000:7f:00.0: DEVRES ADD 000000000352019d devm_kzalloc_release (688 bytes)
[   48.593561] cxl_pci 0000:7f:00.0: DEVRES ADD 0000000093b19b3d devm_region_release (24 bytes)
[   48.612509] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000f2a2b9ff devm_ioremap_release (8 bytes)
[   48.631460] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000a4df0705 devm_region_release (24 bytes)
[   48.650416] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000a35318c5 devm_ioremap_release (8 bytes)
[   48.669359] cxl_pci 0000:7f:00.0: DEVRES ADD 000000000411c084 devm_region_release (24 bytes)
[   48.688302] cxl_pci 0000:7f:00.0: DEVRES ADD 000000003c2ff619 devm_ioremap_release (8 bytes)
[   48.851787] cxl_pci 0000:7f:00.0: DEVRES ADD 0000000092a86a29 devm_region_release (24 bytes)
[   48.918272] cxl_pci 0000:7f:00.0: DEVRES ADD 0000000029f3d56b devm_ioremap_release (8 bytes)
[   48.943092] cxl_pci 0000:7f:00.0: DEVRES ADD 000000003a7c75bf devm_attr_group_remove (8 bytes)
[   48.962428] cxl_pci 0000:7f:00.0: DEVRES ADD 0000000038e85a6f msi_device_data_release (104 bytes)
[   48.991467] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000fb668702 pcim_msi_release (16 bytes)
[   49.433956] cxl_pci 0000:7f:00.0: DEVRES ADD 00000000ceeb66f6 put_sanitize (16 bytes)
[   49.458405] cxl_pci 0000:7f:00.0: DEVRES ADD 000000000732acf5 cxl_memdev_unregister (16 bytes)
[   49.477751] cxl_pci 0000:7f:00.0: DEVRES ADD 0000000062830d65 free_event_buf (16 bytes)
[   50.641360] cxl_mem mem0: DEVRES ADD 0000000053ef2f50 devm_kzalloc_release (16 bytes)
[   50.658961] cxl_mem mem0: DEVRES ADD 00000000fba70c16 remove_debugfs (16 bytes)
[   50.783295] cxl_port endpoint1: DEVRES ADD 00000000adbc4e00 devm_kzalloc_release (104 bytes)
[   50.819057] cxl_port endpoint1: DEVRES ADD 000000000cad4cbf schedule_detach (16 bytes)
[   50.845695] cxl_acpi ACPI0017:00: DEVRES ADD 00000000bd0ee305 unregister_port (16 bytes)
[   50.863853] cxl_acpi ACPI0017:00: DEVRES ADD 0000000093e8b578 cxl_unlink_uport (16 bytes)
[   50.882224] cxl_acpi ACPI0017:00: DEVRES ADD 00000000afea8cbc cxl_unlink_parent_dport (16 bytes)
[   50.921823] cxl_mem mem0: DEVRES ADD 00000000acbd776c delete_endpoint (16 bytes)
[   50.938423] cxl_mem mem0: DEVRES ADD 00000000c504d947 enable_suspend (16 bytes)
[ 1187.191399] cxl_mem mem0: DEVRES REL 00000000c504d947 enable_suspend (16 bytes)
[ 1187.207813] cxl_mem mem0: DEVRES REL 00000000acbd776c delete_endpoint (16 bytes)
[ 1187.224426] cxl_mem mem0: DEVRES REL 00000000fba70c16 remove_debugfs (16 bytes)
[ 1187.240845] cxl_mem mem0: DEVRES REL 0000000053ef2f50 devm_kzalloc_release (16 bytes)
[ 1194.935890] cxl_mem mem0: DEVRES ADD 0000000049747689 devm_kzalloc_release (16 bytes)
[ 1194.953479] cxl_mem mem0: DEVRES ADD 0000000025dc7c28 remove_debugfs (16 bytes)
[ 1195.085016] cxl_port endpoint2: DEVRES ADD 00000000777769c7 devm_kzalloc_release (104 bytes)
[ 1195.135858] cxl_port endpoint2: DEVRES ADD 000000007b7aea26 schedule_detach (16 bytes)
[ 1195.162482] cxl_acpi ACPI0017:00: DEVRES ADD 00000000499f1a7a unregister_port (16 bytes)
[ 1195.180650] cxl_acpi ACPI0017:00: DEVRES ADD 0000000065d85ed8 cxl_unlink_uport (16 bytes)
[ 1195.199010] cxl_acpi ACPI0017:00: DEVRES ADD 000000003b0c1f03 cxl_unlink_parent_dport (16 bytes)
[ 1195.229479] cxl_mem mem0: DEVRES ADD 000000009973885e delete_endpoint (16 bytes)
[ 1195.246100] cxl_mem mem0: DEVRES ADD 00000000512c0cc7 enable_suspend (16 bytes)

delete_endpoint() is called here, but the uport etc. is not unbound.
Which means this is not true:

	if (parent->driver && !endpoint->dead) {
		...

I don't remember this with my patch. The parent is there different, so
that could be the reason.

I could not yet look into more detail but wanted to let you know. Will
continue.

Thanks,

-Robert
Dan Williams Oct. 28, 2023, 12:32 a.m. UTC | #3
Robert Richter wrote:
> Dan,
[..]
> 
> delete_endpoint() is called here, but the uport etc. is not unbound.
> Which means this is not true:
> 
> 	if (parent->driver && !endpoint->dead) {
> 		...
> 
> I don't remember this with my patch. The parent is there different, so
> that could be the reason.
> 
> I could not yet look into more detail but wanted to let you know. Will
> continue.

Apologies, I didn't have that regression going, I think I see the issue.
Thanks for the heads up.
Dan Williams Oct. 28, 2023, 1:39 a.m. UTC | #4
Dan Williams wrote:
> Robert Richter wrote:
> > Dan,
> [..]
> > 
> > delete_endpoint() is called here, but the uport etc. is not unbound.
> > Which means this is not true:
> > 
> > 	if (parent->driver && !endpoint->dead) {
> > 		...
> > 
> > I don't remember this with my patch. The parent is there different, so
> > that could be the reason.
> > 
> > I could not yet look into more detail but wanted to let you know. Will
> > continue.
> 
> Apologies, I didn't have that regression going, I think I see the issue.
> Thanks for the heads up.

Here is the incremental fix on top of the lifetime fix:

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 6230ddfc0be8..0fe915ec2cc2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1217,30 +1217,39 @@ static struct device *grandparent(struct device *dev)
 	return NULL;
 }
 
+static struct device *endpoint_host(struct cxl_port *endpoint)
+{
+	struct cxl_port *port = to_cxl_port(endpoint->dev.parent);
+
+	if (is_cxl_root(port))
+		return port->uport_dev;
+	return &port->dev;
+}
+
 static void delete_endpoint(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
 	struct cxl_port *endpoint = cxlmd->endpoint;
-	struct device *parent = endpoint->dev.parent;
+	struct device *host = endpoint_host(endpoint);
 
-	device_lock(parent);
-	if (parent->driver && !endpoint->dead) {
-		devm_release_action(parent, cxl_unlink_parent_dport, endpoint);
-		devm_release_action(parent, cxl_unlink_uport, endpoint);
-		devm_release_action(parent, unregister_port, endpoint);
+	device_lock(host);
+	if (host->driver && !endpoint->dead) {
+		devm_release_action(host, cxl_unlink_parent_dport, endpoint);
+		devm_release_action(host, cxl_unlink_uport, endpoint);
+		devm_release_action(host, unregister_port, endpoint);
 	}
 	cxlmd->endpoint = NULL;
-	device_unlock(parent);
+	device_unlock(host);
 	put_device(&endpoint->dev);
-	put_device(parent);
+	put_device(host);
 }
 
 int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
 {
-	struct device *parent = endpoint->dev.parent;
+	struct device *host = endpoint_host(endpoint);
 	struct device *dev = &cxlmd->dev;
 
-	get_device(parent);
+	get_device(host);
 	get_device(&endpoint->dev);
 	cxlmd->endpoint = endpoint;
 	cxlmd->depth = endpoint->depth;


---

...and here is the new regression test so I don't mess up and miss this
again:

diff --git a/cxl/memdev.c b/cxl/memdev.c
index d76a4d86a40a..81dfd4c25b25 100644
--- a/cxl/memdev.c
+++ b/cxl/memdev.c
@@ -752,6 +752,8 @@ static int memdev_action(int argc, const char **argv, struct cxl_ctx *ctx,
                        if (end[0] == 0)
                                continue;
                } else {
+                       unsigned long domain, bus, dev, func;
+
                        if (strcmp(argv[i], "all") == 0) {
                                argc = 1;
                                break;
@@ -760,6 +762,12 @@ static int memdev_action(int argc, const char **argv, struct cxl_ctx *ctx,
                                continue;
                        if (sscanf(argv[i], "%lu", &id) == 1)
                                continue;
+                       if (sscanf(argv[i], "%lx:%lx:%lx.%lx", &domain, &bus, &dev, &func))
+                               continue;
+                       if (sscanf(argv[i], "cxl_mem.%lu", &id))
+                               continue;
+                       if (sscanf(argv[i], "cxl_rcd.%lu", &id))
+                               continue;
                }
 
                log_err(&ml, "'%s' is not a valid memdev %s\n", argv[i],
diff --git a/test/cxl-topology.sh b/test/cxl-topology.sh
index 89d01a89ccb1..0320887a953b 100644
--- a/test/cxl-topology.sh
+++ b/test/cxl-topology.sh
@@ -120,6 +120,13 @@ count=$(jq "map(select(.pmem_size == $pmem_size)) | length" <<< $json)
 ((bridges == 2 && count == 8 || bridges == 3 && count == 10 ||
   bridges == 4 && count == 11)) || err "$LINENO"
 
+# check rcd endpoints disappear when disabling the memdev
+m=$($CXL list -M -b cxl_test | jq -r ".[].host" | grep rcd)
+ep=$($CXL list -E -m $m | jq -r ".[].endpoint")
+$CXL disable-memdev $m --force
+check=$($CXL list -E -e $ep | jq -r ".[].endpoint")
+[ -z "$check" ] || err "$LINENO"
+$CXL enable-memdev $m
Robert Richter Oct. 29, 2023, 4:17 p.m. UTC | #5
On 27.10.23 18:39:28, Dan Williams wrote:
> Dan Williams wrote:
> > Robert Richter wrote:
> > > Dan,
> > [..]
> > > 
> > > delete_endpoint() is called here, but the uport etc. is not unbound.
> > > Which means this is not true:
> > > 
> > > 	if (parent->driver && !endpoint->dead) {
> > > 		...
> > > 
> > > I don't remember this with my patch. The parent is there different, so
> > > that could be the reason.
> > > 
> > > I could not yet look into more detail but wanted to let you know. Will
> > > continue.
> > 
> > Apologies, I didn't have that regression going, I think I see the issue.
> > Thanks for the heads up.
> 
> Here is the incremental fix on top of the lifetime fix:
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 6230ddfc0be8..0fe915ec2cc2 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1217,30 +1217,39 @@ static struct device *grandparent(struct device *dev)
>  	return NULL;
>  }
>  
> +static struct device *endpoint_host(struct cxl_port *endpoint)
> +{
> +	struct cxl_port *port = to_cxl_port(endpoint->dev.parent);
> +
> +	if (is_cxl_root(port))
> +		return port->uport_dev;
> +	return &port->dev;
> +}

Yes, that works.

Reviewed-by: Robert Richter <rrichter@amd.com>
Tested-by: Robert Richter <rrichter@amd.com>

Thanks,

-Robert
diff mbox series

Patch

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 7ca01a834e18..d4572a02989a 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1222,12 +1222,17 @@  static void delete_endpoint(void *data)
 	struct cxl_memdev *cxlmd = data;
 	struct cxl_port *endpoint = cxlmd->endpoint;
 	struct cxl_port *parent_port;
+	struct cxl_dport *dport;
 	struct device *parent;
 
-	parent_port = cxl_mem_find_port(cxlmd, NULL);
+	parent_port = cxl_mem_find_port(cxlmd, &dport);
 	if (!parent_port)
 		goto out;
-	parent = &parent_port->dev;
+
+	if (dport->rch)
+		parent = parent_port->uport_dev;
+	else
+		parent = &parent_port->dev;
 
 	device_lock(parent);
 	if (parent->driver && !endpoint->dead) {