Message ID | 167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com (mailing list archive) |
---|---|
State | New |
Series | CXL RAM and the 'Soft Reserved' => 'System RAM' default |
On Fri, 10 Feb 2023 01:05:27 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Testing of ram region support [1], stimulates a long standing bug in
> cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
> inability to walk ports after dports have been unregistered. That
> results in a failure to re-register a memdev after the port is
> re-enabled leading to a crash like the following:
>
>     cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
>     general protection fault, ...
>     [..]
>     RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
>     dev_name at include/linux/device.h:700
>     (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
>     (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
>     [..]
>     Call Trace:
>      <TASK>
>      attach_target+0x39a/0x760 [cxl_core]
>      ? __mutex_unlock_slowpath+0x3a/0x290
>      cxl_add_to_region+0xb8/0x340 [cxl_core]
>      ? lockdep_hardirqs_on+0x7d/0x100
>      discover_region+0x4b/0x80 [cxl_port]
>      ? __pfx_discover_region+0x10/0x10 [cxl_port]
>      device_for_each_child+0x58/0x90
>      cxl_port_probe+0x10e/0x130 [cxl_port]
>      cxl_bus_probe+0x17/0x50 [cxl_core]
>
> Change the port ancestry walk to be by depth rather than by dport. This
> ensures that even if a port has unregistered its dports a deferred
> memdev cleanup will still be able to cleanup the memdev's interest in
> that port.
>
> The parent_port->dev.driver check is only needed for determining if the
> bottom up removal beat the top-down removal, but cxl_ep_remove() can
> always proceed.

Why can cxl_ep_remove() always proceed? What stops it racing?
Is it that we are holding a reference to the port at the time of the
call so the release callback can't be called until we drop that?

Anyhow, good to have a little more detail on the 'why' in the patch
description (particularly for those reading this when half asleep like me ;)

>
> Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration")
> Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/memdev.c |    1 +
>  drivers/cxl/core/port.c   |   58 +++++++++++++++++++++++++--------------------
>  drivers/cxl/cxlmem.h      |    2 ++
>  3 files changed, 35 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a74a93310d26..3a8bc2b06047 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -246,6 +246,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>  	if (rc < 0)
>  		goto err;
>  	cxlmd->id = rc;
> +	cxlmd->depth = -1;
>
>  	dev = &cxlmd->dev;
>  	device_initialize(dev);
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 410c036c09fa..317bcf4dbd9d 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1207,6 +1207,7 @@ int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
>
>  	get_device(&endpoint->dev);
>  	dev_set_drvdata(dev, endpoint);
> +	cxlmd->depth = endpoint->depth;
>  	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
> @@ -1241,50 +1242,55 @@ static void reap_dports(struct cxl_port *port)
>  	}
>  }
>
> +struct detach_ctx {
> +	struct cxl_memdev *cxlmd;
> +	int depth;
> +};
>  static void cxl_detach_ep(void *data)
>  {
>  	struct cxl_memdev *cxlmd = data;
> -	struct device *iter;
>
> -	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
> -		struct device *dport_dev = grandparent(iter);
> +	for (int i = cxlmd->depth - 1; i >= 1; i--) {
>  		struct cxl_port *port, *parent_port;
> +		struct detach_ctx ctx = {
> +			.cxlmd = cxlmd,
> +			.depth = i,
> +		};
> +		struct device *dev;
>  		struct cxl_ep *ep;
>  		bool died = false;
>
> -		if (!dport_dev)
> -			break;
> -
> -		port = find_cxl_port(dport_dev, NULL);
> -		if (!port)
> -			continue;
> -
> -		if (is_cxl_root(port)) {
> -			put_device(&port->dev);
> +		dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
> +				      port_has_memdev);
> +		if (!dev)
>  			continue;
> -		}
> +		port = to_cxl_port(dev);
>
>  		parent_port = to_cxl_port(port->dev.parent);
>  		device_lock(&parent_port->dev);
> -		if (!parent_port->dev.driver) {
> -			/*
> -			 * The bottom-up race to delete the port lost to a
> -			 * top-down port disable, give up here, because the
> -			 * parent_port ->remove() will have cleaned up all
> -			 * descendants.
> -			 */
> -			device_unlock(&parent_port->dev);
> -			put_device(&port->dev);
> -			continue;
> -		}
> -
>  		device_lock(&port->dev);
>  		ep = cxl_ep_load(port, cxlmd);
>  		dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
>  			ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
>  		cxl_ep_remove(port, ep);
>  		if (ep && !port->dead && xa_empty(&port->endpoints) &&
> -		    !is_cxl_root(parent_port)) {
> +		    !is_cxl_root(parent_port) && parent_port->dev.driver) {
>  			/*
>  			 * This was the last ep attached to a dynamically
>  			 * enumerated port. Block new cxl_add_ep() and garbage
Jonathan Cameron wrote:
> On Fri, 10 Feb 2023 01:05:27 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Testing of ram region support [1], stimulates a long standing bug in
> > cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
> > inability to walk ports after dports have been unregistered. That
> > results in a failure to re-register a memdev after the port is
> > re-enabled leading to a crash like the following:
> >
> >     cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
> >     general protection fault, ...
> >     [..]
> >     RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
> >     dev_name at include/linux/device.h:700
> >     (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
> >     (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
> >     [..]
> >     Call Trace:
> >      <TASK>
> >      attach_target+0x39a/0x760 [cxl_core]
> >      ? __mutex_unlock_slowpath+0x3a/0x290
> >      cxl_add_to_region+0xb8/0x340 [cxl_core]
> >      ? lockdep_hardirqs_on+0x7d/0x100
> >      discover_region+0x4b/0x80 [cxl_port]
> >      ? __pfx_discover_region+0x10/0x10 [cxl_port]
> >      device_for_each_child+0x58/0x90
> >      cxl_port_probe+0x10e/0x130 [cxl_port]
> >      cxl_bus_probe+0x17/0x50 [cxl_core]
> >
> > Change the port ancestry walk to be by depth rather than by dport. This
> > ensures that even if a port has unregistered its dports a deferred
> > memdev cleanup will still be able to cleanup the memdev's interest in
> > that port.
> >
> > The parent_port->dev.driver check is only needed for determining if the
> > bottom up removal beat the top-down removal, but cxl_ep_remove() can
> > always proceed.
>
> Why can cxl_ep_remove() always proceed? What stops it racing?
> Is it that we are holding a reference to the port at the time of the
> call so the release callback can't be called until we drop that?

Right, as long as a port reference is held then the cxl_ep_remove() at
cxl_port_release() can not race this one from memdev removal. The result
of cxl_ep_load() is guaranteed to stay stable until the subsequent
put_device().

> Anyhow, good to have a little more detail on the 'why' in the patch
> description (particularly for those reading this when half asleep like me ;)

Long day for you, I appreciate it!
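The reference argument in the exchange above can be modeled outside the kernel. What follows is a minimal user-space sketch, not kernel code: `toy_dev`, `toy_get()` and `toy_put()` are hypothetical stand-ins for `struct device`, `get_device()` and `put_device()`. It only illustrates the ordering claim. While the detach path holds the reference it obtained from its lookup (in the kernel, `bus_find_device()` returns the device with its reference count elevated), the release step cannot run, so state read under that reference stays valid until the final put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted device; NOT the kernel's struct device. */
struct toy_dev {
	int refcount;
	bool released;	/* set once the release step has run */
};

/* Take one reference; only legal on an object that is still live. */
static void toy_get(struct toy_dev *dev)
{
	assert(!dev->released);
	dev->refcount++;
}

/* Drop one reference; the release step runs only when the count hits zero. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = true;
}

/*
 * Models the ordering in the detach path: the lookup takes a reference,
 * the owner drops its own reference in the meantime (teardown elsewhere),
 * and the object is still live while the cleanup step runs.
 */
static bool toy_detach_sees_live_port(void)
{
	struct toy_dev port = { .refcount = 1, .released = false };
	bool live_during_cleanup;

	toy_get(&port);				/* lookup returns a referenced port */
	toy_put(&port);				/* owner's reference goes away */
	live_during_cleanup = !port.released;	/* cleanup would run here */
	toy_put(&port);				/* final put: release may now run */

	return live_during_cleanup && port.released;
}
```

In this model the release step is just a flag. In the thread it corresponds to cxl_port_release(), which by the explanation above cannot run its own cxl_ep_remove() until the memdev-removal path has dropped its port reference.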
On Fri, 2023-02-10 at 01:05 -0800, Dan Williams wrote:
> Testing of ram region support [1], stimulates a long standing bug in
> cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
> inability to walk ports after dports have been unregistered. That
> results in a failure to re-register a memdev after the port is
> re-enabled leading to a crash like the following:
>
>     cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
>     general protection fault, ...
>     [..]
>     RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
>     dev_name at include/linux/device.h:700
>     (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
>     (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
>     [..]
>     Call Trace:
>      <TASK>
>      attach_target+0x39a/0x760 [cxl_core]
>      ? __mutex_unlock_slowpath+0x3a/0x290
>      cxl_add_to_region+0xb8/0x340 [cxl_core]
>      ? lockdep_hardirqs_on+0x7d/0x100
>      discover_region+0x4b/0x80 [cxl_port]
>      ? __pfx_discover_region+0x10/0x10 [cxl_port]
>      device_for_each_child+0x58/0x90
>      cxl_port_probe+0x10e/0x130 [cxl_port]
>      cxl_bus_probe+0x17/0x50 [cxl_core]
>
> Change the port ancestry walk to be by depth rather than by dport. This
> ensures that even if a port has unregistered its dports a deferred
> memdev cleanup will still be able to cleanup the memdev's interest in
> that port.
>
> The parent_port->dev.driver check is only needed for determining if the
> bottom up removal beat the top-down removal, but cxl_ep_remove() can
> always proceed.
>
> Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration")
> Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/cxl/core/memdev.c |    1 +
>  drivers/cxl/core/port.c   |   58 +++++++++++++++++++++++++--------------------
>  drivers/cxl/cxlmem.h      |    2 ++
>  3 files changed, 35 insertions(+), 26 deletions(-)

Looks good,
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>

>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a74a93310d26..3a8bc2b06047 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -246,6 +246,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>  	if (rc < 0)
>  		goto err;
>  	cxlmd->id = rc;
> +	cxlmd->depth = -1;
>
>  	dev = &cxlmd->dev;
>  	device_initialize(dev);
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 410c036c09fa..317bcf4dbd9d 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1207,6 +1207,7 @@ int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
>
>  	get_device(&endpoint->dev);
>  	dev_set_drvdata(dev, endpoint);
> +	cxlmd->depth = endpoint->depth;
>  	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
> @@ -1241,50 +1242,55 @@ static void reap_dports(struct cxl_port *port)
>  	}
>  }
>
> +struct detach_ctx {
> +	struct cxl_memdev *cxlmd;
> +	int depth;
> +};
> +
> +static int port_has_memdev(struct device *dev, const void *data)
> +{
> +	const struct detach_ctx *ctx = data;
> +	struct cxl_port *port;
> +
> +	if (!is_cxl_port(dev))
> +		return 0;
> +
> +	port = to_cxl_port(dev);
> +	if (port->depth != ctx->depth)
> +		return 0;
> +
> +	return !!cxl_ep_load(port, ctx->cxlmd);
> +}
> +
>  static void cxl_detach_ep(void *data)
>  {
>  	struct cxl_memdev *cxlmd = data;
> -	struct device *iter;
>
> -	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
> -		struct device *dport_dev = grandparent(iter);
> +	for (int i = cxlmd->depth - 1; i >= 1; i--) {
>  		struct cxl_port *port, *parent_port;
> +		struct detach_ctx ctx = {
> +			.cxlmd = cxlmd,
> +			.depth = i,
> +		};
> +		struct device *dev;
>  		struct cxl_ep *ep;
>  		bool died = false;
>
> -		if (!dport_dev)
> -			break;
> -
> -		port = find_cxl_port(dport_dev, NULL);
> -		if (!port)
> -			continue;
> -
> -		if (is_cxl_root(port)) {
> -			put_device(&port->dev);
> +		dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
> +				      port_has_memdev);
> +		if (!dev)
>  			continue;
> -		}
> +		port = to_cxl_port(dev);
>
>  		parent_port = to_cxl_port(port->dev.parent);
>  		device_lock(&parent_port->dev);
> -		if (!parent_port->dev.driver) {
> -			/*
> -			 * The bottom-up race to delete the port lost to a
> -			 * top-down port disable, give up here, because the
> -			 * parent_port ->remove() will have cleaned up all
> -			 * descendants.
> -			 */
> -			device_unlock(&parent_port->dev);
> -			put_device(&port->dev);
> -			continue;
> -		}
> -
>  		device_lock(&port->dev);
>  		ep = cxl_ep_load(port, cxlmd);
>  		dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
>  			ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
>  		cxl_ep_remove(port, ep);
>  		if (ep && !port->dead && xa_empty(&port->endpoints) &&
> -		    !is_cxl_root(parent_port)) {
> +		    !is_cxl_root(parent_port) && parent_port->dev.driver) {
>  			/*
>  			 * This was the last ep attached to a dynamically
>  			 * enumerated port. Block new cxl_add_ep() and garbage
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index ab138004f644..c9da3c699a21 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -38,6 +38,7 @@
>   * @cxl_nvb: coordinate removal of @cxl_nvd if present
>   * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
>   * @id: id number of this memdev instance.
> + * @depth: endpoint port depth
>   */
>  struct cxl_memdev {
>  	struct device dev;
> @@ -47,6 +48,7 @@ struct cxl_memdev {
>  	struct cxl_nvdimm_bridge *cxl_nvb;
>  	struct cxl_nvdimm *cxl_nvd;
>  	int id;
> +	int depth;
>  };
>
>  static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
>
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a74a93310d26..3a8bc2b06047 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -246,6 +246,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
 	if (rc < 0)
 		goto err;
 	cxlmd->id = rc;
+	cxlmd->depth = -1;
 
 	dev = &cxlmd->dev;
 	device_initialize(dev);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 410c036c09fa..317bcf4dbd9d 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1207,6 +1207,7 @@ int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
 
 	get_device(&endpoint->dev);
 	dev_set_drvdata(dev, endpoint);
+	cxlmd->depth = endpoint->depth;
 	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
@@ -1241,50 +1242,55 @@ static void reap_dports(struct cxl_port *port)
 	}
 }
 
+struct detach_ctx {
+	struct cxl_memdev *cxlmd;
+	int depth;
+};
+
+static int port_has_memdev(struct device *dev, const void *data)
+{
+	const struct detach_ctx *ctx = data;
+	struct cxl_port *port;
+
+	if (!is_cxl_port(dev))
+		return 0;
+
+	port = to_cxl_port(dev);
+	if (port->depth != ctx->depth)
+		return 0;
+
+	return !!cxl_ep_load(port, ctx->cxlmd);
+}
+
 static void cxl_detach_ep(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
-	struct device *iter;
 
-	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
-		struct device *dport_dev = grandparent(iter);
+	for (int i = cxlmd->depth - 1; i >= 1; i--) {
 		struct cxl_port *port, *parent_port;
+		struct detach_ctx ctx = {
+			.cxlmd = cxlmd,
+			.depth = i,
+		};
+		struct device *dev;
 		struct cxl_ep *ep;
 		bool died = false;
 
-		if (!dport_dev)
-			break;
-
-		port = find_cxl_port(dport_dev, NULL);
-		if (!port)
-			continue;
-
-		if (is_cxl_root(port)) {
-			put_device(&port->dev);
+		dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
+				      port_has_memdev);
+		if (!dev)
 			continue;
-		}
+		port = to_cxl_port(dev);
 
 		parent_port = to_cxl_port(port->dev.parent);
 		device_lock(&parent_port->dev);
-		if (!parent_port->dev.driver) {
-			/*
-			 * The bottom-up race to delete the port lost to a
-			 * top-down port disable, give up here, because the
-			 * parent_port ->remove() will have cleaned up all
-			 * descendants.
-			 */
-			device_unlock(&parent_port->dev);
-			put_device(&port->dev);
-			continue;
-		}
-
 		device_lock(&port->dev);
 		ep = cxl_ep_load(port, cxlmd);
 		dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
 			ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
 		cxl_ep_remove(port, ep);
 		if (ep && !port->dead && xa_empty(&port->endpoints) &&
-		    !is_cxl_root(parent_port)) {
+		    !is_cxl_root(parent_port) && parent_port->dev.driver) {
 			/*
 			 * This was the last ep attached to a dynamically
 			 * enumerated port. Block new cxl_add_ep() and garbage
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ab138004f644..c9da3c699a21 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -38,6 +38,7 @@
  * @cxl_nvb: coordinate removal of @cxl_nvd if present
  * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
  * @id: id number of this memdev instance.
+ * @depth: endpoint port depth
  */
 struct cxl_memdev {
 	struct device dev;
@@ -47,6 +48,7 @@ struct cxl_memdev {
 	struct cxl_nvdimm_bridge *cxl_nvb;
 	struct cxl_nvdimm *cxl_nvd;
 	int id;
+	int depth;
 };
 
 static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
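One detail of the diff that is easy to miss is how the `-1` initial value of `cxlmd->depth` interacts with the new loop bounds. The sketch below is a user-space illustration only; `depths_visited()` is a hypothetical helper, not part of the patch. It copies the loop header from cxl_detach_ep() and records which depths would be searched. With the `-1` set in cxl_memdev_alloc(), the loop body never runs, which presumably covers a memdev whose endpoint was never registered. Depth 0 is never visited, which appears to correspond to the root port that the old code skipped with is_cxl_root().

```c
#include <stddef.h>

/*
 * Hypothetical helper: record the port depths that the loop in
 * cxl_detach_ep() would search for a given endpoint depth.
 * Returns the number of depths written to @out (at most @cap).
 */
static size_t depths_visited(int endpoint_depth, int *out, size_t cap)
{
	size_t n = 0;

	/* Same bounds as the patch: start one level above the endpoint, stop at 1. */
	for (int i = endpoint_depth - 1; i >= 1 && n < cap; i--)
		out[n++] = i;

	return n;
}
```

For example, an endpoint at depth 3 yields the sequence 2, 1, while the `-1` sentinel and an endpoint at depth 1 both yield an empty sequence.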
Testing of ram region support [1], stimulates a long standing bug in
cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
inability to walk ports after dports have been unregistered. That
results in a failure to re-register a memdev after the port is
re-enabled leading to a crash like the following:

    cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
    general protection fault, ...
    [..]
    RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
    dev_name at include/linux/device.h:700
    (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
    (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
    [..]
    Call Trace:
     <TASK>
     attach_target+0x39a/0x760 [cxl_core]
     ? __mutex_unlock_slowpath+0x3a/0x290
     cxl_add_to_region+0xb8/0x340 [cxl_core]
     ? lockdep_hardirqs_on+0x7d/0x100
     discover_region+0x4b/0x80 [cxl_port]
     ? __pfx_discover_region+0x10/0x10 [cxl_port]
     device_for_each_child+0x58/0x90
     cxl_port_probe+0x10e/0x130 [cxl_port]
     cxl_bus_probe+0x17/0x50 [cxl_core]

Change the port ancestry walk to be by depth rather than by dport. This
ensures that even if a port has unregistered its dports a deferred
memdev cleanup will still be able to cleanup the memdev's interest in
that port.

The parent_port->dev.driver check is only needed for determining if the
bottom up removal beat the top-down removal, but cxl_ep_remove() can
always proceed.

Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration")
Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/memdev.c |    1 +
 drivers/cxl/core/port.c   |   58 +++++++++++++++++++++++++--------------------
 drivers/cxl/cxlmem.h      |    2 ++
 3 files changed, 35 insertions(+), 26 deletions(-)
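The difference between the two walks described in the commit message can be shown with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: `toy_port`, `detach_by_links()` and `detach_by_depth()` are hypothetical names, a plain array scan stands in for bus_find_device() with the port_has_memdev() match callback, and a NULL `parent` pointer stands in for a topology link lost when dports are unregistered. The link-following walk stops at the first missing link and leaves stale registrations behind. The depth-based search still finds every port that tracks the memdev.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EPS 4

/* Toy port; NOT the kernel's struct cxl_port. */
struct toy_port {
	int depth;
	struct toy_port *parent;	/* NULL models an unregistered link */
	int eps[MAX_EPS];		/* ids of memdevs registered at this port */
	size_t nr_eps;
};

static bool port_has_ep(const struct toy_port *port, int memdev_id)
{
	for (size_t i = 0; i < port->nr_eps; i++)
		if (port->eps[i] == memdev_id)
			return true;
	return false;
}

/* Remove @memdev_id from @port if present; returns true if it was removed. */
static bool port_remove_ep(struct toy_port *port, int memdev_id)
{
	for (size_t i = 0; i < port->nr_eps; i++) {
		if (port->eps[i] == memdev_id) {
			port->eps[i] = port->eps[--port->nr_eps];
			return true;
		}
	}
	return false;
}

/* Old style: follow topology links upward; stops at the first missing link. */
static int detach_by_links(struct toy_port *start, int memdev_id)
{
	int removed = 0;

	for (struct toy_port *p = start; p; p = p->parent)
		if (port_remove_ep(p, memdev_id))
			removed++;
	return removed;
}

/* New style: for each depth above the endpoint, search all ports for a match. */
static int detach_by_depth(struct toy_port *ports, size_t nr_ports,
			   int endpoint_depth, int memdev_id)
{
	int removed = 0;

	for (int d = endpoint_depth - 1; d >= 1; d--) {
		for (size_t i = 0; i < nr_ports; i++) {
			if (ports[i].depth != d || !port_has_ep(&ports[i], memdev_id))
				continue;
			port_remove_ep(&ports[i], memdev_id);
			removed++;
			break;	/* one matching port per depth, as in the patch */
		}
	}
	return removed;
}
```

With a depth-1 port and a depth-2 port that both track memdev 7, and the link between them already gone, `detach_by_links()` starting from the depth-2 port removes only one registration, whereas `detach_by_depth()` with endpoint depth 3 clears both.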