Message ID | 20240813-fix-notifiers-v1-1-efd23a18688d@intel.com |
---|---|
State | Superseded |
Series | cxl/region: Remove lock from memory notifier callback |
Ira Weiny wrote:
> In testing Dynamic Capacity Device (DCD) support, a lockdep splat
> revealed an ABBA issue between the memory notifiers and the DCD extent
> processing code.[0] Changing the lock ordering within DCD proved
> difficult because regions must be stable while searching for the proper
> region and then the device lock must be held to properly notify the DAX
> region driver of memory changes.
>
> Dan points out in the thread that notifiers should be able to trust that
> it is safe to access static data. Region data is static once the device
> is realized and until its destruction. Thus it is better to manage the
> notifiers within the region driver.
>
> Remove the need for a lock by ensuring the notifiers are active only
> during the region's lifetime.
>
> Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0]
> Cc: Huang, Ying <ying.huang@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/cxl/core/region.c | 31 ++++++++++++++++++++-----------
>  1 file changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21ad5f242875..971a314b6b0e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
[..]
> @@ -2396,7 +2394,6 @@ static int cxl_region_nid(struct cxl_region *cxlr)
>  	struct cxl_region_params *p = &cxlr->params;
>  	struct resource *res;
>
> -	guard(rwsem_read)(&cxl_region_rwsem);
>  	res = p->res;
>  	if (!res)
>  		return NUMA_NO_NODE;

The cxl_region_nid() helper is now completely unnecessary: no lock is
needed to read cxl_region_params, and p->res is guaranteed to be
non-NULL. cxl_region_nid() should also be removed so that nothing else
starts using it in a context that might *need* the lock.
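To make the review comment concrete, here is a minimal userspace sketch of a callback-side node lookup with the helper folded in. It is not the kernel code: the struct definitions are cut-down stand-ins, and mock_phys_to_target_node() and region_callback_nid() are hypothetical names invented for this illustration (the quoted hunk does not show how the real helper turns the resource into a node ID). The only point is that once p->res is guaranteed to be set for the whole time the notifier is registered, the lookup needs neither a lock nor a NULL branch.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct resource {
	unsigned long long start;
};

struct cxl_region_params {
	struct resource *res;
};

struct cxl_region {
	struct cxl_region_params params;
};

/*
 * Hypothetical address-to-node mapping for this sketch only:
 * one node per 4 GiB of physical address space.
 */
static int mock_phys_to_target_node(unsigned long long addr)
{
	return (int)(addr >> 32);
}

/*
 * Node lookup as it could appear inside a notifier callback once the
 * helper is gone: no lock is taken and no NULL fallback is needed,
 * because the notifier only exists while the region is bound.
 */
static int region_callback_nid(const struct cxl_region *cxlr)
{
	const struct resource *res = cxlr->params.res;

	assert(res != NULL); /* lifetime invariant, not a runtime branch */
	return mock_phys_to_target_node(res->start);
}
```

With the invariant stated as an assertion rather than a NUMA_NO_NODE fallback, a caller outside the notifier's lifetime would fail loudly instead of silently reading unstable data, which is the hazard the comment above is warning about.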
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f242875..971a314b6b0e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2313,8 +2313,6 @@ static void unregister_region(void *_cxlr)
 	struct cxl_region_params *p = &cxlr->params;
 	int i;

-	unregister_memory_notifier(&cxlr->memory_notifier);
-	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 	device_del(&cxlr->dev);

 	/*
@@ -2396,7 +2394,6 @@ static int cxl_region_nid(struct cxl_region *cxlr)
 	struct cxl_region_params *p = &cxlr->params;
 	struct resource *res;

-	guard(rwsem_read)(&cxl_region_rwsem);
 	res = p->res;
 	if (!res)
 		return NUMA_NO_NODE;
@@ -2484,14 +2481,6 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	if (rc)
 		goto err;

-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
-
-	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
-	cxlr->adist_notifier.priority = 100;
-	register_mt_adistance_algorithm(&cxlr->adist_notifier);
-
 	rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
 	if (rc)
 		return ERR_PTR(rc);
@@ -3386,6 +3375,14 @@ static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }

+static void shutdown_notifiers(void *_cxlr)
+{
+	struct cxl_region *cxlr = _cxlr;
+
+	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -3418,6 +3415,18 @@ static int cxl_region_probe(struct device *dev)
 out:
 	up_read(&cxl_region_rwsem);

+	if (rc)
+		return rc;
+
+	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
+	register_memory_notifier(&cxlr->memory_notifier);
+
+	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
+	cxlr->adist_notifier.priority = 100;
+	register_mt_adistance_algorithm(&cxlr->adist_notifier);
+
+	rc = devm_add_action_or_reset(&cxlr->dev, shutdown_notifiers, cxlr);
 	if (rc)
 		return rc;
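The lifetime rule the diff establishes (register the notifiers only after a successful probe, and unregister them through an action tied to the region device) can be modeled in plain userspace C. The sketch below is an illustration under stated assumptions, not the kernel implementation: the singly linked chain and every name in it (notifier_register(), region_probe(), region_remove(), and so on) are hypothetical, and region_remove() merely stands in for the devm action that runs shutdown_notifiers() on driver unbind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal notifier chain; not the kernel's notifier API. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long event);
	struct notifier *next;
};

static struct notifier *chain;

static void notifier_register(struct notifier *nb)
{
	nb->next = chain;
	chain = nb;
}

static void notifier_unregister(struct notifier *nb)
{
	struct notifier **pp;

	for (pp = &chain; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Deliver an event to every registered notifier; returns how many ran. */
static int notifier_call_chain(unsigned long event)
{
	int n = 0;

	for (struct notifier *nb = chain; nb; nb = nb->next) {
		nb->call(nb, event);
		n++;
	}
	return n;
}

/* Stand-in for a region; "bound" is nonzero between probe and remove. */
struct region {
	struct notifier memory_notifier;
	int bound;
	int events_seen;
};

static int region_event(struct notifier *nb, unsigned long event)
{
	struct region *r = (struct region *)
		((char *)nb - offsetof(struct region, memory_notifier));

	(void)event;
	/* The callback may read region data without a lock: it can only
	 * run while the region is bound. */
	assert(r->bound);
	r->events_seen++;
	return 0;
}

/* Register only after setup succeeded, as the probe hunk does after "out:". */
static int region_probe(struct region *r, int setup_rc)
{
	if (setup_rc)
		return setup_rc;

	r->bound = 1;
	r->memory_notifier.call = region_event;
	notifier_register(&r->memory_notifier);
	return 0;
}

/* Models the devm action: unregister first, then let the data go away. */
static void region_remove(struct region *r)
{
	notifier_unregister(&r->memory_notifier);
	r->bound = 0;
}
```

In this model a failed probe leaves the chain untouched, and an event delivered after region_remove() never reaches the region, so the callback has no window in which the data it reads could be changing underneath it.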
In testing Dynamic Capacity Device (DCD) support, a lockdep splat
revealed an ABBA issue between the memory notifiers and the DCD extent
processing code.[0] Changing the lock ordering within DCD proved
difficult because regions must be stable while searching for the proper
region and then the device lock must be held to properly notify the DAX
region driver of memory changes.

Dan points out in the thread that notifiers should be able to trust that
it is safe to access static data. Region data is static once the device
is realized and until its destruction. Thus it is better to manage the
notifiers within the region driver.

Remove the need for a lock by ensuring the notifiers are active only
during the region's lifetime.

Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0]
Cc: Huang, Ying <ying.huang@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/core/region.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

---
base-commit: afdab700f65e14070d8ab92175544b1c62b8bf03
change-id: 20240813-fix-notifiers-99c350b044a2

Best regards,