[v2] cxl/region: Remove lock from memory notifier callback

Message ID 20240814-fix-notifiers-v2-1-6bab38192c7c@intel.com
State Superseded

Commit Message

Ira Weiny Aug. 14, 2024, 5:49 p.m. UTC
In testing Dynamic Capacity Device (DCD) support, a lockdep splat
revealed an ABBA issue between the memory notifiers and the DCD extent
processing code.[0]  Changing the lock ordering within DCD proved
difficult because regions must be stable while searching for the proper
region and then the device lock must be held to properly notify the DAX
region driver of memory changes.

Dan points out in the thread that notifiers should be able to trust that
it is safe to access static data.  Region data is static once the device
is realized and until its destruction.  Thus it is better to manage the
notifiers within the region driver.

Remove the need for a lock by ensuring the notifiers are active only
during the region's lifetime.

Furthermore, remove cxl_region_nid() because resource can't be NULL
while the region is stable.

Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0]
Cc: Huang, Ying <ying.huang@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in v2:
- [djbw: remove cxl_region_nid()]
- Link to v1: https://patch.msgid.link/20240813-fix-notifiers-v1-1-efd23a18688d@intel.com
---
 drivers/cxl/core/region.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)


---
base-commit: 6b0f8db921abf0520081d779876d3a41069dab95
change-id: 20240813-fix-notifiers-99c350b044a2

Best regards,

Comments

Davidlohr Bueso Aug. 15, 2024, 4:17 p.m. UTC | #1
On Wed, 14 Aug 2024, Ira Weiny wrote:

>In testing Dynamic Capacity Device (DCD) support, a lockdep splat
>revealed an ABBA issue between the memory notifiers and the DCD extent
>processing code.[0]  Changing the lock ordering within DCD proved
>difficult because regions must be stable while searching for the proper
>region and then the device lock must be held to properly notify the DAX
>region driver of memory changes.
>
>Dan points out in the thread that notifiers should be able to trust that
>it is safe to access static data.  Region data is static once the device
>is realized and until its destruction.  Thus it is better to manage the
>notifiers within the region driver.
>
>Remove the need for a lock by ensuring the notifiers are active only
>during the region's lifetime.

Agreed, this is better.

>Furthermore, remove cxl_region_nid() because resource can't be NULL
>while the region is stable.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

Jonathan Cameron Aug. 27, 2024, 4:31 p.m. UTC | #2
On Wed, 14 Aug 2024 12:49:39 -0500
Ira Weiny <ira.weiny@intel.com> wrote:

> In testing Dynamic Capacity Device (DCD) support, a lockdep splat
> revealed an ABBA issue between the memory notifiers and the DCD extent
> processing code.[0]  Changing the lock ordering within DCD proved
> difficult because regions must be stable while searching for the proper
> region and then the device lock must be held to properly notify the DAX
> region driver of memory changes.
> 
> Dan points out in the thread that notifiers should be able to trust that
> it is safe to access static data.  Region data is static once the device
> is realized and until its destruction.  Thus it is better to manage the
> notifiers within the region driver.
> 
> Remove the need for a lock by ensuring the notifiers are active only
> during the region's lifetime.
> 
> Furthermore, remove cxl_region_nid() because resource can't be NULL
> while the region is stable.
> 
> Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0]
> Cc: Huang, Ying <ying.huang@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Seems a sensible cleanup irrespective of the bug / future issue.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Huang, Ying Sept. 4, 2024, 2:14 a.m. UTC | #3
Hi, Ira,

Ira Weiny <ira.weiny@intel.com> writes:

[snip]

> @@ -2391,18 +2389,6 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
>  	return true;
>  }
>  
> -static int cxl_region_nid(struct cxl_region *cxlr)
> -{
> -	struct cxl_region_params *p = &cxlr->params;
> -	struct resource *res;
> -
> -	guard(rwsem_read)(&cxl_region_rwsem);
> -	res = p->res;
> -	if (!res)
> -		return NUMA_NO_NODE;
> -	return phys_to_target_node(res->start);
> -}
> -
>  static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
>  					  unsigned long action, void *arg)
>  {
> @@ -2415,7 +2401,7 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
>  	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
>  		return NOTIFY_DONE;
>  
> -	region_nid = cxl_region_nid(cxlr);
> +	region_nid = phys_to_target_node(cxlr->params.res->start);

Better to add some comments about why we don't need to hold
cxl_region_rwsem to access cxlr->params.res here?

Otherwise, LGTM, Thanks!  Feel free to add

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

in the future versions.

>  	if (nid != region_nid)
>  		return NOTIFY_DONE;
>  
> @@ -2434,7 +2420,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
>  	int *adist = data;
>  	int region_nid;
>  
> -	region_nid = cxl_region_nid(cxlr);
> +	region_nid = phys_to_target_node(cxlr->params.res->start);
>  	if (nid != region_nid)
>  		return NOTIFY_OK;
>  

[snip]

--
Best Regards,
Huang, Ying

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f242875..588add3536c3 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2313,8 +2313,6 @@  static void unregister_region(void *_cxlr)
 	struct cxl_region_params *p = &cxlr->params;
 	int i;
 
-	unregister_memory_notifier(&cxlr->memory_notifier);
-	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 	device_del(&cxlr->dev);
 
 	/*
@@ -2391,18 +2389,6 @@  static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
 	return true;
 }
 
-static int cxl_region_nid(struct cxl_region *cxlr)
-{
-	struct cxl_region_params *p = &cxlr->params;
-	struct resource *res;
-
-	guard(rwsem_read)(&cxl_region_rwsem);
-	res = p->res;
-	if (!res)
-		return NUMA_NO_NODE;
-	return phys_to_target_node(res->start);
-}
-
 static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 					  unsigned long action, void *arg)
 {
@@ -2415,7 +2401,7 @@  static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
 		return NOTIFY_DONE;
 
-	region_nid = cxl_region_nid(cxlr);
+	region_nid = phys_to_target_node(cxlr->params.res->start);
 	if (nid != region_nid)
 		return NOTIFY_DONE;
 
@@ -2434,7 +2420,7 @@  static int cxl_region_calculate_adistance(struct notifier_block *nb,
 	int *adist = data;
 	int region_nid;
 
-	region_nid = cxl_region_nid(cxlr);
+	region_nid = phys_to_target_node(cxlr->params.res->start);
 	if (nid != region_nid)
 		return NOTIFY_OK;
 
@@ -2484,14 +2470,6 @@  static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
 	if (rc)
 		goto err;
 
-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
-
-	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
-	cxlr->adist_notifier.priority = 100;
-	register_mt_adistance_algorithm(&cxlr->adist_notifier);
-
 	rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
 	if (rc)
 		return ERR_PTR(rc);
@@ -3386,6 +3364,14 @@  static int is_system_ram(struct resource *res, void *arg)
 	return 1;
 }
 
+static void shutdown_notifiers(void *_cxlr)
+{
+	struct cxl_region *cxlr = _cxlr;
+
+	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
+}
+
 static int cxl_region_probe(struct device *dev)
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
@@ -3418,6 +3404,18 @@  static int cxl_region_probe(struct device *dev)
 out:
 	up_read(&cxl_region_rwsem);
 
+	if (rc)
+		return rc;
+
+	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
+	register_memory_notifier(&cxlr->memory_notifier);
+
+	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
+	cxlr->adist_notifier.priority = 100;
+	register_mt_adistance_algorithm(&cxlr->adist_notifier);
+
+	rc = devm_add_action_or_reset(&cxlr->dev, shutdown_notifiers, cxlr);
 	if (rc)
 		return rc;