[RT] nvdimm: make lane acquirement RT aware

Message ID 20190306095709.23138-1-yongxin.liu@windriver.com (mailing list archive)
State New, archived
Series [RT] nvdimm: make lane acquirement RT aware

Commit Message

Yongxin Liu March 6, 2019, 9:57 a.m. UTC
Currently, the nvdimm driver isn't RT compatible:
nd_region_acquire_lane() disables preemption with get_cpu(), which
causes "scheduling while atomic" splats on RT when using fio to test
pmem as a block device.

In this change, we replace get_cpu()/put_cpu() with local_lock_cpu()/
local_unlock_cpu() and introduce the per-CPU local lock "ndl_local_lock".
Because tasks stay preemptible on RT, this lock is needed to avoid races
for the same lane between tasks on the same CPU. When the number of CPUs
is greater than the number of lanes, a lane can be shared among CPUs;
"ndl_lock->lock" protects the lane in that situation.

This patch is derived from Dan Williams and Pankaj Gupta's proposal from
https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
Many thanks to them.
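
As background, the local-lock primitives used below come from the -rt tree's
linux/locallock.h. A minimal sketch of the pattern, assuming that interface
(illustration only, with made-up names, not part of this patch):

/*
 * Illustration only: the -rt locallock pattern that the reworked
 * nd_region_acquire_lane()/nd_region_release_lane() rely on. On RT,
 * local_lock_cpu() takes a per-CPU sleeping lock instead of disabling
 * preemption and evaluates to the current CPU number. The names
 * "example_local_lock" and "example_per_cpu_section" are hypothetical.
 */
#include <linux/locallock.h>

static DEFINE_LOCAL_IRQ_LOCK(example_local_lock);

static void example_per_cpu_section(void)
{
	int cpu;

	/* Serialize tasks on this CPU without disabling preemption. */
	cpu = local_lock_cpu(example_local_lock);

	/* ... access per-CPU state indexed by 'cpu' ... */

	local_unlock_cpu(example_local_lock);
}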

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
---
 drivers/nvdimm/region_devs.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

Comments

Dan Williams March 6, 2019, 4:35 p.m. UTC | #1
On Wed, Mar 6, 2019 at 2:05 AM Yongxin Liu <yongxin.liu@windriver.com> wrote:
>
> Currently, nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu() which
> causes "scheduling while atomic" spews on RT, when using fio to test
> pmem as block device.
>
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.
>
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Pankaj Gupta <pagupta@redhat.com>
> Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
> Cc: linux-nvdimm <linux-nvdimm@lists.01.org>
> Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>

Looks ok to me in concept.

Acked-by: Dan Williams <dan.j.williams@intel.com>
Sebastian Andrzej Siewior March 7, 2019, 2:33 p.m. UTC | #2
On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.

So what was the reason that get_cpu() can't be replaced with
raw_smp_processor_id()?

Sebastian
Yongxin Liu March 8, 2019, 12:07 a.m. UTC | #3
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Thursday, March 7, 2019 22:34
> To: Liu, Yongxin
> Cc: linux-kernel@vger.kernel.org; linux-rt-users@vger.kernel.org;
> tglx@linutronix.de; rostedt@goodmis.org; dan.j.williams@intel.com;
> pagupta@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> > In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> > local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> > Due to preemption on RT, this lock can avoid race condition for the
> > same lane on the same CPU. When CPU number is greater than the lane
> > number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> > protect the lane in this situation.
> 
> so what was the reason that get_cpu() can't be replaced with
> raw_smp_processor_id()?
> 
> Sebastian

The lane is a critical resource that needs to be protected, and one CPU can
use only one lane at a time. If the number of CPUs is greater than the total
number of lanes, a lane can be shared among CPUs.

In a non-RT kernel, get_cpu() disables preemption by calling preempt_disable()
first, so only one thread on a given CPU can hold the lane.

In an RT kernel, using only raw_smp_processor_id() does not protect the lane:
two threads on the same CPU can get the same lane at the same time.

In this patch, the two-level locking avoids that race:

          CPU A                  CPU B (B == A % num_lanes)
 
    task A1    task A2     task B1    task B2
       |          |           |          |
       |__________|           |__________|
            |                      |
       ndl_local_lock           ndl_local_lock
            |                      |
            |______________________|
                       |
                       |
                  ndl_lock->lock
                       |
                       |
                      lane

 
Thanks,
Yongxin
Pankaj Gupta March 8, 2019, 6:31 a.m. UTC | #4
> Currently, nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu() which
> causes "scheduling while atomic" spews on RT, when using fio to test
> pmem as block device.
> 
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.
> 
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Pankaj Gupta <pagupta@redhat.com>
> Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
> Cc: linux-nvdimm <linux-nvdimm@lists.01.org>
> Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>

This patch looks good to me.

Acked-by: Pankaj Gupta <pagupta@redhat.com>

> ---
>  drivers/nvdimm/region_devs.c | 40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index fa37afcd43ff..6c5388cf2477 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -18,9 +18,13 @@
>  #include <linux/sort.h>
>  #include <linux/io.h>
>  #include <linux/nd.h>
> +#include <linux/locallock.h>
>  #include "nd-core.h"
>  #include "nd.h"
>  
> +/* lock for tasks on the same CPU to sequence the access to the lane */
> +static DEFINE_LOCAL_IRQ_LOCK(ndl_local_lock);
> +
>  /*
>   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
>   * irrelevant.
> @@ -935,18 +939,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>  	unsigned int cpu, lane;
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
> -	cpu = get_cpu();
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	cpu = local_lock_cpu(ndl_local_lock);
>  
> -		lane = cpu % nd_region->num_lanes;
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (ndl_count->count++ == 0)
> -			spin_lock(&ndl_lock->lock);
> -	} else
> -		lane = cpu;
> +	lane = cpu % nd_region->num_lanes;
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (ndl_count->count++ == 0)
> +		spin_lock(&ndl_lock->lock);
>  
>  	return lane;
>  }
> @@ -954,17 +955,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
>  
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		unsigned int cpu = get_cpu();
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (--ndl_count->count == 0)
> -			spin_unlock(&ndl_lock->lock);
> -		put_cpu();
> -	}
> -	put_cpu();
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	unsigned int cpu = smp_processor_id();
> +
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (--ndl_count->count == 0)
> +		spin_unlock(&ndl_lock->lock);
> +	local_unlock_cpu(ndl_local_lock);
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
>  
> --
> 2.14.4
> 
>
Sebastian Andrzej Siewior March 8, 2019, 9:41 a.m. UTC | #5
On 2019-03-08 00:07:41 [+0000], Liu, Yongxin wrote:
> The lane is critical resource which needs to be protected. One CPU can use only one
> lane. If CPU number is greater than the number of total lane, the lane can be shared
> among CPUs. 
> 
> In non-RT kernel, get_cpu() disable preemption by calling preempt_disable() first.
> Only one thread on the same CPU can get the lane.
> 
> In RT kernel, if we only use raw_smp_processor_id(), this doesn't protect the lane. 
> Thus two threads on the same CPU can get the same lane at the same time.
> 
> In this patch, two-level lock can avoid race condition for the lane.

But you still have ndl_lock->lock, which protects the resource. So in
the unlikely (but possible) event that you switch CPUs after obtaining
the CPU number, you block on the lock. No harm is done, right?

> Thanks,
> Yongxin

Sebastian
Yongxin Liu March 11, 2019, 12:44 a.m. UTC | #6
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Friday, March 8, 2019 17:42
> To: Liu, Yongxin
> Cc: linux-kernel@vger.kernel.org; linux-rt-users@vger.kernel.org;
> tglx@linutronix.de; rostedt@goodmis.org; dan.j.williams@intel.com;
> pagupta@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-08 00:07:41 [+0000], Liu, Yongxin wrote:
> > The lane is critical resource which needs to be protected. One CPU can
> use only one
> > lane. If CPU number is greater than the number of total lane, the lane
> can be shared
> > among CPUs.
> >
> > In non-RT kernel, get_cpu() disable preemption by calling
> preempt_disable() first.
> > Only one thread on the same CPU can get the lane.
> >
> > In RT kernel, if we only use raw_smp_processor_id(), this doesn't
> protect the lane.
> > Thus two threads on the same CPU can get the same lane at the same time.
> >
> > In this patch, two-level lock can avoid race condition for the lane.
> 
> but you still have the ndl_lock->lock which protects the resource. So in
> the unlikely (but possible event) that you switch CPUs after obtaining
> the CPU number you block on the lock. No harm is done, right?

The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a conditional lock.

ndl_count->count is per CPU.
ndl_lock->lock is per lane.

Here is an example:
Thread A  on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> get "ndl_lock->lock"
--> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" due to "ndl_count->count++".

Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" ("ndl_count->count"
was changed by Thread A)

If we use raw_smp_processor_id(), no matter which CPU the thread was migrated to, 
if there is another thread running on the old CPU, there will be race condition 
due to per CPU variable "ndl_count->count".


Thanks,
Yongxin

> 
> Sebastian
Sebastian Andrzej Siewior March 15, 2019, 4:42 p.m. UTC | #7
On 2019-03-11 00:44:58 [+0000], Liu, Yongxin wrote:
> > but you still have the ndl_lock->lock which protects the resource. So in
> > the unlikely (but possible event) that you switch CPUs after obtaining
> > the CPU number you block on the lock. No harm is done, right?
> 
> The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a conditional lock.
> 
> ndl_count->count is per CPU.
> ndl_lock->lock is per lane.
> 
> Here is an example:
> Thread A  on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> get "ndl_lock->lock"
> --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" due to "ndl_count->count++".
> 
> Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" ("ndl_count->count"
> was changed by Thread A)
> 
> If we use raw_smp_processor_id(), no matter which CPU the thread was migrated to, 
> if there is another thread running on the old CPU, there will be race condition 
> due to per CPU variable "ndl_count->count".

So I've been looking at it again. The recursive locking could have been
solved better, the way local_lock() on -RT does it.
Given that you lock with preempt_disable(), there should be no in-IRQ
usage.
But in the "nd_region->num_lanes >= nr_cpu_ids" case you don't take any
locks, and that would be a problem with the raw_smp_processor_id() approach.

So what about the completely untested patch here:

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 379bf4305e615..98c2e9df4b2e4 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -109,7 +109,8 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 			res; res = next, next = next ? next->sibling : NULL)
 
 struct nd_percpu_lane {
-	int count;
+	struct task_struct *owner;
+	int nestcnt;
 	spinlock_t lock;
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e2818f94f2928..8a62f9833513f 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -946,19 +946,17 @@ int nd_blk_region_init(struct nd_region *nd_region)
  */
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
+	struct nd_percpu_lane *ndl_lock;
 	unsigned int cpu, lane;
 
-	cpu = get_cpu();
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-		lane = cpu % nd_region->num_lanes;
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (ndl_count->count++ == 0)
-			spin_lock(&ndl_lock->lock);
-	} else
-		lane = cpu;
+	cpu = raw_smp_processor_id();
+	lane = cpu % nd_region->num_lanes;
+	ndl_lock  = per_cpu_ptr(nd_region->lane, lane);
+	if (ndl_lock->owner != current) {
+		spin_lock(&ndl_lock->lock);
+		ndl_lock->owner = current;
+	}
+	ndl_lock->nestcnt++;
 
 	return lane;
 }
@@ -966,17 +964,16 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
 
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		unsigned int cpu = get_cpu();
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
+	struct nd_percpu_lane *ndl_lock;
 
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (--ndl_count->count == 0)
-			spin_unlock(&ndl_lock->lock);
-		put_cpu();
-	}
-	put_cpu();
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	WARN_ON(ndl_lock->nestcnt == 0);
+	WARN_ON(ndl_lock->owner != current);
+	if (--ndl_lock->nestcnt)
+		return;
+
+	ndl_lock->owner = NULL;
+	spin_unlock(&ndl_lock->lock);
 }
 EXPORT_SYMBOL(nd_region_release_lane);
 
@@ -1042,7 +1039,8 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 		ndl = per_cpu_ptr(nd_region->lane, i);
 		spin_lock_init(&ndl->lock);
-		ndl->count = 0;
+		ndl->owner = NULL;
+		ndl->nestcnt = 0;
 	}
 
 	for (i = 0; i < ndr_desc->num_mappings; i++) {

> Thanks,
> Yongxin

Sebastian
Yongxin Liu March 18, 2019, 1:41 a.m. UTC | #8
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Saturday, March 16, 2019 00:43
> To: Liu, Yongxin
> Cc: linux-kernel@vger.kernel.org; linux-rt-users@vger.kernel.org;
> tglx@linutronix.de; rostedt@goodmis.org; dan.j.williams@intel.com;
> pagupta@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-11 00:44:58 [+0000], Liu, Yongxin wrote:
> > > but you still have the ndl_lock->lock which protects the resource. So
> in
> > > the unlikely (but possible event) that you switch CPUs after
> obtaining
> > > the CPU number you block on the lock. No harm is done, right?
> >
> > The resource "lane" can be acquired recursively, so "ndl_lock->lock" is
> a conditional lock.
> >
> > ndl_count->count is per CPU.
> > ndl_lock->lock is per lane.
> >
> > Here is an example:
> > Thread A  on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> get
> "ndl_lock->lock"
> > --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" due
> to "ndl_count->count++".
> >
> > Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> bypass
> "ndl_lock->lock" ("ndl_count->count"
> > was changed by Thread A)
> >
> > If we use raw_smp_processor_id(), no matter which CPU the thread was
> migrated to,
> > if there is another thread running on the old CPU, there will be race
> condition
> > due to per CPU variable "ndl_count->count".
> 
> so I've been looking at it again. The recursive locking could have been
> solved better. Like the local_lock() on -RT is doing it.
> Given that you lock with preempt_disable() there should be no in-IRQ
> usage.
> But in the "nd_region->num_lanes >= nr_cpu_ids" case you don't take any
> locks. That would be a problem with raw_smp_processor_id() approach.
> 
> So what about the completely untested patch here:
> 
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index 379bf4305e615..98c2e9df4b2e4 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -109,7 +109,8 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata
> *ndd);
>  			res; res = next, next = next ? next->sibling : NULL)
> 
>  struct nd_percpu_lane {
> -	int count;
> +	struct task_struct *owner;
> +	int nestcnt;
>  	spinlock_t lock;
>  };
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index e2818f94f2928..8a62f9833513f 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -946,19 +946,17 @@ int nd_blk_region_init(struct nd_region *nd_region)
>   */
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
> +	struct nd_percpu_lane *ndl_lock;
>  	unsigned int cpu, lane;
> 
> -	cpu = get_cpu();
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -		lane = cpu % nd_region->num_lanes;
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (ndl_count->count++ == 0)
> -			spin_lock(&ndl_lock->lock);
> -	} else
> -		lane = cpu;
> +	cpu = raw_smp_processor_id();
> +	lane = cpu % nd_region->num_lanes;
> +	ndl_lock  = per_cpu_ptr(nd_region->lane, lane);
> +	if (ndl_lock->owner != current) {
> +		spin_lock(&ndl_lock->lock);
> +		ndl_lock->owner = current;
> +	}
> +	ndl_lock->nestcnt++;
> 
>  	return lane;
>  }
> @@ -966,17 +964,16 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
> 
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int
> lane)
>  {
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		unsigned int cpu = get_cpu();
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	struct nd_percpu_lane *ndl_lock;
> 
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (--ndl_count->count == 0)
> -			spin_unlock(&ndl_lock->lock);
> -		put_cpu();
> -	}
> -	put_cpu();
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	WARN_ON(ndl_lock->nestcnt == 0);
> +	WARN_ON(ndl_lock->owner != current);
> +	if (--ndl_lock->nestcnt)
> +		return;
> +
> +	ndl_lock->owner = NULL;
> +	spin_unlock(&ndl_lock->lock);
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
> 
> @@ -1042,7 +1039,8 @@ static struct nd_region *nd_region_create(struct
> nvdimm_bus *nvdimm_bus,
> 
>  		ndl = per_cpu_ptr(nd_region->lane, i);
>  		spin_lock_init(&ndl->lock);
> -		ndl->count = 0;
> +		ndl->owner = NULL;
> +		ndl->nestcnt = 0;
>  	}
> 
>  	for (i = 0; i < ndr_desc->num_mappings; i++) {
> 
> > Thanks,
> > Yongxin
> 

Consider a recursive call to nd_region_acquire_lane() in the following
situation. Will there be a deadlock?


    Thread A                    Thread B
       |                           |
       |                           |
     CPU 1                       CPU 2
       |                           |
       |                           |
 get lock for Lane 1         get lock for Lane 2
       |                           |
       |                           |
 migrate to CPU 2            migrate to CPU 1
       |                           |
       |                           |
 wait lock for Lane 2        wait lock for Lane 1 
       |                           |
       |                           |
       _____________________________
                   |
                 deadlock?


Thanks,
Yongxin


> Sebastian
Sebastian Andrzej Siewior March 18, 2019, 11:40 a.m. UTC | #9
On 2019-03-18 01:41:10 [+0000], Liu, Yongxin wrote:
> 
> Consider the recursive call to nd_region_acquire_lane() in the following situation.
> Will there be a dead lock?
> 
> 
>     Thread A                    Thread B
>        |                           |
>        |                           |
>      CPU 1                       CPU 2
>        |                           |
>        |                           |
>  get lock for Lane 1         get lock for Lane 2
>        |                           |
>        |                           |
>  migrate to CPU 2            migrate to CPU 1
>        |                           |
>        |                           |
>  wait lock for Lane 2        wait lock for Lane 1 
>        |                           |
>        |                           |
>        _____________________________
>                    |
>                 dead lock ?

Bummer. That would indeed deadlock.
Is it easily possible to recognize the recursive case?

> 
> Thanks,
> Yognxin

Sebastian
Yongxin Liu March 18, 2019, 11:48 a.m. UTC | #10
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Monday, March 18, 2019 19:40
> To: Liu, Yongxin
> Cc: linux-kernel@vger.kernel.org; linux-rt-users@vger.kernel.org;
> tglx@linutronix.de; rostedt@goodmis.org; dan.j.williams@intel.com;
> pagupta@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-18 01:41:10 [+0000], Liu, Yongxin wrote:
> >
> > Consider the recursive call to nd_region_acquire_lane() in the
> following situation.
> > Will there be a dead lock?
> >
> >
> >     Thread A                    Thread B
> >        |                           |
> >        |                           |
> >      CPU 1                       CPU 2
> >        |                           |
> >        |                           |
> >  get lock for Lane 1         get lock for Lane 2
> >        |                           |
> >        |                           |
> >  migrate to CPU 2            migrate to CPU 1
> >        |                           |
> >        |                           |
> >  wait lock for Lane 2        wait lock for Lane 1
> >        |                           |
> >        |                           |
> >        _____________________________
> >                    |
> >                 dead lock ?
> 
> Bummer. That would dead lock indeed.
> Is it easily possible to recognize the recursive case?

Not easily. I don't have a test case for the recursive call;
for now, it's just code analysis.


Yongxin

> >
> > Thanks,
> > Yognxin
> 
> Sebastian
Sebastian Andrzej Siewior March 28, 2019, 5:38 p.m. UTC | #11
On 2019-03-18 11:48:28 [+0000], Liu, Yongxin wrote:
> 
> > 
> > Bummer. That would dead lock indeed.
> > Is it easily possible to recognize the recursive case?
> 
> Not easily. I don't have test case for recursive call. 
> For now, just code analysis.

So I've been playing with qemu's nvdimm device. I *think* the
recursive case is not possible here, because qemu only supports pmem
while triggering it would require blk mode. It is just a wild
guess…

On top of qemu's nvdimm device I can create a block device via
	ndctl create-namespace namespace0.0 --mode=sector

and then I trigger the code path in question.

I would *really* prefer to understand the recursive case and avoid it.
That way the recursive case is explicitly known and uses another path.
The lock can then always be acquired, which gives you lockdep coverage
all the time (which is now missing unless you have more lanes than CPUs).

The local_lock thingy is completely unneeded: a simple get_cpu_light()
would do the job.
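
For reference, get_cpu_light()/put_cpu_light() are helpers from the -rt patch
set that pin the task to its current CPU without disabling preemption. A rough
sketch of their shape (an approximation, not copied verbatim from any tree):

/* Approximate shape of the -rt helpers mentioned above: disable migration
 * so the task stays on this CPU, but remain preemptible, unlike get_cpu(),
 * which disables preemption entirely.
 */
#define get_cpu_light()		({ migrate_disable(); smp_processor_id(); })
#define put_cpu_light()		migrate_enable()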

> Yongxin

Sebastian

Patch

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afcd43ff..6c5388cf2477 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -18,9 +18,13 @@ 
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/nd.h>
+#include <linux/locallock.h>
 #include "nd-core.h"
 #include "nd.h"
 
+/* lock for tasks on the same CPU to sequence the access to the lane */
+static DEFINE_LOCAL_IRQ_LOCK(ndl_local_lock);
+
 /*
  * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
  * irrelevant.
@@ -935,18 +939,15 @@  int nd_blk_region_init(struct nd_region *nd_region)
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
 	unsigned int cpu, lane;
+	struct nd_percpu_lane *ndl_lock, *ndl_count;
 
-	cpu = get_cpu();
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
+	cpu = local_lock_cpu(ndl_local_lock);
 
-		lane = cpu % nd_region->num_lanes;
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (ndl_count->count++ == 0)
-			spin_lock(&ndl_lock->lock);
-	} else
-		lane = cpu;
+	lane = cpu % nd_region->num_lanes;
+	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	if (ndl_count->count++ == 0)
+		spin_lock(&ndl_lock->lock);
 
 	return lane;
 }
@@ -954,17 +955,14 @@  EXPORT_SYMBOL(nd_region_acquire_lane);
 
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		unsigned int cpu = get_cpu();
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (--ndl_count->count == 0)
-			spin_unlock(&ndl_lock->lock);
-		put_cpu();
-	}
-	put_cpu();
+	struct nd_percpu_lane *ndl_lock, *ndl_count;
+	unsigned int cpu = smp_processor_id();
+
+	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	if (--ndl_count->count == 0)
+		spin_unlock(&ndl_lock->lock);
+	local_unlock_cpu(ndl_local_lock);
 }
 EXPORT_SYMBOL(nd_region_release_lane);