Message ID | 20220422194416.983549-1-yanjun.zhu@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Series | [PATCHv6,1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool interrupted by rxe_pool_get_index |
Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on the
same data can co-exist at the same time. That is the whole point. All this is
going away soon.

Bob

-----Original Message-----
From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
Sent: Friday, April 22, 2022 2:44 PM
To: jgg@ziepe.ca; leon@kernel.org; linux-rdma@vger.kernel.org; yanjun.zhu@linux.dev
Cc: Yi Zhang <yi.zhang@redhat.com>
Subject: [PATCHv6 1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool interrupted by rxe_pool_get_index

From: Zhu Yanjun <yanjun.zhu@linux.dev>

This is a deadlock problem.
The ah_pool xa_lock is first acquired here:

{SOFTIRQ-ON-W} state was registered at:

  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock+0x33/0x80
  __rxe_add_to_pool+0x183/0x230 [rdma_rxe]

Then the ah_pool xa_lock is acquired here:

{IN-SOFTIRQ-W}:

Call Trace:
 <TASK>
  dump_stack_lvl+0x44/0x57
  mark_lock.part.52.cold.79+0x3c/0x46
  __lock_acquire+0x1565/0x34a0
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock_irqsave+0x42/0x90
  rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
  rxe_get_av+0x168/0x2a0 [rdma_rxe]
 </TASK>

From the above, xa_lock is acquired in __rxe_add_to_pool. Then __rxe_add_to_pool
is interrupted by a softirq, and rxe_pool_get_index also tries to acquire
xa_lock. Finally, the deadlock appears.

       CPU0
       ----
  lock(&xa->xa_lock#15);    <----- __rxe_add_to_pool
  <Interrupt>
    lock(&xa->xa_lock#15);  <----- rxe_pool_get_index

 *** DEADLOCK ***

Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by carrays")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
V5->V6: One dead lock fix in one commit
V4->V5: Commit logs are changed.
V3->V4: xa_lock_irq locks are used.
V2->V3: __rxe_add_to_pool is between spin_lock and spin_unlock, so
        GFP_ATOMIC is used in __rxe_add_to_pool.
V1->V2: Replace GFP_KERNEL with GFP_ATOMIC
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..67f1d4733682 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 
 	atomic_set(&pool->num_elem, 0);
 
-	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	pool->limit.min = info->min_index;
 	pool->limit.max = info->max_index;
 }
@@ -155,6 +155,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err;
+	unsigned long flags;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
@@ -166,8 +167,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_irqsave(&pool->xa, flags);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_ATOMIC);
+	xa_unlock_irqrestore(&pool->xa, flags);
 	if (err)
 		goto err_cnt;
 
@@ -201,7 +204,7 @@ static void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_erase_irq(&pool->xa, elem->index);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
-- 
2.27.0
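For context, the reader side named in the lockdep report takes the same xa_lock from softirq context. The sketch below is only a rough approximation of that path (the function name and xa_load() lookup come from the trace and from the discussion further down; the kref handling is an assumption), shown to make the lock interaction concrete:

```c
/* Rough, approximate sketch of the softirq-side reader from the trace above;
 * not the exact upstream code. It takes the pool's xa_lock with interrupts
 * disabled, which is why a plain spin_lock on the writer side can deadlock
 * when this runs from a softirq on the same CPU. */
void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
{
	struct rxe_pool_elem *elem;
	unsigned long flags;
	void *obj = NULL;

	xa_lock_irqsave(&pool->xa, flags);
	elem = xa_load(&pool->xa, index);
	if (elem && kref_get_unless_zero(&elem->ref_cnt))	/* assumed ref scheme */
		obj = elem->obj;
	xa_unlock_irqrestore(&pool->xa, flags);

	return obj;
}
```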
On 2022/4/22 23:57, Pearson, Robert B wrote:
> Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on the
> same data can co-exist at the same time. That is the whole point. All this is
> going away soon.

This is based on your unproved assumption.

Zhu Yanjun
On 4/24/22 18:47, Yanjun Zhu wrote:
> On 2022/4/22 23:57, Pearson, Robert B wrote:
>> Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on the
>> same data can co-exist at the same time. That is the whole point. All this is
>> going away soon.
>
> This is based on your unproved assumption.

We are running the same tests (pyverbs, rping, blktests, perftest, etc.) with the
same configuration set (lockdep enabled), and I am using rcu_read_lock for
rxe_pool_get_index. I do not see any deadlocks or lockdep warnings.

The idea of RCU is to separate accesses to a shared data structure into write
operations and read operations. For write operations a normal spinlock is used,
allowing only one writer at a time to access the data. With RCU, unlike rwlocks,
the reader takes *no* lock at all; the writers are written so that they can change
the data structure underneath the readers, and a reader will see either the old
data or the new data, never some combination of the two. This requires some care,
but there is library support for it, usually denoted by xxx_rcu(). In the case of
xarrays the whole library is RCU enabled, so one can safely use xa_load() without
holding a spinlock. The rcu_read_lock() API marks the critical section so the
writers can wait long enough that no readers remain in the critical section before
freeing the data. rcu_read_lock() is also recursive; nesting it just extends the
critical section.

In the case of rxe_pool.c the only RCU reader is rxe_pool_get_index(), which looks
up an object from its index by calling xa_load(). This normally runs in a tasklet
but can sometimes run in process context. The only writers (as we have discussed)
run in process context in a create or destroy verbs call. The writers are sometimes
called while holding an _irqsave spinlock from ib_create_ah(), so lockdep issues
warnings unless the spinlock in rxe_add_to_pool() is also an _irq spinlock. It does
not issue this warning when RCU is used, however, because the reader *does not take
a lock and therefore cannot deadlock*. Thus the default xa_alloc_xxx(), which takes
a plain spin_lock(), is OK, and the sequence xa_lock_irq(); __xa_alloc_xxx();
xa_unlock_irq(); is not needed.
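A minimal sketch of the RCU read side Bob describes, reusing the pool and element names from the discussion above; this is an illustration of the approach, not the code from the later "Convert read side locking to rcu" patch:

```c
/* Illustrative only: reader protected by RCU instead of the xa_lock.
 * xa_load() is safe under rcu_read_lock(), and kref_get_unless_zero()
 * refuses elements whose refcount has already dropped to zero
 * (assuming the writer frees elements via kfree_rcu() or only after
 * synchronize_rcu()). */
void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
{
	struct rxe_pool_elem *elem;
	void *obj = NULL;

	rcu_read_lock();
	elem = xa_load(&pool->xa, index);
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	rcu_read_unlock();

	return obj;
}
```

Because the reader holds no spinlock, it cannot be part of a lock cycle with the process-context writer, which is the point Bob makes about lockdep staying quiet.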
On Mon, Apr 25, 2022 at 07:47:23AM +0800, Yanjun Zhu wrote:
> On 2022/4/22 23:57, Pearson, Robert B wrote:
> > Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on the
> > same data can co-exist at the same time. That is the whole point. All this is
> > going away soon.
>
> This is based on your unproved assumption.

No, Bob is right, RCU avoids the need for a BH lock on this XA; the
only remaining issue is the AH creation atomic call path.

Jason
On 2022/4/26 3:02, Jason Gunthorpe wrote:
> On Mon, Apr 25, 2022 at 07:47:23AM +0800, Yanjun Zhu wrote:
>> On 2022/4/22 23:57, Pearson, Robert B wrote:
>>> Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on the
>>> same data can co-exist at the same time. That is the whole point. All this is
>>> going away soon.
>> This is based on your unproved assumption.
> No, Bob is right, RCU avoids the need for a BH lock on this XA; the
> only remaining issue is the AH creation atomic call path.

If RCU is used, more issues like the AH creation atomic call path will appear.

Zhu Yanjun

> Jason
On Tue, Apr 26, 2022 at 06:01:51AM +0800, Yanjun Zhu wrote:
> On 2022/4/26 3:02, Jason Gunthorpe wrote:
> > On Mon, Apr 25, 2022 at 07:47:23AM +0800, Yanjun Zhu wrote:
> > > On 2022/4/22 23:57, Pearson, Robert B wrote:
> > > > Use of rcu_read_lock solves this problem. rcu_read_lock and a spinlock on
> > > > the same data can co-exist at the same time. That is the whole point. All
> > > > this is going away soon.
> > > This is based on your unproved assumption.
> > No, Bob is right, RCU avoids the need for a BH lock on this XA; the
> > only remaining issue is the AH creation atomic call path.
>
> If RCU is used, more issues like the AH creation atomic call path will appear.

AH creation is unique, there will not be more cases like it.

Jason
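For readers unfamiliar with the "AH creation atomic call path": ib_create_ah() may be invoked from a context that cannot sleep unless the caller sets RDMA_CREATE_AH_SLEEPABLE, so the allocation flags have to follow the caller's context. The sketch below is hypothetical (it is not the rxe code, and the function name and limit are invented for illustration); it only shows the GFP selection idea:

```c
#include <linux/xarray.h>
#include <linux/gfp.h>

/* Hypothetical helper: allocate an index for @obj in @xa, where the caller
 * may or may not be allowed to sleep (as with AH creation). */
static int example_alloc_index(struct xarray *xa, u32 *index, void *obj,
			       u32 *next, bool sleepable)
{
	gfp_t gfp = sleepable ? GFP_KERNEL : GFP_ATOMIC;

	/* xa_alloc_cyclic() takes the xa_lock internally; only the gfp flags
	 * need to reflect whether the caller is allowed to sleep. */
	return xa_alloc_cyclic(xa, index, obj, XA_LIMIT(1, 1024), next, gfp);
}
```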
Hi Yanjun, Bob

Could you tell me if the dead lock issue has been fixed by the following patch:
[PATCH v16 2/2] RDMA/rxe: Convert read side locking to rcu

Best Regards,
Xiao Yang
On 2022/7/22 14:51, yangx.jy@fujitsu.com wrote:
> Hi Yanjun, Bob
>
> Could you tell me if the dead lock issue has been fixed by the following patch:
> [PATCH v16 2/2] RDMA/rxe: Convert read side locking to rcu

Hi, Xiao

Normally I apply this "RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool
interrupted by rxe_pool_get_index" patch series to fix this problem, and I am
not sure whether the problem is also fixed by "[PATCH v16 2/2] RDMA/rxe:
Convert read side locking to rcu".

Zhu Yanjun
On 2022/7/22 21:43, Yanjun Zhu wrote:
> Hi, Xiao
>
> Normally I apply this "RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool
> interrupted by rxe_pool_get_index" patch series to fix this problem, and I am
> not sure whether the problem is also fixed by "[PATCH v16 2/2] RDMA/rxe:
> Convert read side locking to rcu".
>
> Zhu Yanjun

Hi Yanjun,

I have confirmed that the problem has been fixed by:
[PATCH v16 2/2] RDMA/rxe: Convert read side locking to rcu

Best Regards,
Xiao Yang
On Fri, Jul 22, 2022 at 03:14:25PM +0000, yangx.jy@fujitsu.com wrote:
> On 2022/7/22 21:43, Yanjun Zhu wrote:
> > Hi, Xiao
> >
> > Normally I apply this "RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool
> > interrupted by rxe_pool_get_index" patch series to fix this problem, and I am
> > not sure whether the problem is also fixed by "[PATCH v16 2/2] RDMA/rxe:
> > Convert read side locking to rcu".
> >
> > Zhu Yanjun
>
> Hi Yanjun,
>
> I have confirmed that the problem has been fixed by:
> [PATCH v16 2/2] RDMA/rxe: Convert read side locking to rcu

Thanks!

Jason
On 2022/7/22 23:14, yangx.jy@fujitsu.com wrote:
> On 2022/7/22 21:43, Yanjun Zhu wrote:
>> Hi, Xiao
>>
>> Normally I apply this "RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool
>> interrupted by rxe_pool_get_index" patch series to fix this problem, and I am
>> not sure whether the problem is also fixed by "[PATCH v16 2/2] RDMA/rxe:
>> Convert read side locking to rcu".
>>
>> Zhu Yanjun
>
> Hi Yanjun,
>
> I have confirmed that the problem has been fixed by:
> [PATCH v16 2/2] RDMA/rxe: Convert read side locking to rcu

Thanks.

Zhu Yanjun

> Best Regards,
> Xiao Yang
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..67f1d4733682 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 
 	atomic_set(&pool->num_elem, 0);
 
-	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	pool->limit.min = info->min_index;
 	pool->limit.max = info->max_index;
 }
@@ -155,6 +155,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err;
+	unsigned long flags;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
@@ -166,8 +167,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_irqsave(&pool->xa, flags);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_ATOMIC);
+	xa_unlock_irqrestore(&pool->xa, flags);
 	if (err)
 		goto err_cnt;
 
@@ -201,7 +204,7 @@ static void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_erase_irq(&pool->xa, elem->index);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
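As a self-contained illustration of the locking rule the patch follows, here is a hypothetical toy module (not part of the rxe driver; all names are invented): when an XArray is also touched from softirq context, the process-context writer takes the xa_lock with interrupts disabled and must not pass a sleeping GFP while holding it.

```c
/* Hypothetical demo module: a process-context writer and a tasklet (softirq)
 * reader sharing one XArray. The writer uses the irq-saving lock and
 * GFP_ATOMIC, mirroring the pattern in the patch above. */
#include <linux/module.h>
#include <linux/xarray.h>
#include <linux/interrupt.h>

static DEFINE_XARRAY_FLAGS(demo_xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
static u32 demo_next;
static int demo_payload = 42;

/* Softirq-context reader: xa_load() alone is RCU-safe, so no lock is needed
 * just to look up the entry. */
static void demo_tasklet_fn(struct tasklet_struct *t)
{
	int *val = xa_load(&demo_xa, 1);

	if (val)
		pr_info("demo: index 1 -> %d\n", *val);
}
static DECLARE_TASKLET(demo_tasklet, demo_tasklet_fn);

static int __init demo_init(void)
{
	unsigned long flags;
	u32 index;
	int err;

	/* Process-context writer: disable interrupts around the xa_lock so a
	 * softirq on this CPU cannot try to take the same lock underneath us,
	 * and use GFP_ATOMIC because sleeping is not allowed here. */
	xa_lock_irqsave(&demo_xa, flags);
	err = __xa_alloc_cyclic(&demo_xa, &index, &demo_payload,
				XA_LIMIT(1, 16), &demo_next, GFP_ATOMIC);
	xa_unlock_irqrestore(&demo_xa, flags);
	if (err < 0)
		return err;

	tasklet_schedule(&demo_tasklet);
	return 0;
}

static void __exit demo_exit(void)
{
	tasklet_kill(&demo_tasklet);
	xa_destroy(&demo_xa);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("XArray irq-safe locking demo");
```

The alternative the thread converges on keeps the plain writer-side lock and removes the reader's lock entirely by using RCU, which is what the later "Convert read side locking to rcu" patch does.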