diff mbox series

[PATCHv3,1/1] RDMA/rxe: Fix a deadlock problem

Message ID 20220413074208.1401112-1-yanjun.zhu@linux.dev (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe
Headers show
Series [PATCHv3,1/1] RDMA/rxe: Fix a deadlock problem | expand

Commit Message

Zhu Yanjun April 13, 2022, 7:42 a.m. UTC
From: Zhu Yanjun <yanjun.zhu@linux.dev>

This is a deadlock problem.
xa_lock is first acquired in this path:

{SOFTIRQ-ON-W} state was registered at:

  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock+0x33/0x80
  __rxe_add_to_pool+0x183/0x230 [rdma_rxe]
  __ib_alloc_pd+0xf9/0x550 [ib_core]
  ib_mad_init_device+0x2d9/0xd20 [ib_core]
  add_client_context+0x2fa/0x450 [ib_core]
  enable_device_and_get+0x1b7/0x350 [ib_core]
  ib_register_device+0x757/0xaf0 [ib_core]
  rxe_register_device+0x2eb/0x390 [rdma_rxe]
  rxe_net_add+0x83/0xc0 [rdma_rxe]
  rxe_newlink+0x76/0x90 [rdma_rxe]
  nldev_newlink+0x245/0x3e0 [ib_core]
  rdma_nl_rcv_msg+0x2d4/0x790 [ib_core]
  rdma_nl_rcv+0x1ca/0x3f0 [ib_core]
  netlink_unicast+0x43b/0x640
  netlink_sendmsg+0x7eb/0xc40
  sock_sendmsg+0xe0/0x110
  __sys_sendto+0x1d7/0x2b0
  __x64_sys_sendto+0xdd/0x1b0
  do_syscall_64+0x37/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Then xa_lock is acquired in this path:

{IN-SOFTIRQ-W}:

Call Trace:
 <TASK>
  dump_stack_lvl+0x44/0x57
  mark_lock.part.52.cold.79+0x3c/0x46
  __lock_acquire+0x1565/0x34a0
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock_irqsave+0x42/0x90
  rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
  rxe_get_av+0x168/0x2a0 [rdma_rxe]
  rxe_requester+0x75b/0x4a90 [rdma_rxe]
  rxe_do_task+0x134/0x230 [rdma_rxe]
  tasklet_action_common.isra.12+0x1f7/0x2d0
  __do_softirq+0x1ea/0xa4c
  run_ksoftirqd+0x32/0x60
  smpboot_thread_fn+0x503/0x860
  kthread+0x29b/0x340
  ret_from_fork+0x1f/0x30
 </TASK>

As the traces above show, the function __rxe_add_to_pool
acquires xa_lock without disabling softirqs. While the lock is
held, a softirq can interrupt the task on the same CPU, and
rxe_pool_get_index then tries to acquire the same xa_lock.

This results in a deadlock.

[  296.806097]        CPU0
[  296.808550]        ----
[  296.811003]   lock(&xa->xa_lock#15);  <----- __rxe_add_to_pool
[  296.814583]   <Interrupt>
[  296.817209]     lock(&xa->xa_lock#15); <---- rxe_pool_get_index
[  296.820961]
                 *** DEADLOCK ***
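
To make the cycle concrete, here is a minimal sketch of the two
contexts involved (illustrative only, not the driver's exact code):

  /* process context: __rxe_add_to_pool() takes the xarray lock with
   * plain spin_lock(), so softirqs stay enabled */
  spin_lock(&xa->xa_lock);
      ...
      /* a softirq fires on the same CPU and runs the rxe tasklet */
          /* softirq context: rxe_pool_get_index() */
          spin_lock_irqsave(&xa->xa_lock, flags);
          /* spins forever: the lock holder cannot run again until the
           * softirq returns, and the softirq cannot return until the
           * lock is released */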

Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
V2->V3: __rxe_add_to_pool can be called between spin_lock and
        spin_unlock, so GFP_ATOMIC is used in __rxe_add_to_pool.
V1->V2: Replace GFP_KERNEL with GFP_ATOMIC
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Jason Gunthorpe April 13, 2022, 12:45 a.m. UTC | #1
On Wed, Apr 13, 2022 at 03:42:08AM -0400, yanjun.zhu@linux.dev wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> [...]
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index 87066d04ed18..b9b147df4020 100644
> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -138,8 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
>  	elem->obj = obj;
>  	kref_init(&elem->ref_cnt);
>  
> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
> -			      &pool->next, GFP_KERNEL);
> +	xa_lock_bh(&pool->xa);
> +	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
> +				&pool->next, GFP_KERNEL);
> +	xa_unlock_bh(&pool->xa);
>  	if (err)
>  		goto err_free;

You can't mix bh and not bh locks, either this is an irq spinlock or
it is bh spinlock, pick one and also ensure that the proper xa_init
flag is set.
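
For reference, the two self-consistent shapes look roughly like this;
the xa_init flag has to match the lock flavor used at every call site
(a sketch, not code from the patch):

  /* bh flavor everywhere */
  xa_init_flags(&pool->xa, XA_FLAGS_LOCK_BH | XA_FLAGS_ALLOC);
  ...
  xa_lock_bh(&pool->xa);
  err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
                          &pool->next, gfp);
  xa_unlock_bh(&pool->xa);

  /* or irq flavor everywhere */
  xa_init_flags(&pool->xa, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC);
  ...
  xa_lock_irq(&pool->xa);
  err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
                          &pool->next, gfp);
  xa_unlock_irq(&pool->xa);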

> @@ -166,8 +168,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>  	elem->obj = (u8 *)elem - pool->elem_offset;
>  	kref_init(&elem->ref_cnt);
>  
> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
> -			      &pool->next, GFP_KERNEL);
> +	xa_lock_irq(&pool->xa);
> +	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
> +				&pool->next, GFP_ATOMIC);
> +	xa_unlock_irq(&pool->xa);
>  	if (err)
>  		goto err_cnt;

Still no, this does almost every allocation - only AH with the
non-blocking flag set should use this path.

Jason
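
What that suggestion amounts to is making the gfp decision where the
blocking context is actually known and plumbing it down. A hypothetical
sketch: RDMA_CREATE_AH_SLEEPABLE is the existing core flag carried in
rdma_ah_init_attr, while the gfp-taking pool helper is assumed here and
is not part of this patch:

  static int rxe_create_ah(struct ib_ah *ibah,
                           struct rdma_ah_init_attr *init_attr,
                           struct ib_udata *udata)
  {
          struct rxe_dev *rxe = to_rdev(ibah->device);
          struct rxe_ah *ah = to_rah(ibah);
          /* only AH creation without the sleepable flag may run in
           * atomic context; everything else can sleep */
          gfp_t gfp = (init_attr->flags & RDMA_CREATE_AH_SLEEPABLE) ?
                          GFP_KERNEL : GFP_ATOMIC;
          int err;

          err = rxe_add_to_pool_gfp(&rxe->ah_pool, ah, gfp);
          if (err)
                  return err;
          /* ... remaining AH setup ... */
          return 0;
  }
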
Zhu Yanjun April 13, 2022, 2:50 p.m. UTC | #2
On 2022/4/13 8:45, Jason Gunthorpe wrote:
> On Wed, Apr 13, 2022 at 03:42:08AM -0400, yanjun.zhu@linux.dev wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [...]
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index 87066d04ed18..b9b147df4020 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -138,8 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
>>   	elem->obj = obj;
>>   	kref_init(&elem->ref_cnt);
>>   
>> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>> -			      &pool->next, GFP_KERNEL);
>> +	xa_lock_bh(&pool->xa);
>> +	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>> +				&pool->next, GFP_KERNEL);
>> +	xa_unlock_bh(&pool->xa);
>>   	if (err)
>>   		goto err_free;
> You can't mix bh and not bh locks, either this is an irq spinlock or
> it is bh spinlock, pick one and also ensure that the proper xa_init
> flag is set.

Got it. I should use an irq spinlock, so XA_FLAGS_LOCK_IRQ is added to
the xa_init flags.

So the code should be:

@@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,

         atomic_set(&pool->num_elem, 0);

-       xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+       xa_init_flags(&pool->xa, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC);
         pool->limit.min = info->min_index;
         pool->limit.max = info->max_index;
  }
@@ -138,10 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
         elem->obj = obj;
         kref_init(&elem->ref_cnt);

-       xa_lock_bh(&pool->xa);
+       xa_lock_irq(&pool->xa);
         err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
                                 &pool->next, GFP_KERNEL);
-       xa_unlock_bh(&pool->xa);
+       xa_unlock_irq(&pool->xa);
         if (err)
                 goto err_free;

>
>> @@ -166,8 +168,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>>   	kref_init(&elem->ref_cnt);
>>   
>> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>> -			      &pool->next, GFP_KERNEL);
>> +	xa_lock_irq(&pool->xa);
>> +	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>> +				&pool->next, GFP_ATOMIC);
>> +	xa_unlock_irq(&pool->xa);
>>   	if (err)
>>   		goto err_cnt;
> Still no, this does almost every allocation - only AH with the
> non-blocking flag set should use this path.

Yes. Got it.

In the following path, xa_lock_irqsave/xa_unlock_irqrestore should be used.

int ib_send_cm_req(struct ib_cm_id *cm_id,
                   struct ib_cm_req_param *param)
{
        ...
        spin_lock_irqsave(&cm_id_priv->lock, flags);
        ...
        __rxe_add_to_pool
        ...
        spin_unlock_irqrestore(&cm_id_priv->lock, flags);
        ...
}


So the diff is as below:

@@ -157,6 +157,7 @@ void *rxe_alloc(struct rxe_pool *pool)
  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
  {
         int err;
+       unsigned long flags;

         if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
                 return -EINVAL;
@@ -168,10 +169,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
         elem->obj = (u8 *)elem - pool->elem_offset;
         kref_init(&elem->ref_cnt);

-       xa_lock_irq(&pool->xa);
+       xa_lock_irqsave(&pool->xa, flags);
         err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
                                 &pool->next, GFP_ATOMIC);
-       xa_unlock_irq(&pool->xa);
+       xa_unlock_irqrestore(&pool->xa, flags);
         if (err)
                 goto err_cnt;

Please comment. Thanks a lot.

Zhu Yanjun

>
> Jason
Zhu Yanjun April 14, 2022, 1:01 p.m. UTC | #3
On 2022/4/13 22:50, Yanjun Zhu wrote:
> 
> On 2022/4/13 8:45, Jason Gunthorpe wrote:
>> On Wed, Apr 13, 2022 at 03:42:08AM -0400, yanjun.zhu@linux.dev wrote:
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>
>>> [...]
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> index 87066d04ed18..b9b147df4020 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> @@ -138,8 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
>>>       elem->obj = obj;
>>>       kref_init(&elem->ref_cnt);
>>> -    err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> -                  &pool->next, GFP_KERNEL);
>>> +    xa_lock_bh(&pool->xa);
>>> +    err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> +                &pool->next, GFP_KERNEL);
>>> +    xa_unlock_bh(&pool->xa);
>>>       if (err)
>>>           goto err_free;
>> You can't mix bh and not bh locks, either this is an irq spinlock or
>> it is bh spinlock, pick one and also ensure that the proper xa_init
>> flag is set.
> 
> Got it. I should use an irq spinlock, so XA_FLAGS_LOCK_IRQ is added to
> the xa_init flags.
> 
> So the code should be:
> 
> @@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
> 
>          atomic_set(&pool->num_elem, 0);
> 
> -       xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
> +       xa_init_flags(&pool->xa, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC);
>          pool->limit.min = info->min_index;
>          pool->limit.max = info->max_index;
>   }
> @@ -138,10 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
>          elem->obj = obj;
>          kref_init(&elem->ref_cnt);
> 
> -       xa_lock_bh(&pool->xa);
> +       xa_lock_irq(&pool->xa);
>          err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>                                  &pool->next, GFP_KERNEL);
> -       xa_unlock_bh(&pool->xa);
> +       xa_unlock_irq(&pool->xa);
>          if (err)
>                  goto err_free;
> 
>>
>>> @@ -166,8 +168,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>>>       elem->obj = (u8 *)elem - pool->elem_offset;
>>>       kref_init(&elem->ref_cnt);
>>> -    err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> -                  &pool->next, GFP_KERNEL);
>>> +    xa_lock_irq(&pool->xa);
>>> +    err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> +                &pool->next, GFP_ATOMIC);
>>> +    xa_unlock_irq(&pool->xa);
>>>       if (err)
>>>           goto err_cnt;
>> Still no, this does almost every allocation - only AH with the
>> non-blocking flag set should use this path.
> 
> Yes. Got it.
> 
> In the following path, xa_lock_irqsave/xa_unlock_irqrestore should be used.
> 
> int ib_send_cm_req(struct ib_cm_id *cm_id,
>                    struct ib_cm_req_param *param)
> {
>         ...
>         spin_lock_irqsave(&cm_id_priv->lock, flags);
>         ...
>         __rxe_add_to_pool
>         ...
>         spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>         ...
> }

Hi, Jason

To the function ib_send_cm_req, the call chain is as below.

ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah
  --> _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr
  --> __rxe_add_to_pool

As such, xa_lock_irqsave/irqrestore is selected.

Zhu Yanjun
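
The distinction matters because xa_unlock_irq() re-enables interrupts
unconditionally, which would be wrong inside a caller that itself holds
an irq-disabled spinlock. An illustrative sketch (not code from the
patch):

  spin_lock_irqsave(&cm_id_priv->lock, flags);  /* caller: IRQs now off */
      ...
      xa_lock_irq(&pool->xa);
      ...
      xa_unlock_irq(&pool->xa);   /* would re-enable IRQs right here,
                                   * while cm_id_priv->lock is still
                                   * held */
      ...
  spin_unlock_irqrestore(&cm_id_priv->lock, flags);

  /* xa_lock_irqsave()/xa_unlock_irqrestore() saves and restores the
   * caller's IRQ state instead of blindly toggling it */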

> 
> 
> So the diff is as below:
> 
> @@ -157,6 +157,7 @@ void *rxe_alloc(struct rxe_pool *pool)
>   int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>   {
>          int err;
> +       unsigned long flags;
> 
>          if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
>                  return -EINVAL;
> @@ -168,10 +169,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>          elem->obj = (u8 *)elem - pool->elem_offset;
>          kref_init(&elem->ref_cnt);
> 
> -       xa_lock_irq(&pool->xa);
> +       xa_lock_irqsave(&pool->xa, flags);
>          err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>                                  &pool->next, GFP_ATOMIC);
> -       xa_unlock_irq(&pool->xa);
> +       xa_unlock_irqrestore(&pool->xa, flags);
>          if (err)
>                  goto err_cnt;
> 
> Please comment. Thanks a lot.
> 
> Zhu Yanjun
> 
>>
>> Jason
Jason Gunthorpe April 14, 2022, 1:52 p.m. UTC | #4
On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:

> > > Still no, this does almost every allocation - only AH with the
> > > non-blocking flag set should use this path.

> To the function ib_send_cm_req, the call chain is as below.
> 
> ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
> _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
> 
> As such, xa_lock_irqsave/irqrestore is selected.

As I keep saying, only the rxe_create_ah() with the non-blocking flag
set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.

Jason
Zhu Yanjun April 14, 2022, 3:13 p.m. UTC | #5
On 2022/4/14 21:52, Jason Gunthorpe wrote:
> On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:
> 
>>>> Still no, this does almost every allocation - only AH with the
>>>> non-blocking flag set should use this path.
> 
>> To the function ib_send_cm_req, the call chain is as below.
>>
>> ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
>> _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
>>
>> As such, xa_lock_irqsave/irqrestore is selected.
> 
> As I keep saying, only the rxe_create_ah() with the non-blocking flag
> set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.
> 

Got it. The GFP_ATOMIC/GFP_KERNEL are used in different paths.
rxe_create_ah will use GFP_ATOMIC and others will use GFP_KERNEL.
So the code should be as below:

-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
+int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem, gfp_t gfp)
  {
         int err;
+       unsigned long flags;

         if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
                 return -EINVAL;
@@ -168,10 +170,17 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
         elem->obj = (u8 *)elem - pool->elem_offset;
         kref_init(&elem->ref_cnt);

-       xa_lock_irq(&pool->xa);
-       err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-                               &pool->next, GFP_ATOMIC);
-       xa_unlock_irq(&pool->xa);
+       if (gfp == GFP_ATOMIC) { /* for rxe_create_ah */
+               xa_lock_irqsave(&pool->xa, flags);
+               err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+                                       &pool->next, GFP_ATOMIC);
+               xa_unlock_irqrestore(&pool->xa, flags);
+       } else if (gfp == GFP_KERNEL) {
+               xa_lock_irq(&pool->xa);
+               err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+                                       &pool->next, GFP_KERNEL);
+               xa_unlock_irq(&pool->xa);
+       }
         if (err)
                 goto err_cnt;

Please comment.

Zhu Yanjun

> Jason
Leon Romanovsky April 14, 2022, 4:12 p.m. UTC | #6
On Thu, Apr 14, 2022 at 11:13:57PM +0800, Yanjun Zhu wrote:
> On 2022/4/14 21:52, Jason Gunthorpe wrote:
> > On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:
> > 
> > > > > Still no, this does almost every allocation - only AH with the
> > > > > non-blocking flag set should use this path.
> > 
> > > To the function ib_send_cm_req, the call chain is as below.
> > > 
> > > ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
> > > _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
> > > 
> > > As such, xa_lock_irqsave/irqrestore is selected.
> > 
> > As I keep saying, only the rxe_create_ah() with the non-blocking flag
> > set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.
> > 
> 
> Got it. The GFP_ATOMIC/GFP_KERNEL are used in different paths.
> rxe_create_ah will use GFP_ATOMIC and others will use GFP_KERNEL.
> So the code should be as below:
> 
> -int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
> +int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem, gfp_t gfp)
>  {
>         int err;
> +       unsigned long flags;
> 
>         if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
>                 return -EINVAL;
> @@ -168,10 +170,17 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>         elem->obj = (u8 *)elem - pool->elem_offset;
>         kref_init(&elem->ref_cnt);
> 
> -       xa_lock_irq(&pool->xa);
> -       err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
> -                               &pool->next, GFP_ATOMIC);
> -       xa_unlock_irq(&pool->xa);
> +       if (gfp == GFP_ATOMIC) { /* for rxe_create_ah */

gfp is a bitfield.
"gfp == GFP_ATOMIC" -> "gfp & GFP_ATOMIC"

Thanks
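
Worth noting on top of that: GFP_KERNEL and GFP_ATOMIC actually share
the __GFP_KSWAPD_RECLAIM bit, so even a mask test can match both. The
usual way to ask "may this allocation sleep?" is
gfpflags_allow_blocking() from <linux/gfp.h>. A sketch of the branch
written that way (an alternative shape, not what was posted):

  if (gfpflags_allow_blocking(gfp)) {
          /* sleepable context, e.g. GFP_KERNEL */
          xa_lock_irq(&pool->xa);
          err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem,
                                  pool->limit, &pool->next, gfp);
          xa_unlock_irq(&pool->xa);
  } else {
          /* atomic context, e.g. GFP_ATOMIC */
          xa_lock_irqsave(&pool->xa, flags);
          err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem,
                                  pool->limit, &pool->next, gfp);
          xa_unlock_irqrestore(&pool->xa, flags);
  }
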
Jason Gunthorpe April 14, 2022, 4:18 p.m. UTC | #7
On Thu, Apr 14, 2022 at 11:13:57PM +0800, Yanjun Zhu wrote:
> On 2022/4/14 21:52, Jason Gunthorpe wrote:
> > On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:
> > 
> > > > > Still no, this does almost every allocation - only AH with the
> > > > > non-blocking flag set should use this path.
> > 
> > > To the function ib_send_cm_req, the call chain is as below.
> > > 
> > > ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
> > > _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
> > > 
> > > As such, xa_lock_irqsave/irqrestore is selected.
> > 
> > As I keep saying, only the rxe_create_ah() with the non-blocking flag
> > set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.
> > 
> 
> Got it. The GFP_ATOMIC/GFP_KERNEL are used in different paths.
> rxe_create_ah will use GFP_ATOMIC and others will use GFP_KERNEL.
> So the code should be as below:

This seems better

Jason
Zhu Yanjun April 15, 2022, 2:35 a.m. UTC | #8
On 2022/4/15 0:12, Leon Romanovsky wrote:
> On Thu, Apr 14, 2022 at 11:13:57PM +0800, Yanjun Zhu wrote:
>> On 2022/4/14 21:52, Jason Gunthorpe wrote:
>>> On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:
>>>
>>>>>> Still no, this does almost every allocation - only AH with the
>>>>>> non-blocking flag set should use this path.
>>>
>>>> To the function ib_send_cm_req, the call chain is as below.
>>>>
>>>> ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
>>>> _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
>>>>
>>>> As such, xa_lock_irqsave/irqrestore is selected.
>>>
>>> As I keep saying, only the rxe_create_ah() with the non-blocking flag
>>> set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.
>>>
>>
>> Got it. The GFP_ATOMIC/GFP_KERNEL are used in different paths.
>> rxe_create_ah will use GFP_ATOMIC and others will use GFP_KERNEL.
>> So the code should be as below:
>>
>> -int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>> +int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem, gfp_t gfp)
>>   {
>>          int err;
>> +       unsigned long flags;
>>
>>          if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
>>                  return -EINVAL;
>> @@ -168,10 +170,17 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
>>          elem->obj = (u8 *)elem - pool->elem_offset;
>>          kref_init(&elem->ref_cnt);
>>
>> -       xa_lock_irq(&pool->xa);
>> -       err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>> -                               &pool->next, GFP_ATOMIC);
>> -       xa_unlock_irq(&pool->xa);
>> +       if (gfp == GFP_ATOMIC) { /* for rxe_create_ah */
> 
> gfp is a bitfield.
> "gfp == GFP_ATOMIC" -> "gfp & GFP_ATOMIC"

Got it. Thanks

Zhu Yanjun

> 
> Thanks
Zhu Yanjun April 15, 2022, 2:36 a.m. UTC | #9
On 2022/4/15 0:18, Jason Gunthorpe wrote:
> On Thu, Apr 14, 2022 at 11:13:57PM +0800, Yanjun Zhu wrote:
>> On 2022/4/14 21:52, Jason Gunthorpe wrote:
>>> On Thu, Apr 14, 2022 at 09:01:29PM +0800, Yanjun Zhu wrote:
>>>
>>>>>> Still no, this does almost every allocation - only AH with the
>>>>>> non-blocking flag set should use this path.
>>>
>>>> To the function ib_send_cm_req, the call chain is as below.
>>>>
>>>> ib_send_cm_req --> cm_alloc_priv_msg --> cm_alloc_msg --> rdma_create_ah -->
>>>> _rdma_create_ah --> rxe_create_ah --> rxe_av_chk_attr -->__rxe_add_to_pool
>>>>
>>>> As such, xa_lock_irqsave/irqrestore is selected.
>>>
>>> As I keep saying, only the rxe_create_ah() with the non-blocking flag
>>> set should use the GFP_ATOMIC. All other paths must use GFP_KERNEL.
>>>
>>
>> Got it. The GFP_ATOMIC/GFP_KERNEL are used in different paths.
>> rxe_create_ah will use GFP_ATOMIC and others will use GFP_KERNEL.
>> So the code should be as below:
> 
> This seems better
Thanks. I will send the latest patch very soon.

Zhu Yanjun
> 
> Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..b9b147df4020 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -138,8 +138,10 @@  void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_bh(&pool->xa);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_KERNEL);
+	xa_unlock_bh(&pool->xa);
 	if (err)
 		goto err_free;
 
@@ -166,8 +168,10 @@  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_irq(&pool->xa);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_ATOMIC);
+	xa_unlock_irq(&pool->xa);
 	if (err)
 		goto err_cnt;
 
@@ -200,8 +204,11 @@  static void rxe_elem_release(struct kref *kref)
 {
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
+	unsigned long flags;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_lock_irqsave(&pool->xa, flags);
+	__xa_erase(&pool->xa, elem->index);
+	xa_unlock_irqrestore(&pool->xa, flags);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);