diff mbox series

[3/3] RDMA/rxe: Replace spin_lock_bh with spin_lock_irqsave in post_one_send

Message ID 20220210073655.42281-4-guoqing.jiang@linux.dev (mailing list archive)
State Changes Requested
Headers show
Series patches and bug report for rxe | expand

Commit Message

Guoqing Jiang Feb. 10, 2022, 7:36 a.m. UTC
Same as __rxe_add_index, the lock need to be fully IRQ safe, otherwise
below calltrace appears.

[  763.942623] ------------[ cut here ]------------
[  763.943171] WARNING: CPU: 5 PID: 97 at kernel/softirq.c:363 __local_bh_enable_ip+0xb1/0x110
[ ... ]
[  763.947276] CPU: 5 PID: 97 Comm: kworker/5:2 Kdump: loaded Tainted: G           OEL    5.17.0-rc3-57-default #17
[  763.947575] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[  763.947893] Workqueue: ib_cm cm_work_handler [ib_cm]
[  763.948075] RIP: 0010:__local_bh_enable_ip+0xb1/0x110
[  763.948232] Code: e8 54 ae 04 00 e8 7f 4e 20 00 fb 66 0f 1f 44 00 00 65 8b 05 b1 03 f3 56 85 c0 74 51 5b 5d c3 65 8b 05 3f 0f f3 56
85 c0 75 8e <0f> 0b eb 8a e8 76 4c 20 00 eb 99 48 89 ef e8 9c 8d 0b 00 eb a2 48
[  763.948736] RSP: 0018:ffff888004a970e8 EFLAGS: 00010046
[  763.948897] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
[  763.949095] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffffab95dbac
[  763.949292] RBP: ffffffffc127c269 R08: ffffffffa91059da R09: ffff88800afde323
[  763.949556] R10: ffffed10015fbc64 R11: 0000000000000001 R12: ffffc900005a2000
[  763.949781] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800afde000
[  763.949982] FS:  0000000000000000(0000) GS:ffff888104c80000(0000) knlGS:0000000000000000
[  763.950205] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  763.950367] CR2: 00007f85ec3f5b18 CR3: 0000000116216005 CR4: 0000000000770ee0
[  763.956480] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  763.962608] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  763.968785] PKRU: 55555554
[  763.974707] Call Trace:
[  763.980557]  <TASK>
[  763.986377]  rxe_post_send+0x569/0x8e0 [rdma_rxe]
[  763.992340]  ib_send_mad+0x4c1/0x850 [ib_core]
[  763.998442]  ? ib_register_mad_agent+0x1710/0x1710 [ib_core]
[  764.004486]  ? __kmalloc+0x21d/0x3a0
[  764.010465]  ib_post_send_mad+0x28c/0x10b0 [ib_core]
[  764.016480]  ? lock_is_held_type+0xe4/0x140
[  764.022359]  ? find_held_lock+0x85/0xa0
[  764.028230]  ? lock_release+0x24e/0x450
[  764.034061]  ? timeout_sends+0x420/0x420 [ib_core]
[  764.039879]  ? ib_create_send_mad+0x541/0x670 [ib_core]
[  764.045604]  ? do_raw_spin_unlock+0x86/0xf0
[  764.051178]  ? preempt_count_sub+0x14/0xc0
[  764.056851]  ? lock_is_held_type+0xe4/0x140
[  764.062412]  ib_send_cm_rep+0x47a/0x860 [ib_cm]
[  764.067965]  rdma_accept+0x44c/0x5e0 [rdma_cm]
[  764.073381]  ? cma_rep_recv+0x330/0x330 [rdma_cm]
[  764.078762]  ? rcu_read_lock_sched_held+0x3f/0x60
[  764.084072]  ? trace_kmalloc+0x29/0xd0
[  764.089185]  ? __kmalloc+0x1c5/0x3a0
[  764.094185]  ? rtrs_iu_alloc+0x12b/0x260 [rtrs_core]
[  764.099075]  rtrs_srv_rdma_cm_handler+0x7ba/0xcf0 [rtrs_server]
[  764.103917]  ? rtrs_srv_inv_rkey_done+0x100/0x100 [rtrs_server]
[  764.108563]  ? find_held_lock+0x85/0xa0
[  764.113033]  ? lock_release+0x24e/0x450
[  764.117452]  ? rdma_restrack_add+0x9c/0x220 [ib_core]
[  764.121797]  ? rcu_read_lock_sched_held+0x3f/0x60
[  764.125961]  cma_cm_event_handler+0x77/0x2c0 [rdma_cm]
[  764.130061]  cma_ib_req_handler+0xbd5/0x23f0 [rdma_cm]
[  764.134027]  ? cma_cancel_operation+0x1f0/0x1f0 [rdma_cm]
[  764.137950]  ? lockdep_lock+0xb4/0x170
[  764.141667]  ? _find_first_zero_bit+0x28/0x50
[  764.145486]  ? mark_held_locks+0x65/0x90
[  764.149002]  cm_process_work+0x2f/0x210 [ib_cm]
[  764.152413]  ? _raw_spin_unlock_irq+0x35/0x50
[  764.155763]  ? cm_queue_work_unlock+0x40/0x110 [ib_cm]
[  764.159080]  cm_req_handler+0xf7f/0x2030 [ib_cm]
[  764.162522]  ? cm_lap_handler+0xba0/0xba0 [ib_cm]
[  764.165847]  ? lockdep_hardirqs_on_prepare+0x220/0x220
[  764.169155]  cm_work_handler+0x6ce/0x37c0 [ib_cm]
[  764.172497]  ? lock_acquire+0x182/0x410
[  764.175771]  ? lock_release+0x450/0x450
[  764.178925]  ? lock_downgrade+0x3c0/0x3c0
[  764.182148]  ? ib_cm_init_qp_attr+0xa90/0xa90 [ib_cm]
[  764.185511]  ? mark_held_locks+0x24/0x90
[  764.188692]  ? lock_is_held_type+0xe4/0x140
[  764.191876]  process_one_work+0x5a8/0xa80
[  764.195034]  ? lock_release+0x450/0x450
[  764.198208]  ? pwq_dec_nr_in_flight+0x100/0x100
[  764.201433]  ? rwlock_bug.part.0+0x60/0x60
[  764.204660]  ? _raw_spin_lock_irq+0x54/0x60
[  764.207835]  worker_thread+0x2b5/0x760
[  764.210920]  ? process_one_work+0xa80/0xa80
[  764.214014]  kthread+0x169/0x1a0
[  764.217033]  ? kthread_complete_and_exit+0x20/0x20
[  764.220205]  ret_from_fork+0x1f/0x30
[  764.223467]  </TASK>
[  764.226482] irq event stamp: 55805
[  764.229527] hardirqs last  enabled at (55803): [<ffffffffaa179c6d>] _raw_spin_unlock_irqrestore+0x2d/0x60
[  764.232779] hardirqs last disabled at (55804): [<ffffffffaa179a10>] _raw_spin_lock_irqsave+0x60/0x70
[  764.236052] softirqs last  enabled at (55794): [<ffffffffc127cb68>] rxe_post_recv+0xb8/0x120 [rdma_rxe]
[  764.239428] softirqs last disabled at (55805): [<ffffffffc127beeb>] rxe_post_send+0x1eb/0x8e0 [rdma_rxe]
[  764.242740] ---[ end trace 0000000000000000 ]---

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Zhu Yanjun Feb. 10, 2022, 2:18 p.m. UTC | #1
On Thu, Feb 10, 2022 at 3:37 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> Same as __rxe_add_index, the lock need to be fully IRQ safe, otherwise
> below calltrace appears.
>
> [  763.942623] ------------[ cut here ]------------
> [  763.943171] WARNING: CPU: 5 PID: 97 at kernel/softirq.c:363 __local_bh_enable_ip+0xb1/0x110
> [ ... ]
> [  763.947276] CPU: 5 PID: 97 Comm: kworker/5:2 Kdump: loaded Tainted: G           OEL    5.17.0-rc3-57-default #17
> [  763.947575] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [  763.947893] Workqueue: ib_cm cm_work_handler [ib_cm]
> [  763.948075] RIP: 0010:__local_bh_enable_ip+0xb1/0x110
> [  763.948232] Code: e8 54 ae 04 00 e8 7f 4e 20 00 fb 66 0f 1f 44 00 00 65 8b 05 b1 03 f3 56 85 c0 74 51 5b 5d c3 65 8b 05 3f 0f f3 56
> 85 c0 75 8e <0f> 0b eb 8a e8 76 4c 20 00 eb 99 48 89 ef e8 9c 8d 0b 00 eb a2 48
> [  763.948736] RSP: 0018:ffff888004a970e8 EFLAGS: 00010046
> [  763.948897] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
> [  763.949095] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffffab95dbac
> [  763.949292] RBP: ffffffffc127c269 R08: ffffffffa91059da R09: ffff88800afde323
> [  763.949556] R10: ffffed10015fbc64 R11: 0000000000000001 R12: ffffc900005a2000
> [  763.949781] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800afde000
> [  763.949982] FS:  0000000000000000(0000) GS:ffff888104c80000(0000) knlGS:0000000000000000
> [  763.950205] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  763.950367] CR2: 00007f85ec3f5b18 CR3: 0000000116216005 CR4: 0000000000770ee0
> [  763.956480] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  763.962608] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  763.968785] PKRU: 55555554
> [  763.974707] Call Trace:
> [  763.980557]  <TASK>
> [  763.986377]  rxe_post_send+0x569/0x8e0 [rdma_rxe]
> [  763.992340]  ib_send_mad+0x4c1/0x850 [ib_core]
> [  763.998442]  ? ib_register_mad_agent+0x1710/0x1710 [ib_core]
> [  764.004486]  ? __kmalloc+0x21d/0x3a0
> [  764.010465]  ib_post_send_mad+0x28c/0x10b0 [ib_core]
> [  764.016480]  ? lock_is_held_type+0xe4/0x140
> [  764.022359]  ? find_held_lock+0x85/0xa0
> [  764.028230]  ? lock_release+0x24e/0x450
> [  764.034061]  ? timeout_sends+0x420/0x420 [ib_core]
> [  764.039879]  ? ib_create_send_mad+0x541/0x670 [ib_core]
> [  764.045604]  ? do_raw_spin_unlock+0x86/0xf0
> [  764.051178]  ? preempt_count_sub+0x14/0xc0
> [  764.056851]  ? lock_is_held_type+0xe4/0x140
> [  764.062412]  ib_send_cm_rep+0x47a/0x860 [ib_cm]
> [  764.067965]  rdma_accept+0x44c/0x5e0 [rdma_cm]
> [  764.073381]  ? cma_rep_recv+0x330/0x330 [rdma_cm]
> [  764.078762]  ? rcu_read_lock_sched_held+0x3f/0x60
> [  764.084072]  ? trace_kmalloc+0x29/0xd0
> [  764.089185]  ? __kmalloc+0x1c5/0x3a0
> [  764.094185]  ? rtrs_iu_alloc+0x12b/0x260 [rtrs_core]
> [  764.099075]  rtrs_srv_rdma_cm_handler+0x7ba/0xcf0 [rtrs_server]
> [  764.103917]  ? rtrs_srv_inv_rkey_done+0x100/0x100 [rtrs_server]
> [  764.108563]  ? find_held_lock+0x85/0xa0
> [  764.113033]  ? lock_release+0x24e/0x450
> [  764.117452]  ? rdma_restrack_add+0x9c/0x220 [ib_core]
> [  764.121797]  ? rcu_read_lock_sched_held+0x3f/0x60
> [  764.125961]  cma_cm_event_handler+0x77/0x2c0 [rdma_cm]
> [  764.130061]  cma_ib_req_handler+0xbd5/0x23f0 [rdma_cm]
> [  764.134027]  ? cma_cancel_operation+0x1f0/0x1f0 [rdma_cm]
> [  764.137950]  ? lockdep_lock+0xb4/0x170
> [  764.141667]  ? _find_first_zero_bit+0x28/0x50
> [  764.145486]  ? mark_held_locks+0x65/0x90
> [  764.149002]  cm_process_work+0x2f/0x210 [ib_cm]
> [  764.152413]  ? _raw_spin_unlock_irq+0x35/0x50
> [  764.155763]  ? cm_queue_work_unlock+0x40/0x110 [ib_cm]
> [  764.159080]  cm_req_handler+0xf7f/0x2030 [ib_cm]
> [  764.162522]  ? cm_lap_handler+0xba0/0xba0 [ib_cm]
> [  764.165847]  ? lockdep_hardirqs_on_prepare+0x220/0x220
> [  764.169155]  cm_work_handler+0x6ce/0x37c0 [ib_cm]
> [  764.172497]  ? lock_acquire+0x182/0x410
> [  764.175771]  ? lock_release+0x450/0x450
> [  764.178925]  ? lock_downgrade+0x3c0/0x3c0
> [  764.182148]  ? ib_cm_init_qp_attr+0xa90/0xa90 [ib_cm]
> [  764.185511]  ? mark_held_locks+0x24/0x90
> [  764.188692]  ? lock_is_held_type+0xe4/0x140
> [  764.191876]  process_one_work+0x5a8/0xa80
> [  764.195034]  ? lock_release+0x450/0x450
> [  764.198208]  ? pwq_dec_nr_in_flight+0x100/0x100
> [  764.201433]  ? rwlock_bug.part.0+0x60/0x60
> [  764.204660]  ? _raw_spin_lock_irq+0x54/0x60
> [  764.207835]  worker_thread+0x2b5/0x760
> [  764.210920]  ? process_one_work+0xa80/0xa80
> [  764.214014]  kthread+0x169/0x1a0
> [  764.217033]  ? kthread_complete_and_exit+0x20/0x20
> [  764.220205]  ret_from_fork+0x1f/0x30
> [  764.223467]  </TASK>
> [  764.226482] irq event stamp: 55805
> [  764.229527] hardirqs last  enabled at (55803): [<ffffffffaa179c6d>] _raw_spin_unlock_irqrestore+0x2d/0x60
> [  764.232779] hardirqs last disabled at (55804): [<ffffffffaa179a10>] _raw_spin_lock_irqsave+0x60/0x70
> [  764.236052] softirqs last  enabled at (55794): [<ffffffffc127cb68>] rxe_post_recv+0xb8/0x120 [rdma_rxe]
> [  764.239428] softirqs last disabled at (55805): [<ffffffffc127beeb>] rxe_post_send+0x1eb/0x8e0 [rdma_rxe]
> [  764.242740] ---[ end trace 0000000000000000 ]---
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>

Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Zhu Yanjun

> ---
>  drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 9f0aef4b649d..0056418425a1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -644,12 +644,13 @@ static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
>         struct rxe_sq *sq = &qp->sq;
>         struct rxe_send_wqe *send_wqe;
>         int full;
> +       unsigned long flags;
>
>         err = validate_send_wr(qp, ibwr, mask, length);
>         if (err)
>                 return err;
>
> -       spin_lock_bh(&qp->sq.sq_lock);
> +       spin_lock_irqsave(&qp->sq.sq_lock, flags);
>
>         full = queue_full(sq->queue, QUEUE_TYPE_TO_DRIVER);
>
> @@ -663,7 +664,7 @@ static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
>
>         queue_advance_producer(sq->queue, QUEUE_TYPE_TO_DRIVER);
>
> -       spin_unlock_bh(&qp->sq.sq_lock);
> +       spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
>
>         return 0;
>  }
> --
> 2.26.2
>
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 9f0aef4b649d..0056418425a1 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -644,12 +644,13 @@  static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 	struct rxe_sq *sq = &qp->sq;
 	struct rxe_send_wqe *send_wqe;
 	int full;
+	unsigned long flags;
 
 	err = validate_send_wr(qp, ibwr, mask, length);
 	if (err)
 		return err;
 
-	spin_lock_bh(&qp->sq.sq_lock);
+	spin_lock_irqsave(&qp->sq.sq_lock, flags);
 
 	full = queue_full(sq->queue, QUEUE_TYPE_TO_DRIVER);
 
@@ -663,7 +664,7 @@  static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 
 	queue_advance_producer(sq->queue, QUEUE_TYPE_TO_DRIVER);
 
-	spin_unlock_bh(&qp->sq.sq_lock);
+	spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
 
 	return 0;
 }