[1/1] RDMA/rxe: Fix the qp flush warnings in req

Message ID 20241025152036.121417-1-yanjun.zhu@linux.dev (mailing list archive)
State Accepted

Commit Message

Zhu Yanjun Oct. 25, 2024, 3:20 p.m. UTC
When the qp is in the error state, the status of the WQEs in the send queue
should be set to error. Otherwise, the following warning appears.

[  920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe]
[  920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6
[  920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G           O       6.1.113-storage+ #65
[  920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe]
[  920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24
[  920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246
[  920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008
[  920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac
[  920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450
[  920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800
[  920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000
[  920.622609] FS:  0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000
[  920.622979] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0
[  920.623680] Call Trace:
[  920.623815]  <TASK>
[  920.623933]  ? __warn+0x79/0xc0
[  920.624116]  ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[  920.624356]  ? report_bug+0xfb/0x150
[  920.624594]  ? handle_bug+0x3c/0x60
[  920.624796]  ? exc_invalid_op+0x14/0x70
[  920.624976]  ? asm_exc_invalid_op+0x16/0x20
[  920.625203]  ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[  920.625474]  ? rxe_completer+0x329/0xcc0 [rdma_rxe]
[  920.625749]  rxe_do_task+0x80/0x110 [rdma_rxe]
[  920.626037]  rxe_requester+0x625/0xde0 [rdma_rxe]
[  920.626310]  ? rxe_cq_post+0xe2/0x180 [rdma_rxe]
[  920.626583]  ? do_complete+0x18d/0x220 [rdma_rxe]
[  920.626812]  ? rxe_completer+0x1a3/0xcc0 [rdma_rxe]
[  920.627050]  rxe_do_task+0x80/0x110 [rdma_rxe]
[  920.627285]  tasklet_action_common.constprop.0+0xa4/0x120
[  920.627522]  handle_softirqs+0xc2/0x250
[  920.627728]  ? sort_range+0x20/0x20
[  920.627942]  run_ksoftirqd+0x1f/0x30
[  920.628158]  smpboot_thread_fn+0xc7/0x1b0
[  920.628334]  kthread+0xd6/0x100
[  920.628504]  ? kthread_complete_and_exit+0x20/0x20
[  920.628709]  ret_from_fork+0x1f/0x30
[  920.628892]  </TASK>

Fixes: ae720bdb703b ("RDMA/rxe: Generate error completion for error requester QP state")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Honggang LI Oct. 26, 2024, 1:58 a.m. UTC | #1
On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 479c07e6e4ed..87a02f0deb00 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>  	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>  		wqe = __req_next_wqe(qp);
>  		spin_unlock_irqrestore(&qp->state_lock, flags);
> -		if (wqe)
> +		if (wqe) {
> +			wqe->status = IB_WC_WR_FLUSH_ERR;
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why not update wqe->status in function `flush_send_wqe()` ?

thanks
Zhu Yanjun Oct. 26, 2024, 5:52 a.m. UTC | #2
On 2024/10/26 3:58, Honggang LI wrote:
> On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
>> ---
>>   drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 479c07e6e4ed..87a02f0deb00 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>>   	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>>   		wqe = __req_next_wqe(qp);
>>   		spin_unlock_irqrestore(&qp->state_lock, flags);
>> -		if (wqe)
>> +		if (wqe) {
>> +			wqe->status = IB_WC_WR_FLUSH_ERR;
>                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Why not update wqe->status in function `flush_send_wqe()` ?

flush_send_wqe() only fills in the CQE that is posted to the completion
queue; it does not touch the WQE itself. Please see the source code below.

static int flush_send_wqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
{
     struct rxe_cqe cqe = {};
     struct ib_wc *wc = &cqe.ibwc;
     struct ib_uverbs_wc *uwc = &cqe.uibwc;
     int err;

     if (qp->is_user) {
         uwc->wr_id = wqe->wr.wr_id;
         uwc->status = IB_WC_WR_FLUSH_ERR;
         uwc->qp_num = qp->ibqp.qp_num;
     } else {
         wc->wr_id = wqe->wr.wr_id;
         wc->status = IB_WC_WR_FLUSH_ERR;
         wc->qp = &qp->ibqp;
     }

     err = rxe_cq_post(qp->scq, &cqe, 0);
     if (err)
         rxe_dbg_cq(qp->scq, "post cq failed, err = %d\n", err);

     return err;
}

The error, however, occurs in the send queue. Please see the source code below.

static struct rxe_send_wqe *__req_next_wqe(struct rxe_qp *qp)
{
     struct rxe_queue *q = qp->sq.queue;
     unsigned int index = qp->req.wqe_index;
     unsigned int prod;

     prod = queue_get_producer(q, QUEUE_TYPE_FROM_CLIENT);
     if (index == prod)
         return NULL;
     else
         return queue_addr_from_index(q, index);
}

This is why the error status should be set on the WQE in the send queue
error handler in rxe_requester().

Thanks,

Zhu Yanjun

>
> thanks
>
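
The distinction drawn in the reply above, between the CQE filled in by flush_send_wqe() and the WQE returned by __req_next_wqe(), can be sketched as a small userspace model. This is only an illustration, not the rxe source: every `model_*` name and type is hypothetical, and the completer check is an assumption (that the warning fires when a WQE reaching the error path still reports success).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types (not the rxe source). */
enum model_status { MODEL_SUCCESS = 0, MODEL_FLUSH_ERR };

struct model_wqe { unsigned long wr_id; enum model_status status; }; /* send queue entry */
struct model_cqe { unsigned long wr_id; enum model_status status; }; /* completion queue entry */

/* Like flush_send_wqe(): builds a new CQE and leaves the WQE untouched. */
static struct model_cqe model_flush(const struct model_wqe *wqe)
{
	struct model_cqe cqe = { .wr_id = wqe->wr_id, .status = MODEL_FLUSH_ERR };

	return cqe;
}

/* Like the QP-error branch of rxe_requester(); `patched` selects the new behavior. */
static void model_requester_error_branch(struct model_wqe *wqe, int patched)
{
	if (wqe && patched)
		wqe->status = MODEL_FLUSH_ERR;
}

/* ASSUMPTION: the completer warns when an errored WQE still reports success. */
static int model_completer_warns(const struct model_wqe *wqe)
{
	return wqe->status == MODEL_SUCCESS;
}

/* Returns 1 if the modeled completer would warn, 0 otherwise. */
int model_run(int patched)
{
	struct model_wqe wqe = { .wr_id = 42, .status = MODEL_SUCCESS };
	struct model_cqe cqe;

	model_requester_error_branch(&wqe, patched);
	cqe = model_flush(&wqe);

	/* The CQE carries the flush error either way; only the WQE differs. */
	assert(cqe.status == MODEL_FLUSH_ERR);
	assert(cqe.wr_id == wqe.wr_id);

	return model_completer_warns(&wqe);
}
```

In this model, `model_run(0)` reports a warning because only the CQE was marked, while `model_run(1)` does not, because the WQE status was set before the completer looked at it.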
Leon Romanovsky Oct. 30, 2024, 12:22 p.m. UTC | #3
On Fri, 25 Oct 2024 17:20:36 +0200, Zhu Yanjun wrote:
> When the qp is in error state, the status of WQEs in the queue should be
> set to error. Or else the following will appear.
> [...]

Applied, thanks!

[1/1] RDMA/rxe: Fix the qp flush warnings in req
      https://git.kernel.org/rdma/rdma/c/ea4c990fa9e19f

Best regards,

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 479c07e6e4ed..87a02f0deb00 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -663,10 +663,12 @@  int rxe_requester(struct rxe_qp *qp)
 	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
 		wqe = __req_next_wqe(qp);
 		spin_unlock_irqrestore(&qp->state_lock, flags);
-		if (wqe)
+		if (wqe) {
+			wqe->status = IB_WC_WR_FLUSH_ERR;
 			goto err;
-		else
+		} else {
 			goto exit;
+		}
 	}
 
 	if (unlikely(qp_state(qp) == IB_QPS_RESET)) {