Message ID | 20241025152036.121417-1-yanjun.zhu@linux.dev (mailing list archive) |
---|---|
State | Accepted |
Series | [1/1] RDMA/rxe: Fix the qp flush warnings in req |
On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 479c07e6e4ed..87a02f0deb00 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>  	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>  		wqe = __req_next_wqe(qp);
>  		spin_unlock_irqrestore(&qp->state_lock, flags);
> -		if (wqe)
> +		if (wqe) {
> +			wqe->status = IB_WC_WR_FLUSH_ERR;
  			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why not update wqe->status in function `flush_send_wqe()` ?

thanks

On 2024/10/26 3:58, Honggang LI wrote:
> On Fri, Oct 25, 2024 at 05:20:36PM +0200, Zhu Yanjun wrote:
>> ---
>>  drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 479c07e6e4ed..87a02f0deb00 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
>>  	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
>>  		wqe = __req_next_wqe(qp);
>>  		spin_unlock_irqrestore(&qp->state_lock, flags);
>> -		if (wqe)
>> +		if (wqe) {
>> +			wqe->status = IB_WC_WR_FLUSH_ERR;
>   			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Why not update wqe->status in function `flush_send_wqe()` ?

flush_send_wqe() handles the CQE in the completion queue. Please see the
source code below.

static int flush_send_wqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
{
	struct rxe_cqe cqe = {};
	struct ib_wc *wc = &cqe.ibwc;
	struct ib_uverbs_wc *uwc = &cqe.uibwc;
	int err;

	if (qp->is_user) {
		uwc->wr_id = wqe->wr.wr_id;
		uwc->status = IB_WC_WR_FLUSH_ERR;
		uwc->qp_num = qp->ibqp.qp_num;
	} else {
		wc->wr_id = wqe->wr.wr_id;
		wc->status = IB_WC_WR_FLUSH_ERR;
		wc->qp = &qp->ibqp;
	}

	err = rxe_cq_post(qp->scq, &cqe, 0);
	if (err)
		rxe_dbg_cq(qp->scq, "post cq failed, err = %d\n", err);

	return err;
}

This error occurs in the send queue. Please see the source code below.

static struct rxe_send_wqe *__req_next_wqe(struct rxe_qp *qp)
{
	struct rxe_queue *q = qp->sq.queue;
	unsigned int index = qp->req.wqe_index;
	unsigned int prod;

	prod = queue_get_producer(q, QUEUE_TYPE_FROM_CLIENT);
	if (index == prod)
		return NULL;
	else
		return queue_addr_from_index(q, index);
}

This is why we should set the error status in the send queue error
handler.

Thanks,
Zhu Yanjun

>
> thanks
>

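The distinction drawn in the reply above can be sketched in userspace C. This is only a toy model under stated assumptions: `toy_qp`, `toy_wqe`, `toy_cqe`, `toy_post_send()`, `toy_next_wqe()` and `toy_flush_cqe()` are hypothetical stand-ins, not the rxe structures or functions. It shows that a flush helper which fills a separate completion entry leaves the queued WQE untouched, while a write through the pointer returned by the send queue lookup changes the entry that stays in the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the rxe types. */
enum toy_status { TOY_SUCCESS = 0, TOY_FLUSH_ERR };

#define SQ_SLOTS 4

struct toy_wqe { unsigned long wr_id; enum toy_status status; };
struct toy_cqe { unsigned long wr_id; enum toy_status status; };

struct toy_qp {
	struct toy_wqe sq[SQ_SLOTS];	/* send queue ring */
	unsigned int wqe_index;		/* next slot the requester looks at */
	unsigned int prod;		/* producer index, advanced on post */
};

/* Queue one work request with a default (success) status. */
static void toy_post_send(struct toy_qp *qp, unsigned long wr_id)
{
	struct toy_wqe *wqe;

	assert(qp->prod - qp->wqe_index < SQ_SLOTS);	/* ring not full */
	wqe = &qp->sq[qp->prod % SQ_SLOTS];
	wqe->wr_id = wr_id;
	wqe->status = TOY_SUCCESS;
	qp->prod++;
}

/* Same shape as the lookup quoted above: empty when index == prod. */
static struct toy_wqe *toy_next_wqe(struct toy_qp *qp)
{
	if (qp->wqe_index == qp->prod)
		return NULL;
	return &qp->sq[qp->wqe_index % SQ_SLOTS];
}

/* Builds a separate completion entry; the WQE itself is only read. */
static struct toy_cqe toy_flush_cqe(const struct toy_wqe *wqe)
{
	struct toy_cqe cqe = { .wr_id = wqe->wr_id, .status = TOY_FLUSH_ERR };

	return cqe;
}
```

In this model, calling `toy_flush_cqe()` alone leaves `sq[0].status` at `TOY_SUCCESS`; only an assignment through the pointer from `toy_next_wqe()` updates the queued entry, which is the place the patch writes to.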
On Fri, 25 Oct 2024 17:20:36 +0200, Zhu Yanjun wrote:
> When the qp is in error state, the status of WQEs in the queue should be
> set to error. Or else the following will appear.
>
> [ 920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe]
> [ 920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6
> [ 920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G O 6.1.113-storage+ #65
> [ 920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [ 920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe]
> [ 920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24
> [ 920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246
> [ 920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008
> [ 920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac
> [ 920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450
> [ 920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800
> [ 920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000
> [ 920.622609] FS: 0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000
> [ 920.622979] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0
> [ 920.623680] Call Trace:
> [ 920.623815] <TASK>
> [ 920.623933] ? __warn+0x79/0xc0
> [ 920.624116] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
> [ 920.624356] ? report_bug+0xfb/0x150
> [ 920.624594] ? handle_bug+0x3c/0x60
> [ 920.624796] ? exc_invalid_op+0x14/0x70
> [ 920.624976] ? asm_exc_invalid_op+0x16/0x20
> [ 920.625203] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
> [ 920.625474] ? rxe_completer+0x329/0xcc0 [rdma_rxe]
> [ 920.625749] rxe_do_task+0x80/0x110 [rdma_rxe]
> [ 920.626037] rxe_requester+0x625/0xde0 [rdma_rxe]
> [ 920.626310] ? rxe_cq_post+0xe2/0x180 [rdma_rxe]
> [ 920.626583] ? do_complete+0x18d/0x220 [rdma_rxe]
> [ 920.626812] ? rxe_completer+0x1a3/0xcc0 [rdma_rxe]
> [ 920.627050] rxe_do_task+0x80/0x110 [rdma_rxe]
> [ 920.627285] tasklet_action_common.constprop.0+0xa4/0x120
> [ 920.627522] handle_softirqs+0xc2/0x250
> [ 920.627728] ? sort_range+0x20/0x20
> [ 920.627942] run_ksoftirqd+0x1f/0x30
> [ 920.628158] smpboot_thread_fn+0xc7/0x1b0
> [ 920.628334] kthread+0xd6/0x100
> [ 920.628504] ? kthread_complete_and_exit+0x20/0x20
> [ 920.628709] ret_from_fork+0x1f/0x30
> [ 920.628892] </TASK>
>
> [...]

Applied, thanks!

[1/1] RDMA/rxe: Fix the qp flush warnings in req
      https://git.kernel.org/rdma/rdma/c/ea4c990fa9e19f

Best regards,

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 479c07e6e4ed..87a02f0deb00 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -663,10 +663,12 @@ int rxe_requester(struct rxe_qp *qp)
 	if (unlikely(qp_state(qp) == IB_QPS_ERR)) {
 		wqe = __req_next_wqe(qp);
 		spin_unlock_irqrestore(&qp->state_lock, flags);
-		if (wqe)
+		if (wqe) {
+			wqe->status = IB_WC_WR_FLUSH_ERR;
 			goto err;
-		else
+		} else {
 			goto exit;
+		}
 	}
 
 	if (unlikely(qp_state(qp) == IB_QPS_RESET)) {

When the QP is in the error state, the status of the WQEs in the queue
should be set to error. Otherwise the following warning appears.

[ 920.617269] WARNING: CPU: 1 PID: 21 at drivers/infiniband/sw/rxe/rxe_comp.c:756 rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.617744] Modules linked in: rnbd_client(O) rtrs_client(O) rtrs_core(O) rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop brd null_blk ipv6
[ 920.618516] CPU: 1 PID: 21 Comm: ksoftirqd/1 Tainted: G O 6.1.113-storage+ #65
[ 920.618986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 920.619396] RIP: 0010:rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.619658] Code: 0f b6 84 24 3a 02 00 00 41 89 84 24 44 04 00 00 e9 2a f7 ff ff 39 ca bb 03 00 00 00 b8 0e 00 00 00 48 0f 45 d8 e9 15 f7 ff ff <0f> 0b e9 cb f8 ff ff 41 bf f5 ff ff ff e9 08 f8 ff ff 49 8d bc 24
[ 920.620482] RSP: 0018:ffff97b7c00bbc38 EFLAGS: 00010246
[ 920.620817] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000008
[ 920.621183] RDX: ffff960dc396ebc0 RSI: 0000000000005400 RDI: ffff960dc4e2fbac
[ 920.621548] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffffffac406450
[ 920.621884] R10: ffffffffac4060c0 R11: 0000000000000001 R12: ffff960dc4e2f800
[ 920.622254] R13: ffff960dc4e2f928 R14: ffff97b7c029c580 R15: 0000000000000000
[ 920.622609] FS: 0000000000000000(0000) GS:ffff960ef7d00000(0000) knlGS:0000000000000000
[ 920.622979] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 920.623245] CR2: 00007fa056965e90 CR3: 00000001107f1000 CR4: 00000000000006e0
[ 920.623680] Call Trace:
[ 920.623815] <TASK>
[ 920.623933] ? __warn+0x79/0xc0
[ 920.624116] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.624356] ? report_bug+0xfb/0x150
[ 920.624594] ? handle_bug+0x3c/0x60
[ 920.624796] ? exc_invalid_op+0x14/0x70
[ 920.624976] ? asm_exc_invalid_op+0x16/0x20
[ 920.625203] ? rxe_completer+0x989/0xcc0 [rdma_rxe]
[ 920.625474] ? rxe_completer+0x329/0xcc0 [rdma_rxe]
[ 920.625749] rxe_do_task+0x80/0x110 [rdma_rxe]
[ 920.626037] rxe_requester+0x625/0xde0 [rdma_rxe]
[ 920.626310] ? rxe_cq_post+0xe2/0x180 [rdma_rxe]
[ 920.626583] ? do_complete+0x18d/0x220 [rdma_rxe]
[ 920.626812] ? rxe_completer+0x1a3/0xcc0 [rdma_rxe]
[ 920.627050] rxe_do_task+0x80/0x110 [rdma_rxe]
[ 920.627285] tasklet_action_common.constprop.0+0xa4/0x120
[ 920.627522] handle_softirqs+0xc2/0x250
[ 920.627728] ? sort_range+0x20/0x20
[ 920.627942] run_ksoftirqd+0x1f/0x30
[ 920.628158] smpboot_thread_fn+0xc7/0x1b0
[ 920.628334] kthread+0xd6/0x100
[ 920.628504] ? kthread_complete_and_exit+0x20/0x20
[ 920.628709] ret_from_fork+0x1f/0x30
[ 920.628892] </TASK>

Fixes: ae720bdb703b ("RDMA/rxe: Generate error completion for error requester QP state")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
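
The behaviour change in the error-state branch can be illustrated with a small userspace sketch. Everything here is hypothetical: `toy_wqe`, `toy_requester_err_branch()` and `toy_completer_would_warn()` are invented for illustration, and the completer check is an assumption (that the warning fires when a WQE reaches the error path while still carrying a success status), not a copy of the code at rxe_comp.c:756.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the rxe types. */
enum toy_status { TOY_SUCCESS = 0, TOY_FLUSH_ERR };
enum toy_next { TOY_GOTO_EXIT, TOY_GOTO_ERR };

struct toy_wqe { enum toy_status status; };

/*
 * Models the requester branch taken when the QP is in the error state.
 * 'wqe' is what the send queue lookup returned (NULL when the queue is
 * empty); 'patched' selects the behaviour before or after the patch.
 */
static enum toy_next toy_requester_err_branch(struct toy_wqe *wqe, bool patched)
{
	if (wqe) {
		if (patched)
			wqe->status = TOY_FLUSH_ERR;	/* the line the patch adds */
		return TOY_GOTO_ERR;
	}
	return TOY_GOTO_EXIT;
}

/*
 * ASSUMPTION: stand-in for the completer's sanity check. Returns true
 * where this toy completer would warn, i.e. the WQE was sent down the
 * error path but still reports success.
 */
static bool toy_completer_would_warn(const struct toy_wqe *wqe)
{
	assert(wqe != NULL);
	return wqe->status == TOY_SUCCESS;
}
```

With `patched` set to false the WQE takes the error path with a success status and the toy check fires; with `patched` set to true the status is already a flush error when the completer sees it. An empty queue takes the exit path in both cases.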