[rdma-next] RDMA/mlx5: print wc status on CQE error and dump needed

Message ID 20211227123806.47530-1-dust.li@linux.alibaba.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Series [rdma-next] RDMA/mlx5: print wc status on CQE error and dump needed

Commit Message

Dust Li Dec. 27, 2021, 12:38 p.m. UTC
mlx5_handle_error_cqe() only dumps the content of the CQE,
which is raw hex data and not straightforward to debug.
Print the WC status message when we get a CQE error and a dump
is needed.

Here is an example of what the dmesg log looks like with this:
[166755.330649] infiniband mlx5_0: mlx5_handle_error_cqe:333:(pid 0): WC error: 10, message: remote access error
[166755.332323] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[166755.332944] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[166755.333574] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[166755.334202] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[166755.334837] 00000030: 00 00 00 00 00 00 88 13 08 03 61 b3 1e a1 42 d3

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/infiniband/hw/mlx5/cq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
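
To show why the raw dump alone is hard to read, here is a minimal userspace sketch that decodes the last 16 bytes of the example dump (offsets 0x30 to 0x3f). The field offsets are an assumption based on my reading of struct mlx5_err_cqe and should be checked against the driver headers; the names decode_err_cqe_tail, err_cqe_fields and example_cqe_tail are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Bytes 0x30..0x3f copied from the example dmesg dump above. */
const uint8_t example_cqe_tail[16] = {
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x88, 0x13,
	0x08, 0x03, 0x61, 0xb3, 0x1e, 0xa1, 0x42, 0xd3,
};

/* Hypothetical container for the decoded fields (not a driver type). */
struct err_cqe_fields {
	uint8_t vendor_err_synd;	/* assumed offset 0x36 */
	uint8_t syndrome;		/* assumed offset 0x37 */
	uint8_t send_opcode;		/* assumed top byte of the big-endian word at 0x38 */
	uint32_t qpn;			/* assumed low 24 bits of the same word */
	uint16_t wqe_counter;		/* assumed big-endian 16 bits at 0x3c */
	uint8_t cqe_opcode;		/* assumed high nibble of the last byte */
};

/* Decode the tail of an error CQE; t points at offset 0x30 of the CQE. */
struct err_cqe_fields decode_err_cqe_tail(const uint8_t *t)
{
	struct err_cqe_fields f;
	uint32_t word = ((uint32_t)t[8] << 24) | ((uint32_t)t[9] << 16) |
			((uint32_t)t[10] << 8) | (uint32_t)t[11];

	f.vendor_err_synd = t[6];
	f.syndrome = t[7];
	f.send_opcode = (uint8_t)(word >> 24);
	f.qpn = word & 0xffffff;
	f.wqe_counter = (uint16_t)((t[12] << 8) | t[13]);
	f.cqe_opcode = t[15] >> 4;
	return f;
}
```

Under these assumptions the example decodes to syndrome 0x13 and vendor syndrome 0x88, which a reader would still have to look up by hand; the added log line gives the WC status text directly.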

Comments

Leon Romanovsky Jan. 3, 2022, 11:27 a.m. UTC | #1
Thanks,
Acked-by: Leon Romanovsky <leonro@nvidia.com>

Jason Gunthorpe Jan. 5, 2022, 7:06 p.m. UTC | #2
Applied to for-next, thanks

Jason

Patch

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a190fb581591..66dfadb96c66 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -328,8 +328,11 @@  static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
 	}
 
 	wc->vendor_err = cqe->vendor_err_synd;
-	if (dump)
+	if (dump) {
+		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n",
+				wc->status, ib_wc_status_msg(wc->status));
 		dump_cqe(dev, cqe);
+	}
 }
 
 static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
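
The status-to-text step in the hunk above can be sketched in userspace as follows. This is an illustration only, not the kernel's ib_wc_status_msg(): the table is deliberately abbreviated to the two entries I am sure of (status 10 and its text come from the example log), and wc_status_msg_sketch, format_wc_error and the fallback string are hypothetical choices for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Abbreviated, hypothetical table; the real driver covers every WC status. */
static const char *const wc_status_text[] = {
	[0]  = "success",
	[10] = "remote access error",	/* matches "WC error: 10" in the log */
};

#define WC_STATUS_TEXT_LEN (sizeof(wc_status_text) / sizeof(wc_status_text[0]))

/* Bounds-checked lookup that never returns NULL. */
const char *wc_status_msg_sketch(unsigned int status)
{
	if (status < WC_STATUS_TEXT_LEN && wc_status_text[status])
		return wc_status_text[status];
	return "unrecognized status";
}

/* Build the line body the patch prints, using the patch's format string. */
int format_wc_error(char *buf, size_t len, unsigned int status)
{
	return snprintf(buf, len, "WC error: %d, Message: %s\n",
			(int)status, wc_status_msg_sketch(status));
}
```

The lookup always returns a valid string, so the caller can pass the result straight to a "%s" format without a NULL check, which is the same property the patch relies on when it calls ib_wc_status_msg() inline.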