From patchwork Thu Mar 2 03:04:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robert LeBlanc X-Patchwork-Id: 9599493 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A3992600CB for ; Thu, 2 Mar 2017 03:11:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 444552654B for ; Thu, 2 Mar 2017 03:11:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 384C127165; Thu, 2 Mar 2017 03:11:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 981BD2850F for ; Thu, 2 Mar 2017 03:11:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753703AbdCBDLr (ORCPT ); Wed, 1 Mar 2017 22:11:47 -0500 Received: from mail-ot0-f193.google.com ([74.125.82.193]:36033 "EHLO mail-ot0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753576AbdCBDLn (ORCPT ); Wed, 1 Mar 2017 22:11:43 -0500 Received: by mail-ot0-f193.google.com with SMTP id i1so6196417ota.3 for ; Wed, 01 Mar 2017 19:11:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=leblancnet-us.20150623.gappssmtp.com; s=20150623; h=mime-version:from:date:message-id:subject:to; bh=ExpKYu/AdK77rieXBZTHUSgHjPclJWz+ApRpbQr1NR8=; b=xx+Q9z8G0mHVBgT/Oc2KyTOd5eRAlnp1eVFwvjwOpqYrhTaySCDzs+HrCvCJuBUU/R DWv4OI/VUuUb6RCMDSbo38fLvpjfvseH/IYDBai7u7yMlm2Ee8FmI4G7xcLbd5qzvz8h PYhkTJBknsn5EZpOd5c517SBuJFguOf2Tl/HsJ4DlmSYSebL9oh/bKe09EWt/TUv6/+F jh+fbCRdFUHyyIX0joYywwU6rxvYoRn7NwHDbkOAjSrG+Q2j+S7TOzvJZvcXM/vyn4i8 gqhihxIHWpVQvX4jGYqm/EuXld1svs5IhDZAcmlyRXgFIsVKiVpTfjsonbW6WypZpbE6 hufw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=ExpKYu/AdK77rieXBZTHUSgHjPclJWz+ApRpbQr1NR8=; b=cCoegxwTOTpkJGA24koudvtMwpOGiHdwABy0OT1WmkLHlLGCWkKG+DYGvhdRnZqI9h ONXJkh24/5N0pcSVp0U5Snt5PdsPZINH+n8ySRO/Qa4cGsDEQW8XD7wq3OOzyY7mJML+ 6BKgHBqKTICr37lUnyX5H4mqFu0YESIr5TfFbqnEy6ZlGQY8zWzpzIjwYbo2Y9MasPMH GxjwA6SqFTcjE9CpeAre+p17caANLsChaZVn79OygcOnKcbP2aIczWUUJDjAc+xFisPF nvyNbM/rO88HGtEf2YkFML8DFed7aQAsSRlEz1qj57roereyRmGb65ewm/9f7Eg2rPi9 R+jw== X-Gm-Message-State: AMke39lFhiadS5JorHbwBsxIMQRLFG7GDJ9bAjFj003wVQMqIYhJpsABPfgqcJUM3KDEFhAA6eBfVWg39tP/QA== X-Received: by 10.157.1.101 with SMTP id 92mr5536515otu.21.1488423876441; Wed, 01 Mar 2017 19:04:36 -0800 (PST) MIME-Version: 1.0 Received: by 10.182.129.16 with HTTP; Wed, 1 Mar 2017 19:04:36 -0800 (PST) X-Originating-IP: [2604:ba00:2:1:28d4:104:1d91:f170] From: Robert LeBlanc Date: Wed, 1 Mar 2017 20:04:36 -0700 Message-ID: Subject: mlx5_ib_post_send stuck? To: linux-rdma Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP I've been trying to track down a hang in iSERT that requires rebooting our target and is very inconvenient. Our environment is very complicated so I've tried to reproduce the problem on a much simpler config. I believe I have a situation that triggers the same backtrace (it is hard to know 100% because our production is on 4.4 and Infiniband and my test is on 4.9.0 with RoCE, but they look very similar). I found that there is a bug in iSER (initiator) that triggers the target hang with the ConnectX-4-LX cards. This commit fixes the initiator side, but does not fix the target side, so I keep leveraging the bug to track down the target bug. I include it here in the case it adds useful information. commit 24be606cd3b433412488ce962842489fe6e34a7a Author: Max Gurtovoy Date: Wed Jan 18 00:40:39 2017 +0200 IB/iser: Fix sg_tablesize calculation commit 1e5db6c31ade4150c2e2b1a21e39f776c38fea39 upstream. For devices that can register page list that is bigger than USHRT_MAX, we actually take the wrong value for sg_tablesize. E.g: for CX4 max_fast_reg_page_list_len is 65536 (bigger than USHRT_MAX) so we set sg_tablesize to 0 by mistake. Therefore, each IO that is bigger than 4k splitted to "< 4k" chunks that cause performance degredation. Remove wrong sg_tablesize assignment, and use the value that was set during address resolution handler with the needed casting. Signed-off-by: Max Gurtovoy Reviewed-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman @@ -3828,7 +3829,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } + printk("mips: start iterating through wr.\n"); for (nreq = 0; wr; nreq++, wr = wr->next) { + printk("mips: nreq=%d\n", nreq); if (unlikely(wr->opcode >= ARRAY_SIZE(mlx5_ib_opcode))) { mlx5_ib_warn(dev, "\n"); err = -EINVAL; @@ -3847,12 +3850,14 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq); if (err) { + printk("mips: begin_wqe error: %d\n", err); mlx5_ib_warn(dev, "\n"); err = -ENOMEM; *bad_wr = wr; goto out; } + printk("mips: Starting qp_type switch on %d\n", ibqp->qp_type); switch (ibqp->qp_type) { case IB_QPT_XRC_INI: xrc = seg; @@ -3994,6 +3999,8 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, seg = mlx5_get_send_wqe(qp, 0); break; case IB_QPT_UD: + printk("mips: Starting IB_QPT_UD case.\n"); + dump_wqe(qp, idx, size); set_datagram_seg(seg, wr); seg += sizeof(struct mlx5_wqe_datagram_seg); size += sizeof(struct mlx5_wqe_datagram_seg) / 16; @@ -4015,6 +4022,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, if (unlikely((seg == qend))) seg = mlx5_get_send_wqe(qp, 0); } + printk("mips: Finished IB_QPT_UD case.\n"); break; case MLX5_IB_QPT_REG_UMR: if (wr->opcode != MLX5_IB_WR_UMR) { @@ -4070,11 +4078,12 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, get_fence(fence, wr), next_fence, mlx5_ib_opcode[wr->opcode]); skip_psv: - if (0) + if (1) dump_wqe(qp, idx, size); } out: + printk("mips: out.\n"); if (likely(nreq)) { qp->sq.head += nreq; @@ -4115,6 +4124,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, spin_unlock_irqrestore(&qp->sq.lock, flags); + printk("Finished mlx5_ib_post_send.\n"); return err; } And running `echo "func dump_wqe +p" > /sys/kernel/debug/dynamic_debug/control` gives: [ 170.268608] Starting mlx5_ib_post_send. [ 170.268609] mips: start iterating through wr. [ 170.268609] mips: nreq=0 [ 170.268610] mips: Starting qp_type switch on 2 [ 170.268611] dump wqe at ffff887f30fa78c0 [ 170.268612] 0089a30a 00011b02 00000008 00000000 [ 170.268613] 0000004c 00183b51 0000007f 30f53858 [ 170.268613] mips: out. [ 170.268614] Finished mlx5_ib_post_send. [ 170.268615] Starting mlx5_ib_post_send. [ 170.268615] mips: start iterating through wr. [ 170.268616] mips: nreq=0 [ 170.268616] mips: Starting qp_type switch on 2 [ 170.268617] dump wqe at ffff887f30fa7900 [ 170.268618] 0089a40a 00011b02 00000008 00000000 [ 170.268619] 0000004c 00183b51 0000007f 30f5b700 [ 170.268619] mips: out. [ 170.268620] Finished mlx5_ib_post_send. [ 170.268621] Starting mlx5_ib_post_send. [ 170.268621] mips: start iterating through wr. [ 170.268622] mips: nreq=0 [ 170.268623] mips: Starting qp_type switch on 2 [ 170.268623] dump wqe at ffff887f30fa7940 [ 170.268624] 0089a50a 00011b02 00000008 00000000 [ 170.268625] 0000004c 00183b51 0000007f 30f52038 [ 170.268626] mips: out. [ 170.268626] Finished mlx5_ib_post_send. [ 171.175601] Starting mlx5_ib_post_send. [ 171.176342] mips: start iterating through wr. [ 171.176620] systemd-journald[871]: /dev/kmsg buffer overrun, some messages lost. [ 171.177814] mips: nreq=0 [ 171.178628] mips: Starting qp_type switch on 2 [ 171.179427] dump wqe at ffff887f30fa7980 [ 171.179429] 0089a60a 00011b02 00000008 00000000 [ 171.179431] 0000004c 00183b51 0000007f 30f4eff8 [ 171.179432] mips: out. [ 171.180203] Finished mlx5_ib_post_send. [ 181.674314] Starting mlx5_ib_post_send. [ 181.675136] Starting mlx5_ib_post_send. [ 181.675926] mips: start iterating through wr. [ 181.676721] mips: nreq=0 [ 181.677509] mips: Starting qp_type switch on 4 [ 181.678323] mips: Starting IB_QPT_UD case. [ 181.679133] dump wqe at ffff887f4ac04180 [ 181.679135] 00000000 00000000 00000008 00000000 [ 181.679136] mips: Finished IB_QPT_UD case. [ 181.679933] dump wqe at ffff887f4ac04180 [ 181.679934] 0000060a 00009706 00000008 00000000 [ 181.679936] 80000000 00000000 80000001 30000000 [ 181.679937] 00000000 0cc47a88 07310040 40200000 [ 181.679939] 00000000 00000000 0000ffff c0a80b0e [ 181.679940] 00000018 00001202 0000007f 5d9f8000 [ 181.679941] 000000e8 00001202 0000007f 5d9f8018 [ 181.679942] mips: out. [ 181.680728] Finished mlx5_ib_post_send. [ 181.681588] Starting mlx5_ib_post_send. [ 181.682390] mips: start iterating through wr. [ 181.683197] mips: nreq=0 [ 181.683997] mips: Starting qp_type switch on 2 [ 181.684807] dump wqe at ffff887f30fa79c0 [ 181.684809] 0089a708 00011b02 00000000 00000000 [ 181.684810] 00000000 00000006 00000000 00000000 [ 181.684811] mips: out. [ 181.685626] Finished mlx5_ib_post_send. [ 183.733919] Starting mlx5_ib_post_send. [ 183.734734] Starting mlx5_ib_post_send. [ 183.735530] mips: start iterating through wr. [ 183.736329] mips: nreq=0 [ 183.737122] mips: Starting qp_type switch on 4 [ 183.737926] mips: Starting IB_QPT_UD case. [ 183.738724] dump wqe at ffff887f4ac04200 [ 183.738726] 00000000 00000000 00000008 00000000 [ 183.738727] mips: Finished IB_QPT_UD case. [ 183.739517] dump wqe at ffff887f4ac04200 [ 183.739519] 0000080a 00009706 00000008 00000000 [ 183.739521] 80000000 00000000 80000001 30000000 [ 183.739522] 00000000 0cc47a88 07310040 40200000 [ 183.739523] 00000000 00000000 0000ffff c0a80b0e [ 183.739525] 00000018 00001202 0000007f 653ddc00 [ 183.739526] 000000e8 00001202 0000007f 653ddc18 [ 183.739527] mips: out. [ 183.740315] Finished mlx5_ib_post_send. [ 198.786376] iSCSI Login timeout on Network Portal 0.0.0.0:3260 [ 200.962188] Starting mlx5_ib_post_send. [ 200.963030] Starting mlx5_ib_post_send. [ 200.963845] mips: start iterating through wr. [ 200.964672] mips: nreq=0 [ 200.965507] mips: Starting qp_type switch on 4 [ 200.966338] mips: Starting IB_QPT_UD case. [ 200.967171] dump wqe at ffff887f4ac04280 [ 200.967173] 00000000 00000000 00000008 00000000 [ 200.967174] mips: Finished IB_QPT_UD case. [ 200.967995] dump wqe at ffff887f4ac04280 [ 200.967997] 00000a0a 00009706 00000008 00000000 [ 200.967998] 80000000 00000000 80000001 30000000 [ 200.968000] 00000000 0cc47a88 07310040 40200000 [ 200.968001] 00000000 00000000 0000ffff c0a80b0e [ 200.968003] 00000018 00001202 0000007f 653dd400 [ 200.968004] 000000e8 00001202 0000007f 653dd418 [ 200.968005] mips: out. [ 200.968815] Finished mlx5_ib_post_send. During the test it seems that for the most part wqe stays pretty small, but when the 'issue' starts to happen (at 181) there is more stuff there. The interesting thing is that by adding these statements, the test is able to run longer than without them. It usually hangs within the first 1 GB, but with this patch, it goes for a few GB before hanging. Sagi has been very helpful so far, but he thinks there might be something in the driver preventing the queue from being drained. I don't understand the output to know what it means. Based on what I've seen during the disconnection of iSER, it seemed that qp_type 4 is what was hanging, but the problem may have been on a different type before then. I tried dumping wqe for each op, but it would cause the box to be unresponsive. I'd appreciate any ideas on how to troubleshoot this. Thank you, ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 --- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 64b3d11..140f3f3 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -651,13 +651,6 @@ static void iscsi_iser_cleanup_task(struct iscsi_task *task) SHOST_DIX_GUARD_CRC); } - /* - * Limit the sg_tablesize and max_sectors based on the device - * max fastreg page list length. - */ - shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize, - ib_conn->device->ib_device->attrs.max_fast_reg_page_list_len); - if (iscsi_host_add(shost, ib_conn->device->ib_device->dma_device)) { mutex_unlock(&iser_conn->state_mutex); Based on the backtraces produced by the target, it looks like there is something in mlx5_ib_post_send that may be wonky such that the send Q can't be drained. [ 369.794100] INFO: task iscsi_np:6998 blocked for more than 120 seconds. [ 369.794891] Not tainted 4.9.0+ #5 [ 369.795689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.796525] iscsi_np D 0 6998 2 0x00000084 [ 369.797379] ffff887f5e275d00 0000000000000000 ffff887f66650000 ffff887f7f219300 [ 369.798275] ffff887f3c691600 ffffc9003293fbf0 ffffffff816d86d5 ffff88807ffdba00 [ 369.799190] 0000000200000000 0000000000000000 ffff887f3c691600 7fffffffffffffff [ 369.800120] Call Trace: [ 369.801041] [] ? __schedule+0x195/0x630 [ 369.801977] [] schedule+0x36/0x80 [ 369.802921] [] schedule_timeout+0x21c/0x3a0 [ 369.803877] [] ? vsnprintf+0x34e/0x4d0 [ 369.804833] [] wait_for_completion+0xf2/0x130 [ 369.805760] [] ? wake_up_q+0x80/0x80 [ 369.806666] [] iscsit_stop_session+0x1b1/0x1c0 [iscsi_target_mod] [ 369.807600] [] iscsi_check_for_session_reinstatement+0x1e2/0x270 [iscsi_target_mod] [ 369.808572] [] iscsi_target_check_for_existing_instances+0x30/0x40 [iscsi_target_mod] [ 369.809562] [] iscsi_target_do_login+0x138/0x630 [iscsi_target_mod] [ 369.810562] [] iscsi_target_start_negotiation+0x4e/0xa0 [iscsi_target_mod] [ 369.811577] [] __iscsi_target_login_thread+0x83e/0xf20 [iscsi_target_mod] [ 369.812597] [] ? __schedule+0xc1/0x630 [ 369.813639] [] ? __iscsi_target_login_thread+0xf20/0xf20 [iscsi_target_mod] [ 369.814722] [] iscsi_target_login_thread+0x24/0x30 [iscsi_target_mod] [ 369.815819] [] kthread+0xd9/0xf0 [ 369.816861] [] ? kthread_park+0x60/0x60 [ 369.817909] [] ret_from_fork+0x25/0x30 [ 369.819118] INFO: task iscsi_trx:8458 blocked for more than 120 seconds. [ 369.820435] Not tainted 4.9.0+ #5 [ 369.821754] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.823135] iscsi_trx D 0 8458 2 0x00000084 [ 369.824533] ffff887f5e724d80 0000000000000000 ffff887f66659600 ffff887f7f399300 [ 369.825984] ffff883f5f7a9600 ffffc90035687b70 ffffffff816d86d5 0000000000000000 [ 369.827189] ffff887f65e12400 ffffc90035687b50 ffff883f5f7a9600 7fffffffffffffff [ 369.828283] Call Trace: [ 369.829369] [] ? __schedule+0x195/0x630 [ 369.830475] [] schedule+0x36/0x80 [ 369.831575] [] schedule_timeout+0x21c/0x3a0 [ 369.832698] [] ? mlx5_ib_post_send+0xc8/0x1750 [mlx5_ib] [ 369.833819] [] wait_for_completion+0xf2/0x130 [ 369.834956] [] ? wake_up_q+0x80/0x80 [ 369.836130] [] __ib_drain_sq+0x190/0x1c0 [ib_core] [ 369.837234] [] ? ib_sg_to_pages+0x1a0/0x1a0 [ib_core] [ 369.838342] [] ib_drain_sq+0x25/0x30 [ib_core] [ 369.839435] [] ib_drain_qp+0x12/0x30 [ib_core] [ 369.840519] [] isert_wait_conn+0x5f/0x2d0 [ib_isert] [ 369.841621] [] iscsit_close_connection+0x157/0x860 [iscsi_target_mod] [ 369.842747] [] ? do_syscall_64+0x67/0x180 [ 369.843877] [] iscsit_take_action_for_connection_exit+0x7b/0xf0 [iscsi_target_mod] [ 369.845037] [] iscsi_target_rx_thread+0x95/0xa0 [iscsi_target_mod] [ 369.846189] [] ? iscsi_target_tx_thread+0x1d0/0x1d0 [iscsi_target_mod] [ 369.847276] [] kthread+0xd9/0xf0 [ 369.848333] [] ? kthread_park+0x60/0x60 [ 369.849391] [] ret_from_fork+0x25/0x30 So I added some debug statements in the mlx5 driver. diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index d1e9218..ce5180e 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -3812,6 +3812,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, u8 next_fence = 0; u8 fence; + printk("Starting mlx5_ib_post_send.\n"); if (unlikely(ibqp->qp_type == IB_QPT_GSI)) return mlx5_ib_gsi_post_send(ibqp, wr, bad_wr);