From patchwork Mon Oct 26 14:28:30 2015
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 7490181
From: ira.weiny@intel.com
To: gregkh@linuxfoundation.org, devel@driverdev.osuosl.org
Cc: dledford@redhat.com,
 linux-rdma@vger.kernel.org, dennis.dalessandro@intel.com,
 mike.marciniszyn@intel.com, Vennila Megavannan, Ira Weiny
Subject: [PATCH v3 04/23] staging/rdma/hfi1: Prevent host software lock up
Date: Mon, 26 Oct 2015 10:28:30 -0400
Message-Id: <1445869729-7507-5-git-send-email-ira.weiny@intel.com>
In-Reply-To: <1445869729-7507-1-git-send-email-ira.weiny@intel.com>
References: <1445869729-7507-1-git-send-email-ira.weiny@intel.com>

From: Vennila Megavannan

If packets stop egressing the hardware link, software can lock up.
Implement a timeout for send context halt recovery: increase the packet
egress timeout to 500 us, reset the timeout counter to zero whenever the
packet occupancy changes, and bounce the link when the timeout expires.
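The timeout policy described above can be illustrated with a small user-space sketch. This is not driver code: the egress queue is simulated by a hypothetical `struct sim_queue` whose occupancy drops by one packet every `period` polls (or never, when `period` is 0). The hypothetical helper `wait_for_egress()` takes `limit` and `reset_on_change` parameters that model the old rule (fixed 100-poll budget, no reset) and the new rule (timeout only after more than 500 polls without a change in occupancy).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated egress queue: one packet leaves every `period` polls. */
struct sim_queue {
	uint64_t remaining;	/* packets still waiting to egress */
	unsigned period;	/* 0 means the queue is stuck and never drains */
	unsigned ticks;		/* number of polls seen so far */
};

/* Hypothetical stand-in for reading the packet-occupancy field of the CSR. */
static uint64_t sim_read_occupancy(struct sim_queue *q)
{
	if (q->period && q->remaining && ++q->ticks % q->period == 0)
		q->remaining--;
	return q->remaining;
}

/*
 * Poll until the queue drains. Returns true if it drained, false on timeout.
 *   limit:           polls tolerated (100 before the patch, 500 after)
 *   reset_on_change: restart the count whenever occupancy changes (new rule)
 */
static bool wait_for_egress(struct sim_queue *q, unsigned limit,
			    bool reset_on_change)
{
	uint64_t reg = 0, reg_prev;
	unsigned cnt = 0;

	for (;;) {
		reg_prev = reg;
		reg = sim_read_occupancy(q);
		if (reg == 0)
			return true;
		/* progress was made, so start the stall count over */
		if (reset_on_change && reg != reg_prev)
			cnt = 0;
		if (cnt++ > limit)
			return false;	/* the patched driver bounces the link here */
		/* the driver waits about 1 us between polls at this point */
	}
}
```

With three packets leaving one every 300 polls, `wait_for_egress(&q, 100, false)` gives up before the queue empties, while `wait_for_egress(&q, 500, true)` waits it out. A queue that never drains still times out under the new rule, so a real stall is still detected.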
Reviewed-by: Dean Luick
Signed-off-by: Vennila Megavannan
Signed-off-by: Ira Weiny
---
 drivers/staging/rdma/hfi1/pio.c  | 14 +++++++++++---
 drivers/staging/rdma/hfi1/sdma.c | 15 ++++++++++++---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c
index 67dd93a6888c..e5c32db4bc67 100644
--- a/drivers/staging/rdma/hfi1/pio.c
+++ b/drivers/staging/rdma/hfi1/pio.c
@@ -922,10 +922,12 @@ void sc_disable(struct send_context *sc)
 static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 {
 	struct hfi1_devdata *dd = sc->dd;
-	u64 reg;
+	u64 reg = 0;
+	u64 reg_prev;
 	u32 loop = 0;
 
 	while (1) {
+		reg_prev = reg;
 		reg = read_csr(dd, sc->hw_context * 8 +
 			       SEND_EGRESS_CTXT_STATUS);
 		/* done if egress is stopped */
@@ -934,11 +936,17 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 		reg = packet_occupancy(reg);
 		if (reg == 0)
 			break;
-		if (loop > 100) {
+		/* counter is reset if occupancy count changes */
+		if (reg != reg_prev)
+			loop = 0;
+		if (loop > 500) {
+			/* timed out - bounce the link */
 			dd_dev_err(dd,
-				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u\n",
+				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				__func__, sc->sw_index,
 				sc->hw_context, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				&dd->pport->link_bounce_work);
 			break;
 		}
 		loop++;
diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
index 63ab72102183..d57531796723 100644
--- a/drivers/staging/rdma/hfi1/sdma.c
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -303,17 +303,26 @@ static void sdma_wait_for_packet_egress(struct sdma_engine *sde,
 	u64 off = 8 * sde->this_idx;
 	struct hfi1_devdata *dd = sde->dd;
 	int lcnt = 0;
+	u64 reg_prev;
+	u64 reg = 0;
 
 	while (1) {
-		u64 reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
+		reg_prev = reg;
+		reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
 
 		reg &= SDMA_EGRESS_PACKET_OCCUPANCY_SMASK;
 		reg >>= SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT;
 		if (reg == 0)
 			break;
-		if (lcnt++ > 100) {
-			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u\n",
+		/* counter is reset if occupancy count changes */
+		if (reg != reg_prev)
+			lcnt = 0;
+		if (lcnt++ > 500) {
+			/* timed out - bounce the link */
+			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				   __func__, sde->this_idx, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				   &dd->pport->link_bounce_work);
 			break;
 		}
 		udelay(1);
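The sketch below shows what the `lcnt++ > 500` test means in elapsed time. It is a user-space replica of the patched sdma loop for an engine whose occupancy never changes, offered as an illustration rather than the driver itself. `simulate_stalled_engine()` and `struct egress_run` are hypothetical names. `read_csr()` is replaced by a constant, and `udelay(1)` and `queue_work()` are replaced by counters. The occupancy mask/shift step is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one run of the simulated wait loop. */
struct egress_run {
	unsigned reads;		/* simulated CSR reads */
	unsigned delays;	/* simulated udelay(1) calls, about 1 us each */
	bool bounced;		/* whether link_bounce_work would be queued */
};

/*
 * Same control flow as the patched sdma_wait_for_packet_egress(), but the
 * occupancy is fixed at `stuck_occupancy`, so a non-zero value never drains.
 */
static struct egress_run simulate_stalled_engine(uint64_t stuck_occupancy)
{
	struct egress_run run = { 0 };
	int lcnt = 0;
	uint64_t reg_prev;
	uint64_t reg = 0;

	while (1) {
		reg_prev = reg;
		reg = stuck_occupancy;		/* stands in for read_csr() */
		run.reads++;
		if (reg == 0)
			break;
		/* counter is reset if occupancy count changes */
		if (reg != reg_prev)
			lcnt = 0;
		if (lcnt++ > 500) {
			run.bounced = true;	/* stands in for queue_work() */
			break;
		}
		run.delays++;			/* stands in for udelay(1) */
	}
	return run;
}
```

For any non-zero stuck occupancy, the loop performs 502 reads and 501 one-microsecond delays before it would queue `link_bounce_work`. That is consistent with the roughly 500 us timeout in the commit message, with CSR read latency adding to the total. The pio.c loop tests `loop > 500` before incrementing, so it gives up after the same number of polls.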