From patchwork Tue Oct 20 02:11:19 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 7441461 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 637BA9F443 for ; Tue, 20 Oct 2015 02:11:56 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 896EB2078B for ; Tue, 20 Oct 2015 02:11:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 752B720785 for ; Tue, 20 Oct 2015 02:11:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751257AbbJTCLx (ORCPT ); Mon, 19 Oct 2015 22:11:53 -0400 Received: from mga02.intel.com ([134.134.136.20]:37497 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750905AbbJTCLx (ORCPT ); Mon, 19 Oct 2015 22:11:53 -0400 Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP; 19 Oct 2015 19:11:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.17,705,1437462000"; d="scan'208";a="814850789" Received: from phlsvsds.ph.intel.com ([10.228.195.38]) by fmsmga001.fm.intel.com with ESMTP; 19 Oct 2015 19:11:52 -0700 Received: from phlsvsds.ph.intel.com (localhost.localdomain [127.0.0.1]) by phlsvsds.ph.intel.com (8.13.8/8.13.8) with ESMTP id t9K2Bp4w008674; Mon, 19 Oct 2015 22:11:51 -0400 Received: (from iweiny@localhost) by phlsvsds.ph.intel.com (8.13.8/8.13.8/Submit) id t9K2BoVO008671; Mon, 19 Oct 2015 22:11:50 -0400 X-Authentication-Warning: phlsvsds.ph.intel.com: iweiny set sender to ira.weiny@intel.com using -f From: ira.weiny@intel.com To: gregkh@linuxfoundation.org, devel@driverdev.osuosl.org Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, dennis.dalessandro@intel.com, mike.marciniszyn@intel.com, Vennila Megavannan , Ira Weiny Subject: [PATCH v2 04/22] staging/rdma/hfi1: Prevent host software lock up Date: Mon, 19 Oct 2015 22:11:19 -0400 Message-Id: <1445307097-8244-5-git-send-email-ira.weiny@intel.com> X-Mailer: git-send-email 1.7.1 In-Reply-To: <1445307097-8244-1-git-send-email-ira.weiny@intel.com> References: <1445307097-8244-1-git-send-email-ira.weiny@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Vennila Megavannan If packets stop egressing the hardware link, software can lock up. Implement a timeout for send context halt recovery. This patch increases the timeout for packet egress to 500 us and timer resets to zero if the packet occupancy changes. Also we bounce the link on time out. Reviewed-by: Dean Luick Signed-off-by: Vennila Megavannan Signed-off-by: Ira Weiny --- drivers/staging/rdma/hfi1/pio.c | 14 +++++++++++--- drivers/staging/rdma/hfi1/sdma.c | 15 ++++++++++++--- 2 files changed, 23 insertions(+), 6 deletions(-) diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c index 67dd93a6888c..e5c32db4bc67 100644 --- a/drivers/staging/rdma/hfi1/pio.c +++ b/drivers/staging/rdma/hfi1/pio.c @@ -922,10 +922,12 @@ void sc_disable(struct send_context *sc) static void sc_wait_for_packet_egress(struct send_context *sc, int pause) { struct hfi1_devdata *dd = sc->dd; - u64 reg; + u64 reg = 0; + u64 reg_prev; u32 loop = 0; while (1) { + reg_prev = reg; reg = read_csr(dd, sc->hw_context * 8 + SEND_EGRESS_CTXT_STATUS); /* done if egress is stopped */ @@ -934,11 +936,17 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause) reg = packet_occupancy(reg); if (reg == 0) break; - if (loop > 100) { + /* counter is reset if occupancy count changes */ + if (reg != reg_prev) + loop = 0; + if (loop > 500) { + /* timed out - bounce the link */ dd_dev_err(dd, - "%s: context %u(%u) timeout waiting for packets to egress, remaining count %u\n", + "%s: context %u(%u) timeout waiting for packets to egress, remaining count %u, bouncing link\n", __func__, sc->sw_index, sc->hw_context, (u32)reg); + queue_work(dd->pport->hfi1_wq, + &dd->pport->link_bounce_work); break; } loop++; diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c index 63ab72102183..d57531796723 100644 --- a/drivers/staging/rdma/hfi1/sdma.c +++ b/drivers/staging/rdma/hfi1/sdma.c @@ -303,17 +303,26 @@ static void sdma_wait_for_packet_egress(struct sdma_engine *sde, u64 off = 8 * sde->this_idx; struct hfi1_devdata *dd = sde->dd; int lcnt = 0; + u64 reg_prev; + u64 reg = 0; while (1) { - u64 reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS); + reg_prev = reg; + reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS); reg &= SDMA_EGRESS_PACKET_OCCUPANCY_SMASK; reg >>= SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT; if (reg == 0) break; - if (lcnt++ > 100) { - dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u\n", + /* counter is reest if accupancy count changes */ + if (reg != reg_prev) + lcnt = 0; + if (lcnt++ > 500) { + /* timed out - bounce the link */ + dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u, bouncing link\n", __func__, sde->this_idx, (u32)reg); + queue_work(dd->pport->hfi1_wq, + &dd->pport->link_bounce_work); break; } udelay(1);