From patchwork Fri Jun 9 11:01:52 2023
X-Patchwork-Submitter: Selvin Xavier
X-Patchwork-Id: 13273730
From: Selvin Xavier
To: jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org, andrew.gospodarek@broadcom.com, kashyap.desai@broadcom.com, Selvin Xavier
Subject: [PATCH v2 for-next 15/17] RDMA/bnxt_re: use firmware provided max request timeout
Date: Fri, 9 Jun 2023 04:01:52 -0700
Message-Id: <1686308514-11996-16-git-send-email-selvin.xavier@broadcom.com>
In-Reply-To: <1686308514-11996-1-git-send-email-selvin.xavier@broadcom.com>
References: <1686308514-11996-1-git-send-email-selvin.xavier@broadcom.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Kashyap Desai

Firmware provides the maximum request timeout as part of the hwrm_ver_get
response. The driver reads this timeout from firmware and, if the firmware
does not report one, falls back to the hardcoded timeout value.

Also, add a helper function to check the firmware status.
Signed-off-by: Kashyap Desai
Signed-off-by: Selvin Xavier
---
 drivers/infiniband/hw/bnxt_re/main.c       |  8 ++++
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 59 ++++++++++++++++++++++++------
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |  4 +-
 drivers/infiniband/hw/bnxt_re/qplib_res.h  |  1 +
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 8241154..a2c7d3f 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -1041,6 +1041,7 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
 	struct bnxt_en_dev *en_dev = rdev->en_dev;
 	struct hwrm_ver_get_output resp = {0};
 	struct hwrm_ver_get_input req = {0};
+	struct bnxt_qplib_chip_ctx *cctx;
 	struct bnxt_fw_msg fw_msg;
 	int rc = 0;

@@ -1058,11 +1059,18 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev)
 			  rc);
 		return;
 	}
+
+	cctx = rdev->chip_ctx;
 	rdev->qplib_ctx.hwrm_intf_ver =
 		(u64)le16_to_cpu(resp.hwrm_intf_major) << 48 |
 		(u64)le16_to_cpu(resp.hwrm_intf_minor) << 32 |
 		(u64)le16_to_cpu(resp.hwrm_intf_build) << 16 |
 		le16_to_cpu(resp.hwrm_intf_patch);
+
+	cctx->hwrm_cmd_max_timeout = le16_to_cpu(resp.max_req_timeout);
+
+	if (!cctx->hwrm_cmd_max_timeout)
+		cctx->hwrm_cmd_max_timeout = RCFW_FW_STALL_MAX_TIMEOUT;
 }

 static int bnxt_re_ib_init(struct bnxt_re_dev *rdev)
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 8b1b413..99aa1ae 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -90,6 +90,41 @@ static int bnxt_qplib_map_rc(u8 opcode)
 }

 /**
+ * bnxt_re_is_fw_stalled - Check firmware health
+ * @rcfw - rcfw channel instance of rdev
+ * @cookie - cookie to track the command
+ * @opcode - rcfw submitted for given opcode
+ * @cbit - bitmap entry of cookie
+ *
+ * If firmware has not responded to any rcfw command within
+ * rcfw->max_timeout, consider firmware as stalled.
+ *
+ * Returns:
+ * 0 if firmware is responding
+ * -ENODEV if firmware is not responding
+ */
+static int bnxt_re_is_fw_stalled(struct bnxt_qplib_rcfw *rcfw,
+				 u16 cookie, u8 opcode, u16 cbit)
+{
+	struct bnxt_qplib_cmdq_ctx *cmdq;
+
+	cmdq = &rcfw->cmdq;
+
+	if (time_after(jiffies, cmdq->last_seen +
+		       (rcfw->max_timeout * HZ))) {
+		dev_warn_ratelimited(&rcfw->pdev->dev,
+				     "%s: FW STALL Detected. cmdq[%#x]=%#x waited (%d > %d) msec active %d ",
+				     __func__, cookie, opcode,
+				     jiffies_to_msecs(jiffies - cmdq->last_seen),
+				     rcfw->max_timeout * 1000,
+				     test_bit(cbit, cmdq->cmdq_bitmap));
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+/**
  * __wait_for_resp - Don't hold the cpu context and wait for response
  * @rcfw - rcfw channel instance of rdev
  * @cookie - cookie to track the command
@@ -105,6 +140,7 @@ static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie, u8 opcode)
 {
 	struct bnxt_qplib_cmdq_ctx *cmdq;
 	u16 cbit;
+	int ret;

 	cmdq = &rcfw->cmdq;
 	cbit = cookie % rcfw->cmdq_depth;
@@ -118,8 +154,8 @@ static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie, u8 opcode)
 		wait_event_timeout(cmdq->waitq,
 				   !test_bit(cbit, cmdq->cmdq_bitmap) ||
 				   test_bit(ERR_DEVICE_DETACHED, &cmdq->flags),
-				   msecs_to_jiffies(RCFW_FW_STALL_TIMEOUT_SEC
-						    * 1000));
+				   msecs_to_jiffies(rcfw->max_timeout * 1000));
+
 		if (!test_bit(cbit, cmdq->cmdq_bitmap))
 			return 0;

@@ -128,10 +164,9 @@ static int __wait_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie, u8 opcode)
 		if (!test_bit(cbit, cmdq->cmdq_bitmap))
 			return 0;

-		/* Firmware stall is detected */
-		if (time_after(jiffies, cmdq->last_seen +
-			       (RCFW_FW_STALL_TIMEOUT_SEC * HZ)))
-			return -ENODEV;
+		ret = bnxt_re_is_fw_stalled(rcfw, cookie, opcode, cbit);
+		if (ret)
+			return ret;

 	} while (true);
 };
@@ -352,6 +387,7 @@ static int __poll_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie,
 	struct bnxt_qplib_cmdq_ctx *cmdq = &rcfw->cmdq;
 	unsigned long issue_time;
 	u16 cbit;
+	int ret;

 	cbit = cookie % rcfw->cmdq_depth;
 	issue_time = jiffies;
@@ -368,11 +404,10 @@ static int __poll_for_resp(struct bnxt_qplib_rcfw *rcfw, u16 cookie,
 		if (!test_bit(cbit, cmdq->cmdq_bitmap))
 			return 0;
 		if (jiffies_to_msecs(jiffies - issue_time) >
-		    (RCFW_FW_STALL_TIMEOUT_SEC * 1000)) {
-			/* Firmware stall is detected */
-			if (time_after(jiffies, cmdq->last_seen +
-				       (RCFW_FW_STALL_TIMEOUT_SEC * HZ)))
-				return -ENODEV;
+		    (rcfw->max_timeout * 1000)) {
+			ret = bnxt_re_is_fw_stalled(rcfw, cookie, opcode, cbit);
+			if (ret)
+				return ret;
 		}
 	} while (true);
 };
@@ -951,6 +986,8 @@ int bnxt_qplib_alloc_rcfw_channel(struct bnxt_qplib_res *res,
 	if (!rcfw->qp_tbl)
 		goto fail;

+	rcfw->max_timeout = res->cctx->hwrm_cmd_max_timeout;
+
 	return 0;

 fail:
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
index 338bf6a..b644dcc 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.h
@@ -51,7 +51,7 @@
 #define RCFW_DBR_PCI_BAR_REGION 2
 #define RCFW_DBR_BASE_PAGE_SHIFT 12

-#define RCFW_FW_STALL_TIMEOUT_SEC 40
+#define RCFW_FW_STALL_MAX_TIMEOUT 40

 /* Cmdq contains a fix number of a 16-Byte slots */
 struct bnxt_qplib_cmdqe {
@@ -227,6 +227,8 @@ struct bnxt_qplib_rcfw {
 	atomic_t rcfw_intr_enabled;
 	struct semaphore rcfw_inflight;
 	atomic_t timeout_send;
+	/* cached from chip cctx for quick reference in slow path */
+	u16 max_timeout;
 };

 struct bnxt_qplib_cmdqmsg {
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.h b/drivers/infiniband/hw/bnxt_re/qplib_res.h
index 982e2c9..77f0b84 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.h
@@ -55,6 +55,7 @@ struct bnxt_qplib_chip_ctx {
 	u8 chip_rev;
 	u8 chip_metal;
 	u16 hw_stats_size;
+	u16 hwrm_cmd_max_timeout;
 	struct bnxt_qplib_drv_modes modes;
 };
