From patchwork Sun Feb 3 12:43:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wei Hu (Xavier)" X-Patchwork-Id: 10794647 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 423CB1390 for ; Sun, 3 Feb 2019 12:44:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2DDC32877B for ; Sun, 3 Feb 2019 12:44:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2159128C88; Sun, 3 Feb 2019 12:44:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 838FE2877B for ; Sun, 3 Feb 2019 12:44:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727891AbfBCMoA (ORCPT ); Sun, 3 Feb 2019 07:44:00 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:3172 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727563AbfBCMoA (ORCPT ); Sun, 3 Feb 2019 07:44:00 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 44D27D182CBE298A5303; Sun, 3 Feb 2019 20:43:58 +0800 (CST) Received: from localhost.localdomain (10.67.212.132) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Sun, 3 Feb 2019 20:43:47 +0800 From: "Wei Hu (Xavier)" To: , CC: , , , , , , , , , Subject: [PATCH V4 rdma-next 3/3] RDMA/hns: Fix the chip hanging caused by sending doorbell during reset Date: Sun, 3 Feb 2019 20:43:15 +0800 Message-ID: <1549197795-23118-4-git-send-email-xavier.huwei@huawei.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1549197795-23118-1-git-send-email-xavier.huwei@huawei.com> References: <1549197795-23118-1-git-send-email-xavier.huwei@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.67.212.132] X-CFilter-Loop: Reflected Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On hi08 chip, There is a possibility of chip hanging when sending doorbell during reset. We can fix it by prohibiting doorbell during reset. Fixes: 2d40788825ac ("RDMA/hns: Add support for processing send wr and receive wr") Signed-off-by: Wei Hu (Xavier) --- v3->v4: Non change. v2->v3: Non change. v1->v2: Non change. --- drivers/infiniband/hw/hns/hns_roce_device.h | 1 + drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 25 ++++++++++++++++--------- drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 11 +++++++++++ 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index 65eb4bc..8ca8d74 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -947,6 +947,7 @@ struct hns_roce_dev { spinlock_t bt_cmd_lock; bool active; bool is_reset; + bool dis_db; unsigned long reset_cnt; struct hns_roce_ib_iboe iboe; diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 109127e..657dd38 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -587,7 +587,7 @@ static int hns_roce_v2_post_send(struct ib_qp *ibqp, roce_set_field(sq_db.parameter, V2_DB_PARAMETER_SL_M, V2_DB_PARAMETER_SL_S, qp->sl); - hns_roce_write64_k((__le32 *)&sq_db, qp->sq.db_reg_l); + hns_roce_write64(hr_dev, (__le32 *)&sq_db, qp->sq.db_reg_l); qp->sq_next_wqe = ind; qp->next_sge = sge_ind; @@ -717,7 +717,7 @@ static int hns_roce_v2_cmd_hw_reseted(struct hns_roce_dev *hr_dev, unsigned long reset_stage) { /* When hardware reset has been completed once or more, we should stop - * sending mailbox&cmq to hardware. If now in .init_instance() + * sending mailbox&cmq&doorbell to hardware. If now in .init_instance() * function, we should exit with error. If now at HNAE3_INIT_CLIENT * stage of soft reset process, we should exit with error, and then * HNAE3_INIT_CLIENT related process can rollback the operation like @@ -726,6 +726,7 @@ static int hns_roce_v2_cmd_hw_reseted(struct hns_roce_dev *hr_dev, * reset process once again. */ hr_dev->is_reset = true; + hr_dev->dis_db = true; if (reset_stage == HNS_ROCE_STATE_RST_INIT || instance_stage == HNS_ROCE_STATE_INIT) @@ -742,8 +743,8 @@ static int hns_roce_v2_cmd_hw_resetting(struct hns_roce_dev *hr_dev, struct hnae3_handle *handle = priv->handle; const struct hnae3_ae_ops *ops = handle->ae_algo->ops; - /* When hardware reset is detected, we should stop sending mailbox&cmq - * to hardware. If now in .init_instance() function, we should + /* When hardware reset is detected, we should stop sending mailbox&cmq& + * doorbell to hardware. If now in .init_instance() function, we should * exit with error. If now at HNAE3_INIT_CLIENT stage of soft reset * process, we should exit with error, and then HNAE3_INIT_CLIENT * related process can rollback the operation like notifing hardware to @@ -751,6 +752,7 @@ static int hns_roce_v2_cmd_hw_resetting(struct hns_roce_dev *hr_dev, * error to notify NIC driver to reschedule soft reset process once * again. */ + hr_dev->dis_db = true; if (!ops->get_hw_reset_stat(handle)) hr_dev->is_reset = true; @@ -768,9 +770,10 @@ static int hns_roce_v2_cmd_sw_resetting(struct hns_roce_dev *hr_dev) const struct hnae3_ae_ops *ops = handle->ae_algo->ops; /* When software reset is detected at .init_instance() function, we - * should stop sending mailbox&cmq to hardware, and exit with - * error. + * should stop sending mailbox&cmq&doorbell to hardware, and exit + * with error. */ + hr_dev->dis_db = true; if (ops->ae_dev_reset_cnt(handle) != hr_dev->reset_cnt) hr_dev->is_reset = true; @@ -2495,6 +2498,7 @@ static void hns_roce_v2_write_cqc(struct hns_roce_dev *hr_dev, static int hns_roce_v2_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags) { + struct hns_roce_dev *hr_dev = to_hr_dev(ibcq->device); struct hns_roce_cq *hr_cq = to_hr_cq(ibcq); u32 notification_flag; u32 doorbell[2]; @@ -2520,7 +2524,7 @@ static int hns_roce_v2_req_notify_cq(struct ib_cq *ibcq, roce_set_bit(doorbell[1], V2_CQ_DB_PARAMETER_NOTIFY_S, notification_flag); - hns_roce_write64_k(doorbell, hr_cq->cq_db_l); + hns_roce_write64(hr_dev, doorbell, hr_cq->cq_db_l); return 0; } @@ -4763,6 +4767,7 @@ static void hns_roce_v2_init_irq_work(struct hns_roce_dev *hr_dev, static void set_eq_cons_index_v2(struct hns_roce_eq *eq) { + struct hns_roce_dev *hr_dev = eq->hr_dev; u32 doorbell[2]; doorbell[0] = 0; @@ -4789,7 +4794,7 @@ static void set_eq_cons_index_v2(struct hns_roce_eq *eq) HNS_ROCE_V2_EQ_DB_PARA_S, (eq->cons_index & HNS_ROCE_V2_CONS_IDX_M)); - hns_roce_write64_k(doorbell, eq->doorbell); + hns_roce_write64(hr_dev, doorbell, eq->doorbell); } static struct hns_roce_aeqe *get_aeqe_v2(struct hns_roce_eq *eq, u32 entry) @@ -6011,6 +6016,7 @@ static int hns_roce_v2_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr) { + struct hns_roce_dev *hr_dev = to_hr_dev(ibsrq->device); struct hns_roce_srq *srq = to_hr_srq(ibsrq); struct hns_roce_v2_wqe_data_seg *dseg; struct hns_roce_v2_db srq_db; @@ -6072,7 +6078,7 @@ static int hns_roce_v2_post_srq_recv(struct ib_srq *ibsrq, srq_db.byte_4 = HNS_ROCE_V2_SRQ_DB << 24 | srq->srqn; srq_db.parameter = srq->head; - hns_roce_write64_k((__le32 *)&srq_db, srq->db_reg_l); + hns_roce_write64(hr_dev, (__le32 *)&srq_db, srq->db_reg_l); } @@ -6309,6 +6315,7 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle) return 0; hr_dev->active = false; + hr_dev->dis_db = true; event.event = IB_EVENT_DEVICE_FATAL; event.device = &hr_dev->ib_dev; diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h index f22094e..6b0486f 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h @@ -1799,4 +1799,15 @@ struct hns_roce_sccc_clr_done { __le32 rsv[5]; }; +static inline void hns_roce_write64(struct hns_roce_dev *hr_dev, __le32 val[2], + void __iomem *dest) +{ + struct hns_roce_v2_priv *priv = (struct hns_roce_v2_priv *)hr_dev->priv; + struct hnae3_handle *handle = priv->handle; + const struct hnae3_ae_ops *ops = handle->ae_algo->ops; + + if (!hr_dev->dis_db && !ops->get_hw_reset_stat(handle)) + hns_roce_write64_k(val, dest); +} + #endif