From patchwork Thu Jan 24 14:36:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dennis Dalessandro X-Patchwork-Id: 10779119 X-Patchwork-Delegate: dledford@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 845691515 for ; Thu, 24 Jan 2019 14:36:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71CD42FD43 for ; Thu, 24 Jan 2019 14:36:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6597930883; Thu, 24 Jan 2019 14:36:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D37392FD43 for ; Thu, 24 Jan 2019 14:36:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728343AbfAXOg5 (ORCPT ); Thu, 24 Jan 2019 09:36:57 -0500 Received: from mga02.intel.com ([134.134.136.20]:43630 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727649AbfAXOg5 (ORCPT ); Thu, 24 Jan 2019 09:36:57 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Jan 2019 06:36:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,516,1539673200"; d="scan'208";a="112332290" Received: from scymds02.sc.intel.com ([10.82.195.37]) by orsmga008.jf.intel.com with ESMTP; 24 Jan 2019 06:36:49 -0800 Received: from scvm10.sc.intel.com (scvm10.sc.intel.com [10.82.195.27]) by scymds02.sc.intel.com with ESMTP id x0OEam7O032079; Thu, 24 Jan 2019 06:36:48 -0800 Received: from scvm10.sc.intel.com (localhost [127.0.0.1]) by scvm10.sc.intel.com with ESMTP id x0OEamWA030174; Thu, 24 Jan 2019 06:36:48 -0800 Subject: [PATCH for-next 15/17] IB/hfi1: Add interlock between a TID RDMA request and other requests From: Dennis Dalessandro To: jgg@ziepe.ca, dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Mike Marciniszyn , Mitko Haralanov , Kaike Wan Date: Thu, 24 Jan 2019 06:36:48 -0800 Message-ID: <20190124143644.29949.15294.stgit@scvm10.sc.intel.com> In-Reply-To: <20190124032400.3371.74628.stgit@scvm10.sc.intel.com> References: <20190124032400.3371.74628.stgit@scvm10.sc.intel.com> User-Agent: StGit/0.17.1-18-g2e886-dirty MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Kaike Wan This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: Mike Marciniszyn Signed-off-by: Mitko Haralanov Signed-off-by: Kaike Wan Signed-off-by: Dennis Dalessandro --- drivers/infiniband/hw/hfi1/rc.c | 16 ++++++++++++++ drivers/infiniband/hw/hfi1/tid_rdma.c | 37 +++++++++++++++++++++++++++++++++ drivers/infiniband/hw/hfi1/tid_rdma.h | 11 ++++++++++ drivers/infiniband/hw/hfi1/verbs.h | 3 +++ 4 files changed, 67 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c index a5aacf8..349751c 100644 --- a/drivers/infiniband/hw/hfi1/rc.c +++ b/drivers/infiniband/hw/hfi1/rc.c @@ -482,6 +482,15 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) len = wqe->length; ss = &qp->s_sge; bth2 = mask_psn(qp->s_psn); + + /* + * Interlock between various IB requests and TID RDMA + * if necessary. + */ + if ((priv->s_flags & HFI1_S_TID_WAIT_INTERLCK) || + hfi1_tid_rdma_wqe_interlock(qp, wqe)) + goto bail; + switch (wqe->wr.opcode) { case IB_WR_SEND: case IB_WR_SEND_WITH_IMM: @@ -1321,6 +1330,7 @@ static void reset_psn(struct rvt_qp *qp, u32 psn) qp->s_state = OP(SEND_LAST); } done: + priv->s_flags &= ~HFI1_S_TID_WAIT_INTERLCK; qp->s_psn = psn; /* * Set RVT_S_WAIT_PSN as rc_complete() may start the timer @@ -1540,6 +1550,8 @@ struct rvt_swqe *do_rc_completion(struct rvt_qp *qp, struct rvt_swqe *wqe, struct hfi1_ibport *ibp) { + struct hfi1_qp_priv *priv = qp->priv; + lockdep_assert_held(&qp->s_lock); /* * Don't decrement refcount and don't generate a @@ -1608,6 +1620,10 @@ struct rvt_swqe *do_rc_completion(struct rvt_qp *qp, qp->s_draining = 0; wqe = rvt_get_swqe_ptr(qp, qp->s_acked); } + if (priv->s_flags & HFI1_S_TID_WAIT_INTERLCK) { + priv->s_flags &= ~HFI1_S_TID_WAIT_INTERLCK; + hfi1_schedule_send(qp); + } return wqe; } diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.c b/drivers/infiniband/hw/hfi1/tid_rdma.c index f4cda45..135667b 100644 --- a/drivers/infiniband/hw/hfi1/tid_rdma.c +++ b/drivers/infiniband/hw/hfi1/tid_rdma.c @@ -2825,3 +2825,40 @@ void hfi1_qp_kern_exp_rcv_clear_all(struct rvt_qp *qp) } while (!ret); } } + +bool hfi1_tid_rdma_wqe_interlock(struct rvt_qp *qp, struct rvt_swqe *wqe) +{ + struct rvt_swqe *prev; + struct hfi1_qp_priv *priv = qp->priv; + u32 s_prev; + + s_prev = (qp->s_cur == 0 ? qp->s_size : qp->s_cur) - 1; + prev = rvt_get_swqe_ptr(qp, s_prev); + + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + case IB_WR_SEND_WITH_INV: + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_READ: + break; + case IB_WR_TID_RDMA_READ: + switch (prev->wr.opcode) { + case IB_WR_RDMA_READ: + if (qp->s_acked != qp->s_cur) + goto interlock; + break; + default: + break; + } + default: + break; + } + return false; + +interlock: + priv->s_flags |= HFI1_S_TID_WAIT_INTERLCK; + return true; +} diff --git a/drivers/infiniband/hw/hfi1/tid_rdma.h b/drivers/infiniband/hw/hfi1/tid_rdma.h index 4f85b7e..689a549 100644 --- a/drivers/infiniband/hw/hfi1/tid_rdma.h +++ b/drivers/infiniband/hw/hfi1/tid_rdma.h @@ -17,6 +17,16 @@ #define TID_RDMA_MAX_SEGMENT_SIZE BIT(18) /* 256 KiB (for now) */ #define TID_RDMA_MAX_PAGES (BIT(18) >> PAGE_SHIFT) +/* + * Bit definitions for priv->s_flags. + * These bit flags overload the bit flags defined for the QP's s_flags. + * Due to the fact that these bit fields are used only for the QP priv + * s_flags, there are no collisions. + * + * HFI1_S_TID_WAIT_INTERLCK - QP is waiting for requester interlock + */ +#define HFI1_S_TID_WAIT_INTERLCK BIT(5) + struct tid_rdma_params { struct rcu_head rcu_head; u32 qp; @@ -210,5 +220,6 @@ bool hfi1_handle_kdeth_eflags(struct hfi1_ctxtdata *rcd, void hfi1_tid_rdma_restart_req(struct rvt_qp *qp, struct rvt_swqe *wqe, u32 *bth2); void hfi1_qp_kern_exp_rcv_clear_all(struct rvt_qp *qp); +bool hfi1_tid_rdma_wqe_interlock(struct rvt_qp *qp, struct rvt_swqe *wqe); #endif /* HFI1_TID_RDMA_H */ diff --git a/drivers/infiniband/hw/hfi1/verbs.h b/drivers/infiniband/hw/hfi1/verbs.h index 7642b59..841727a 100644 --- a/drivers/infiniband/hw/hfi1/verbs.h +++ b/drivers/infiniband/hw/hfi1/verbs.h @@ -171,6 +171,9 @@ struct hfi1_qp_priv { u8 hdr_type; /* 9B or 16B */ unsigned long tid_timer_timeout_jiffies; + /* variables for the TID RDMA SE state machine */ + u32 s_flags; + /* For TID RDMA READ */ u32 tid_r_reqs; /* Num of tid reads requested */ u32 tid_r_comp; /* Num of tid reads completed */