From patchwork Sun Oct 14 18:55:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640783 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8CB1913AD for ; Sun, 14 Oct 2018 18:55:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 43A3A28B9F for ; Sun, 14 Oct 2018 18:55:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 37841291F2; Sun, 14 Oct 2018 18:55:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B7A4B28B9F for ; Sun, 14 Oct 2018 18:55:50 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE57121F8E1; Sun, 14 Oct 2018 11:55:42 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9398321F1B2 for ; Sun, 14 Oct 2018 11:55:36 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id BC7DA2221; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B4D102AB; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:23 -0400 Message-Id: <1539543332-28679-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/10] lustre: lnd: set device capabilities X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata MLX-4, MLX-5 and OPA support different capabilities. Query the device and cache the capabilities of the device for future use. 
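The resulting pattern, condensed from the kiblnd_hdev_get_attr() and kiblnd_create_fmr_pool() changes in the diff below, is to probe the verbs table and device attributes once when the HCA is attached, cache the result in the ibd_dev_caps bitmask, and have later code branch on the cached flags. The helper name kiblnd_cache_dev_caps below is illustrative only, not part of the patch, and error handling is omitted:

static void kiblnd_cache_dev_caps(struct kib_hca_dev *hdev)
{
	struct ib_device_attr *attr = &hdev->ibh_ibdev->attrs;

	/* FMR needs the full set of FMR verbs from the provider */
	if (hdev->ibh_ibdev->alloc_fmr && hdev->ibh_ibdev->dealloc_fmr &&
	    hdev->ibh_ibdev->map_phys_fmr && hdev->ibh_ibdev->unmap_fmr)
		hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FMR_ENABLED;
	else if (attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)
		hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_ENABLED;
}

Pool creation then tests the cached mask instead of re-querying the device:

	if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)
		rc = kiblnd_alloc_fmr_pool(fps, fpo);
	else
		rc = kiblnd_alloc_freg_pool(fps, fpo);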
At the time of this patch's creation, MLX5 could support fast registration and gaps while MLX4 and OPA only supported FMR. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-10129 Reviewed-on: https://review.whamcloud.com/30309 Reviewed-by: Alexey Lyashkov Reviewed-by: Dmitry Eremin Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 41 ++++++++++++---------- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 5 +++ 2 files changed, 28 insertions(+), 18 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index bf969b3..b10658b 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -1399,6 +1399,7 @@ static int kiblnd_alloc_fmr_pool(struct kib_fmr_poolset *fps, struct kib_fmr_poo else CERROR("FMRs are not supported\n"); } + fpo->fpo_is_fmr = true; return rc; } @@ -1408,6 +1409,8 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, struct kib_fmr_po struct kib_fast_reg_descriptor *frd; int i, rc; + fpo->fpo_is_fmr = false; + INIT_LIST_HEAD(&fpo->fast_reg.fpo_pool_list); fpo->fast_reg.fpo_pool_size = 0; for (i = 0; i < fps->fps_pool_size; i++) { @@ -1469,23 +1472,7 @@ static int kiblnd_create_fmr_pool(struct kib_fmr_poolset *fps, fpo->fpo_hdev = kiblnd_current_hdev(dev); dev_attr = &fpo->fpo_hdev->ibh_ibdev->attrs; - /* Check for FMR or FastReg support */ - fpo->fpo_is_fmr = 0; - if (fpo->fpo_hdev->ibh_ibdev->alloc_fmr && - fpo->fpo_hdev->ibh_ibdev->dealloc_fmr && - fpo->fpo_hdev->ibh_ibdev->map_phys_fmr && - fpo->fpo_hdev->ibh_ibdev->unmap_fmr) { - LCONSOLE_INFO("Using FMR for registration\n"); - fpo->fpo_is_fmr = 1; - } else if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) { - LCONSOLE_INFO("Using FastReg for registration\n"); - } else { - rc = -ENOSYS; - LCONSOLE_ERROR_MSG(rc, "IB device does not support FMRs nor FastRegs, can't register memory\n"); - goto out_fpo; - } - - if (fpo->fpo_is_fmr) + if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) rc = kiblnd_alloc_fmr_pool(fps, fpo); else rc = kiblnd_alloc_freg_pool(fps, fpo); @@ -2261,6 +2248,9 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) { + struct ib_device_attr *dev_attr = &hdev->ibh_ibdev->attrs; + int rc = 0; + /* * It's safe to assume a HCA can handle a page size * matching that of the native system */ @@ -2269,7 +2259,22 @@ static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) hdev->ibh_page_size = 1 << PAGE_SHIFT; hdev->ibh_page_mask = ~((__u64)hdev->ibh_page_size - 1); - hdev->ibh_mr_size = hdev->ibh_ibdev->attrs.max_mr_size; + if (hdev->ibh_ibdev->alloc_fmr && + hdev->ibh_ibdev->dealloc_fmr && + hdev->ibh_ibdev->map_phys_fmr && + hdev->ibh_ibdev->unmap_fmr) { + LCONSOLE_INFO("Using FMR for registration\n"); + hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FMR_ENABLED; + } else if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) { + LCONSOLE_INFO("Using FastReg for registration\n"); + hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_ENABLED; + } else { + CERROR("IB device does not support FMRs nor FastRegs, can't register memory: %d\n", + rc); + return -ENXIO; + } + + hdev->ibh_mr_size = dev_attr->max_mr_size; if (hdev->ibh_mr_size == ~0ULL) { hdev->ibh_mr_shift = 64; return 0; diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index a4438d2..9f0a47d 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -73,6 +73,10 @@ #define IBLND_N_SCHED 2 #define IBLND_N_SCHED_HIGH 4 +#define IBLND_DEV_CAPS_FASTREG_ENABLED 0x1 +#define IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT 0x2 +#define IBLND_DEV_CAPS_FMR_ENABLED 0x4 + struct kib_tunables { int *kib_dev_failover; /* HCA failover */ unsigned int *kib_service; /* IB service number */ @@ -162,6 +166,7 @@ struct kib_dev { unsigned int ibd_can_failover; /* IPoIB interface is a bonding master */ struct list_head ibd_nets; struct kib_hca_dev *ibd_hdev; + u32 ibd_dev_caps; }; struct kib_hca_dev { From patchwork Sun Oct 14 18:55:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640787 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7A94D1508 for ; Sun, 14 Oct 2018 18:55:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2EA8F28B9F for ; Sun, 14 Oct 2018 18:55:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 22B23291F2; Sun, 14 Oct 2018 18:55:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 94AF628B9F for ; Sun, 14 Oct 2018 18:55:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 36FD821FA38; Sun, 14 Oct 2018 11:55:46 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E7D0B21F20E for ; Sun, 14 Oct 2018 11:55:36 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C18942222; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B7CCB2AC; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:24 -0400 Message-Id: <1539543332-28679-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/10] lustre: o2iblnd: use IB_MR_TYPE_SG_GAPS X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata When allocating fastreg buffers allow the use of IB_MR_TYPE_SG_GAPS instead of IB_MR_TYPE_MEM_REG, since the fragments we provide the fast registration API can have gaps. MEM_REG doesn't handle that case. There is a performance drop when using IB_MR_TYPE_SG_GAPS and it is recommended not to use it. To mitigate this, we added a module parameter, use_fastreg_gaps, which defaults to 0. When allocating the memory region if this parameter is set to 1 and the hw has gaps support then use it and output a warning that performance may drop. Otherwise always use IB_MR_TYPE_MEM_REG. We still want to give user the choice to use this option. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-10089 Reviewed-on: https://review.whamcloud.com/29551 WC-bug-id: https://jira.whamcloud.com/browse/LU-10394 Reviewed-on: https://review.whamcloud.com/30749 Reviewed-by: Dmitry Eremin Reviewed-by: James Simmons Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 24 +++++++++++++++++++--- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 3 +++ .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 5 +++++ 3 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index b10658b..ca3e9ce 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -1404,7 +1404,8 @@ static int kiblnd_alloc_fmr_pool(struct kib_fmr_poolset *fps, struct kib_fmr_poo return rc; } -static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, struct kib_fmr_pool *fpo) +static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, + struct kib_fmr_pool *fpo, u32 dev_caps) { struct kib_fast_reg_descriptor *frd; int i, rc; @@ -1414,6 +1415,8 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, struct kib_fmr_po INIT_LIST_HEAD(&fpo->fast_reg.fpo_pool_list); fpo->fast_reg.fpo_pool_size = 0; for (i = 0; i < fps->fps_pool_size; i++) { + bool fastreg_gaps = false; + frd = kzalloc_cpt(sizeof(*frd), GFP_NOFS, fps->fps_cpt); if (!frd) { CERROR("Failed to allocate a new fast_reg descriptor\n"); @@ -1421,8 +1424,21 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, struct kib_fmr_po goto out; } + /* + * it is expected to get here if this is an MLX-5 card. + * MLX-4 cards will always use FMR and MLX-5 cards will + * always use fast_reg. It turns out that some MLX-5 cards + * (possibly due to older FW versions) do not natively support + * gaps. So we will need to track them here. + */ + if ((*kiblnd_tunables.kib_use_fastreg_gaps == 1) && + (dev_caps & IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT)) { + CWARN("using IB_MR_TYPE_SG_GAPS, expect a performance drop\n"); + fastreg_gaps = true; + } frd->frd_mr = ib_alloc_mr(fpo->fpo_hdev->ibh_pd, - IB_MR_TYPE_MEM_REG, + fastreg_gaps ? 
IB_MR_TYPE_SG_GAPS : + IB_MR_TYPE_MEM_REG, LNET_MAX_PAYLOAD / PAGE_SIZE); if (IS_ERR(frd->frd_mr)) { rc = PTR_ERR(frd->frd_mr); @@ -1475,7 +1491,7 @@ static int kiblnd_create_fmr_pool(struct kib_fmr_poolset *fps, if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) rc = kiblnd_alloc_fmr_pool(fps, fpo); else - rc = kiblnd_alloc_freg_pool(fps, fpo); + rc = kiblnd_alloc_freg_pool(fps, fpo, dev->ibd_dev_caps); if (rc) goto out_fpo; @@ -2268,6 +2284,8 @@ static int kiblnd_hdev_get_attr(struct kib_hca_dev *hdev) } else if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) { LCONSOLE_INFO("Using FastReg for registration\n"); hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_ENABLED; + if (dev_attr->device_cap_flags & IB_DEVICE_SG_GAPS_REG) + hdev->ibh_dev->ibd_dev_caps |= IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT; } else { CERROR("IB device does not support FMRs nor FastRegs, can't register memory: %d\n", rc); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index 9f0a47d..aaf0118 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -94,6 +94,9 @@ struct kib_tunables { int *kib_use_priv_port; /* use privileged port for active connect */ int *kib_nscheds; /* # threads on each CPT */ int *kib_wrq_sge; /* # sg elements per wrq */ + bool *kib_use_fastreg_gaps; /* enable discontiguous fastreg + * fragment support + */ }; extern struct kib_tunables kiblnd_tunables; diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 13b19f3..985ccdf 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -111,6 +111,10 @@ module_param(concurrent_sends, int, 0444); MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing"); +static bool use_fastreg_gaps; +module_param(use_fastreg_gaps, bool, 0444); +MODULE_PARM_DESC(use_fastreg_gaps, "Enable discontiguous fastreg fragment support. 
Expect performance drop"); + #define IBLND_DEFAULT_MAP_ON_DEMAND IBLND_MAX_RDMA_FRAGS static int map_on_demand = IBLND_DEFAULT_MAP_ON_DEMAND; module_param(map_on_demand, int, 0444); @@ -165,6 +169,7 @@ struct kib_tunables kiblnd_tunables = { .kib_use_priv_port = &use_privileged_port, .kib_nscheds = &nscheds, .kib_wrq_sge = &wrq_sge, + .kib_use_fastreg_gaps = &use_fastreg_gaps, }; static struct lnet_ioctl_config_o2iblnd_tunables default_tunables; From patchwork Sun Oct 14 18:55:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640781 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 10B8213AD for ; Sun, 14 Oct 2018 18:55:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA84B28B9F for ; Sun, 14 Oct 2018 18:55:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AEA1F291F2; Sun, 14 Oct 2018 18:55:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5E44928B9F for ; Sun, 14 Oct 2018 18:55:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6A89321F25B; Sun, 14 Oct 2018 11:55:40 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 44FCE21F20E for ; Sun, 14 Oct 2018 11:55:37 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C87202223; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BFC6B2AD; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:25 -0400 Message-Id: <1539543332-28679-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/10] lustre: lnd: rework map_on_demand behavior X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alex Zhuravlev , Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata map_on_demand use is ambiguous. In kernels which supported global memory regions, map-on-demand was used to limit the number of RDMA fragments transferred between peers. 
The map_on_demand tunable didn't impact the memory allocation, since the maximum allowed was always allocated. It was, however, used as a variable in determining max_send_wr. With the introduction of FMR, if the number of fragments exceeds the negotiated number of fragments, which is determined by the map-on-demand value, then FMR is used because we can transfer all the fragments in one FMR fragment. The latest kernels have removed support for global memory regions, so the use of map-on-demand has become ineffectual. However, we need to keep it for backwards compatibility. The behavior has changed as follows:

map_on_demand is a flag used to determine if we can use FMR or FastReg. This is applicable for kernels which support global memory regions. For later kernels this flag is always enabled, since we will always use either FMR or FastReg.

For kernels which support global memory regions, map_on_demand defaults to 0, which means we will be using global memory regions exclusively. If it is set to a value other than 0, then we will behave as follows:
1. Always default the number of fragments to IBLND_MAX_RDMA_FRAGS
2. Create FMR/FastReg pools
3. Negotiate the supported number of fragments per connection
4. Attempt to transmit using global memory regions only if map-on-demand is off, otherwise use FMR or FastReg
5. In case of transmitting a tx with GAPS over FMR we will need to transmit it with multiple fragments; look at the comments in kiblnd_fmr_map_tx() for an explanation of the behavior

For later kernels we default map_on_demand to 1 and do not allow it to be set to 0, since there is no longer support for global memory regions. Behavior:
1. Default the number of fragments to IBLND_MAX_RDMA_FRAGS
2. Create FMR/FastReg pools
3. Negotiate the supported number of fragments per connection
4. Look at the comments in kiblnd_fmr_map_tx() for an explanation of the behavior when transmitting with GAPS versus contiguous
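In outline, the behavior those comments describe looks like the following (condensed from the kiblnd_fmr_map_tx() hunk in the diff below, using the IBLND_DEV_CAPS_* flags introduced earlier in this series; the surrounding function is abbreviated):

	rd->rd_key = tx->tx_fmr.fmr_key;
	if (((dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) && !tx->tx_gaps) ||
	    (dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)) {
		/* FastReg, or FMR over a contiguous payload: collapse the
		 * whole transfer into one registered fragment (FMR wants a
		 * zero based address)
		 */
		if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)
			rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask;
		rd->rd_frags[0].rf_nob = nob;
		rd->rd_nfrags = 1;
	} else {
		/* FMR with gaps: keep one fragment per page, each with its
		 * own zero based address, and RDMA them individually
		 */
		for (i = 0; i < rd->rd_nfrags; i++) {
			rd->rd_frags[i].rf_addr &= ~hdev->ibh_page_mask;
			rd->rd_frags[i].rf_addr += i << hdev->ibh_page_shift;
		}
	}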
Signed-off-by: Amir Shehata Signed-off-by: Alex Zhuravlev WC-bug-id: https://jira.whamcloud.com/browse/LU-10129 Reviewed-on: https://review.whamcloud.com/29995 WC-bug-id: https://jira.whamcloud.com/browse/LU-8573 Reviewed-on: http://review.whamcloud.com/23439 Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 31 ++--- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 37 ++---- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 125 +++++++++++++++------ .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 55 ++++++--- 4 files changed, 154 insertions(+), 94 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index ca3e9ce..8f984a0 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -130,7 +130,6 @@ static int kiblnd_msgtype2size(int type) static int kiblnd_unpack_rd(struct kib_msg *msg, int flip) { struct kib_rdma_desc *rd; - int msg_size; int nob; int n; int i; @@ -149,6 +148,12 @@ static int kiblnd_unpack_rd(struct kib_msg *msg, int flip) n = rd->rd_nfrags; + if (n <= 0 || n > IBLND_MAX_RDMA_FRAGS) { + CERROR("Bad nfrags: %d, should be 0 < n <= %d\n", + n, IBLND_MAX_RDMA_FRAGS); + return 1; + } + nob = offsetof(struct kib_msg, ibm_u) + kiblnd_rd_msg_size(rd, msg->ibm_type, n); @@ -158,13 +163,6 @@ static int kiblnd_unpack_rd(struct kib_msg *msg, int flip) return 1; } - msg_size = kiblnd_rd_size(rd); - if (msg_size <= 0 || msg_size > LNET_MAX_PAYLOAD) { - CERROR("Bad msg_size: %d, should be 0 < n <= %d\n", - msg_size, LNET_MAX_PAYLOAD); - return 1; - } - if (!flip) return 0; @@ -336,7 +334,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp, peer_ni->ibp_nid = nid; peer_ni->ibp_error = 0; peer_ni->ibp_last_alive = 0; - peer_ni->ibp_max_frags = kiblnd_cfg_rdma_frags(peer_ni->ibp_ni); + peer_ni->ibp_max_frags = IBLND_MAX_RDMA_FRAGS; peer_ni->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits; atomic_set(&peer_ni->ibp_refcount, 1); /* 1 ref for caller */ @@ -782,6 +780,12 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, kiblnd_cq_completion, kiblnd_cq_event, conn, &cq_attr); if (IS_ERR(cq)) { + /* + * on MLX-5 (possibly MLX-4 as well) this error could be + * hit if the concurrent_sends and/or peer_tx_credits is set + * too high. 
Or due to an MLX-5 bug which tries to + * allocate 256kb via kmalloc for WR cookie array + */ CERROR("Failed to create CQ with %d CQEs: %ld\n", IBLND_CQ_ENTRIES(conn), PTR_ERR(cq)); goto failed_2; @@ -1320,9 +1324,8 @@ static void kiblnd_destroy_fmr_pool(struct kib_fmr_pool *fpo) { LASSERT(!fpo->fpo_map_count); - if (fpo->fpo_is_fmr) { - if (fpo->fmr.fpo_fmr_pool) - ib_destroy_fmr_pool(fpo->fmr.fpo_fmr_pool); + if (fpo->fpo_is_fmr && fpo->fmr.fpo_fmr_pool) { + ib_destroy_fmr_pool(fpo->fmr.fpo_fmr_pool); } else { struct kib_fast_reg_descriptor *frd; int i = 0; @@ -1654,7 +1657,7 @@ void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status) int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, struct kib_rdma_desc *rd, __u32 nob, __u64 iov, - struct kib_fmr *fmr, bool *is_fastreg) + struct kib_fmr *fmr) { __u64 *pages = tx->tx_pages; bool is_rx = (rd != tx->tx_rd); @@ -1674,7 +1677,6 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, if (fpo->fpo_is_fmr) { struct ib_pool_fmr *pfmr; - *is_fastreg = 0; spin_unlock(&fps->fps_lock); if (!tx_pages_mapped) { @@ -1694,7 +1696,6 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, } rc = PTR_ERR(pfmr); } else { - *is_fastreg = 1; if (!list_empty(&fpo->fast_reg.fpo_pool_list)) { struct kib_fast_reg_descriptor *frd; struct ib_reg_wr *wr; diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index aaf0118..a6008ea 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -121,9 +121,8 @@ struct kib_tunables { #define IBLND_OOB_CAPABLE(v) ((v) != IBLND_MSG_VERSION_1) #define IBLND_OOB_MSGS(v) (IBLND_OOB_CAPABLE(v) ? 2 : 0) -#define IBLND_FRAG_SHIFT (PAGE_SHIFT - 12) /* frag size on wire is in 4K units */ -#define IBLND_MSG_SIZE (4 << 10) /* max size of queued messages (inc hdr) */ -#define IBLND_MAX_RDMA_FRAGS (LNET_MAX_PAYLOAD >> 12)/* max # of fragments supported in 4K size */ +#define IBLND_MSG_SIZE (4 << 10) /* max size of queued messages (inc hdr) */ +#define IBLND_MAX_RDMA_FRAGS LNET_MAX_IOV /* max # of fragments supported */ /************************/ /* derived constants... */ @@ -141,8 +140,8 @@ struct kib_tunables { /* WRs and CQEs (per connection) */ #define IBLND_RECV_WRS(c) IBLND_RX_MSGS(c) #define IBLND_SEND_WRS(c) \ - (((c->ibc_max_frags + 1) << IBLND_FRAG_SHIFT) * \ - kiblnd_concurrent_sends(c->ibc_version, c->ibc_peer->ibp_ni)) + ((c->ibc_max_frags + 1) * kiblnd_concurrent_sends(c->ibc_version, \ + c->ibc_peer->ibp_ni)) #define IBLND_CQ_ENTRIES(c) (IBLND_RECV_WRS(c) + IBLND_SEND_WRS(c)) struct kib_hca_dev; @@ -288,7 +287,7 @@ struct kib_fmr_pool { time64_t fpo_deadline; /* deadline of this pool */ int fpo_failed; /* fmr pool is failed */ int fpo_map_count; /* # of mapped FMR */ - int fpo_is_fmr; + bool fpo_is_fmr; /* True if FMR pools allocated */ }; struct kib_fmr { @@ -515,7 +514,9 @@ struct kib_tx { /* transmit message */ int tx_nfrags; /* # entries in... 
*/ struct scatterlist *tx_frags; /* dma_map_sg descriptor */ __u64 *tx_pages; /* rdma phys page addrs */ - struct kib_fmr fmr; /* FMR */ + /* gaps in fragments */ + bool tx_gaps; + struct kib_fmr tx_fmr; /* FMR */ int tx_dmadir; /* dma direction */ }; @@ -616,26 +617,6 @@ struct kib_peer_ni { int kiblnd_msg_queue_size(int version, struct lnet_ni *ni); -/* max # of fragments configured by user */ -static inline int -kiblnd_cfg_rdma_frags(struct lnet_ni *ni) -{ - struct lnet_ioctl_config_o2iblnd_tunables *tunables; - int mod; - - tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; - mod = tunables->lnd_map_on_demand; - return mod ? mod : IBLND_MAX_RDMA_FRAGS >> IBLND_FRAG_SHIFT; -} - -static inline int -kiblnd_rdma_frags(int version, struct lnet_ni *ni) -{ - return version == IBLND_MSG_VERSION_1 ? - (IBLND_MAX_RDMA_FRAGS >> IBLND_FRAG_SHIFT) : - kiblnd_cfg_rdma_frags(ni); -} - static inline int kiblnd_concurrent_sends(int version, struct lnet_ni *ni) { @@ -1011,7 +992,7 @@ static inline unsigned int kiblnd_sg_dma_len(struct ib_device *dev, int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, struct kib_rdma_desc *rd, __u32 nob, __u64 iov, - struct kib_fmr *fmr, bool *is_fastreg); + struct kib_fmr *fmr); void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status); int kiblnd_tunables_setup(struct lnet_ni *ni); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index b16153f..9d30f31 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -133,6 +133,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, LASSERT(!tx->tx_lntmsg[1]); LASSERT(!tx->tx_nfrags); + tx->tx_gaps = false; + return tx; } @@ -538,7 +540,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, { struct kib_hca_dev *hdev; struct kib_fmr_poolset *fps; - bool is_fastreg = 0; + struct kib_dev *dev; int cpt; int rc; int i; @@ -546,11 +548,42 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, LASSERT(tx->tx_pool); LASSERT(tx->tx_pool->tpo_pool.po_owner); + dev = net->ibn_dev; hdev = tx->tx_pool->tpo_hdev; cpt = tx->tx_pool->tpo_pool.po_owner->ps_cpt; + /* + * If we're dealing with FastReg, but the device doesn't + * support GAPS and the tx has GAPS, then there is no real point + * in trying to map the memory, because it'll just fail. So + * preemptively fail with an appropriate message + */ + if ((dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED) && + !(dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT) && + tx->tx_gaps) { + CERROR("Using FastReg with no GAPS support, but tx has gaps\n"); + return -EPROTONOSUPPORT; + } + + /* + * FMR does not support gaps but the tx has gaps then + * we should make sure that the number of fragments we'll be sending + * over fits within the number of fragments negotiated on the + * connection, otherwise, we won't be able to RDMA the data. + * We need to maintain the number of fragments negotiation on the + * connection for backwards compatibility. + */ + if (tx->tx_gaps && (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED)) { + if (tx->tx_conn && + tx->tx_conn->ibc_max_frags <= rd->rd_nfrags) { + CERROR("TX number of frags (%d) is <= than connection number of frags (%d). 
Consider setting peer's map_on_demand to 256\n", + tx->tx_nfrags, tx->tx_conn->ibc_max_frags); + return -EFBIG; + } + } + fps = net->ibn_fmr_ps[cpt]; - rc = kiblnd_fmr_pool_map(fps, tx, rd, nob, 0, &tx->fmr, &is_fastreg); + rc = kiblnd_fmr_pool_map(fps, tx, rd, nob, 0, &tx->tx_fmr); if (rc) { CERROR("Can't map %u bytes: %d\n", nob, rc); return rc; @@ -560,15 +593,28 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, * If rd is not tx_rd, it's going to get sent to a peer_ni, * who will need the rkey */ - rd->rd_key = tx->fmr.fmr_key; - if (!is_fastreg) { + rd->rd_key = tx->tx_fmr.fmr_key; + /* + * for FastReg or FMR with no gaps we can accumulate all + * the fragments in one FastReg or FMR fragment. + */ + if (((dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) && !tx->tx_gaps) || + (dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)) { + /* FMR requires zero based address */ + if (dev->ibd_dev_caps & IBLND_DEV_CAPS_FMR_ENABLED) + rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask; + rd->rd_frags[0].rf_nob = nob; + rd->rd_nfrags = 1; + } else { + /* + * We're transmitting with gaps using FMR. + * We'll need to use multiple fragments and identify the + * zero based address of each fragment. + */ for (i = 0; i < rd->rd_nfrags; i++) { rd->rd_frags[i].rf_addr &= ~hdev->ibh_page_mask; rd->rd_frags[i].rf_addr += i << hdev->ibh_page_shift; } - } else { - rd->rd_frags[0].rf_nob = nob; - rd->rd_nfrags = 1; } return 0; @@ -576,8 +622,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, static void kiblnd_unmap_tx(struct kib_tx *tx) { - if (tx->fmr.fmr_pfmr || tx->fmr.fmr_frd) - kiblnd_fmr_pool_unmap(&tx->fmr, tx->tx_status); + if (tx->tx_fmr.fmr_pfmr || tx->tx_fmr.fmr_frd) + kiblnd_fmr_pool_unmap(&tx->tx_fmr, tx->tx_status); if (tx->tx_nfrags) { kiblnd_dma_unmap_sg(tx->tx_pool->tpo_hdev->ibh_ibdev, @@ -656,6 +702,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, fragnob = min((int)(iov->iov_len - offset), nob); fragnob = min(fragnob, (int)PAGE_SIZE - page_offset); + if ((fragnob < (int)PAGE_SIZE - page_offset) && (niov > 1)) { + CDEBUG(D_NET, + "fragnob %d < available page %d: with remaining %d iovs\n", + fragnob, (int)PAGE_SIZE - page_offset, niov); + tx->tx_gaps = true; + } + sg_set_page(sg, page, fragnob, page_offset); sg = sg_next(sg); if (!sg) { @@ -704,6 +757,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, fragnob = min((int)(kiov->bv_len - offset), nob); + if ((fragnob < (int)(kiov->bv_len - offset)) && nkiov > 1) { + CDEBUG(D_NET, + "fragnob %d < available page %d: with remaining %d kiovs\n", + fragnob, (int)(kiov->bv_len - offset), nkiov); + tx->tx_gaps = true; + } + sg_set_page(sg, kiov->bv_page, fragnob, kiov->bv_offset + offset); sg = sg_next(sg); @@ -735,6 +795,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(tx->tx_queued); /* We rely on this for QP sizing */ LASSERT(tx->tx_nwrq > 0 && tx->tx_nsge >= 0); + LASSERT(tx->tx_nwrq <= 1 + conn->ibc_max_frags); LASSERT(!credit || credit == 1); LASSERT(conn->ibc_outstanding_credits >= 0); @@ -814,7 +875,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, /* close_conn will launch failover */ rc = -ENETDOWN; } else { - struct kib_fast_reg_descriptor *frd = tx->fmr.fmr_frd; + struct kib_fast_reg_descriptor *frd = tx->tx_fmr.fmr_frd; const struct ib_send_wr *bad = &tx->tx_wrq[tx->tx_nwrq - 1].wr; struct ib_send_wr *wrq = &tx->tx_wrq[0].wr; @@ -1042,15 +1103,6 @@ static int kiblnd_map_tx(struct lnet_ni 
*ni, struct kib_tx *tx, LASSERT(!tx->tx_nwrq && !tx->tx_nsge); LASSERT(type == IBLND_MSG_GET_DONE || type == IBLND_MSG_PUT_DONE); - if (kiblnd_rd_size(srcrd) > conn->ibc_max_frags << PAGE_SHIFT) { - CERROR("RDMA is too large for peer_ni %s (%d), src size: %d dst size: %d\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid), - conn->ibc_max_frags << PAGE_SHIFT, - kiblnd_rd_size(srcrd), kiblnd_rd_size(dstrd)); - rc = -EMSGSIZE; - goto too_big; - } - for (srcidx = dstidx = wrq_sge = sge_nob = 0; resid > 0; resid -= sge_nob) { int prev = dstidx; @@ -1067,10 +1119,10 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, break; } - if (tx->tx_nwrq >= IBLND_MAX_RDMA_FRAGS) { + if (tx->tx_nwrq >= conn->ibc_max_frags) { CERROR("RDMA has too many fragments for peer_ni %s (%d), src idx/frags: %d/%d dst idx/frags: %d/%d\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), - IBLND_MAX_RDMA_FRAGS, + conn->ibc_max_frags, srcidx, srcrd->rd_nfrags, dstidx, dstrd->rd_nfrags); rc = -EMSGSIZE; @@ -1110,7 +1162,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } tx->tx_nsge++; } -too_big: + if (rc < 0) { /* no RDMA if completing with failure */ tx->tx_nsge = 0; tx->tx_nwrq = 0; @@ -2335,21 +2387,20 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, goto failed; } - max_frags = reqmsg->ibm_u.connparams.ibcp_max_frags >> IBLND_FRAG_SHIFT; - if (max_frags > kiblnd_rdma_frags(version, ni)) { + max_frags = reqmsg->ibm_u.connparams.ibcp_max_frags; + if (max_frags > IBLND_MAX_RDMA_FRAGS) { CWARN("Can't accept conn from %s (version %x): max message size %d is too large (%d wanted)\n", libcfs_nid2str(nid), version, max_frags, - kiblnd_rdma_frags(version, ni)); + IBLND_MAX_RDMA_FRAGS); if (version >= IBLND_MSG_VERSION) rej.ibr_why = IBLND_REJECT_RDMA_FRAGS; goto failed; - } else if (max_frags < kiblnd_rdma_frags(version, ni) && - !net->ibn_fmr_ps) { + } else if (max_frags < IBLND_MAX_RDMA_FRAGS && !net->ibn_fmr_ps) { CWARN("Can't accept conn from %s (version %x): max message size %d incompatible without FMR pool (%d wanted)\n", libcfs_nid2str(nid), version, max_frags, - kiblnd_rdma_frags(version, ni)); + IBLND_MAX_RDMA_FRAGS); if (version == IBLND_MSG_VERSION) rej.ibr_why = IBLND_REJECT_RDMA_FRAGS; @@ -2495,7 +2546,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, kiblnd_init_msg(ackmsg, IBLND_MSG_CONNACK, sizeof(ackmsg->ibm_u.connparams)); ackmsg->ibm_u.connparams.ibcp_queue_depth = conn->ibc_queue_depth; - ackmsg->ibm_u.connparams.ibcp_max_frags = conn->ibc_max_frags << IBLND_FRAG_SHIFT; + ackmsg->ibm_u.connparams.ibcp_max_frags = conn->ibc_max_frags; ackmsg->ibm_u.connparams.ibcp_max_msg_size = IBLND_MSG_SIZE; kiblnd_pack_msg(ni, ackmsg, version, 0, nid, reqmsg->ibm_srcstamp); @@ -2528,7 +2579,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, failed: if (ni) { rej.ibr_cp.ibcp_queue_depth = kiblnd_msg_queue_size(version, ni); - rej.ibr_cp.ibcp_max_frags = kiblnd_rdma_frags(version, ni); + rej.ibr_cp.ibcp_max_frags = IBLND_MAX_RDMA_FRAGS; lnet_ni_decref(ni); } @@ -2556,7 +2607,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (cp) { msg_size = cp->ibcp_max_msg_size; - frag_num = cp->ibcp_max_frags << IBLND_FRAG_SHIFT; + frag_num = cp->ibcp_max_frags; queue_dep = cp->ibcp_queue_depth; } @@ -2590,6 +2641,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, goto out; } tunables = &peer_ni->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; + /* + * This check only makes sense if the kernel supports global + * memory registration. 
Otherwise, map_on_demand will never == 0 + */ if (!tunables->lnd_map_on_demand) { reason = "map_on_demand must be enabled"; goto out; @@ -2829,11 +2884,11 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, goto failed; } - if ((msg->ibm_u.connparams.ibcp_max_frags >> IBLND_FRAG_SHIFT) > + if ((msg->ibm_u.connparams.ibcp_max_frags) > conn->ibc_max_frags) { CERROR("%s has incompatible max_frags %d (<=%d wanted)\n", libcfs_nid2str(peer_ni->ibp_nid), - msg->ibm_u.connparams.ibcp_max_frags >> IBLND_FRAG_SHIFT, + msg->ibm_u.connparams.ibcp_max_frags, conn->ibc_max_frags); rc = -EPROTO; goto failed; @@ -2867,7 +2922,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, conn->ibc_credits = msg->ibm_u.connparams.ibcp_queue_depth; conn->ibc_reserved_credits = msg->ibm_u.connparams.ibcp_queue_depth; conn->ibc_queue_depth = msg->ibm_u.connparams.ibcp_queue_depth; - conn->ibc_max_frags = msg->ibm_u.connparams.ibcp_max_frags >> IBLND_FRAG_SHIFT; + conn->ibc_max_frags = msg->ibm_u.connparams.ibcp_max_frags; LASSERT(conn->ibc_credits + conn->ibc_reserved_credits + IBLND_OOB_MSGS(ver) <= IBLND_RX_MSGS(conn)); @@ -2924,7 +2979,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, memset(msg, 0, sizeof(*msg)); kiblnd_init_msg(msg, IBLND_MSG_CONNREQ, sizeof(msg->ibm_u.connparams)); msg->ibm_u.connparams.ibcp_queue_depth = conn->ibc_queue_depth; - msg->ibm_u.connparams.ibcp_max_frags = conn->ibc_max_frags << IBLND_FRAG_SHIFT; + msg->ibm_u.connparams.ibcp_max_frags = conn->ibc_max_frags; msg->ibm_u.connparams.ibcp_max_msg_size = IBLND_MSG_SIZE; kiblnd_pack_msg(peer_ni->ibp_ni, msg, version, diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 985ccdf..7fc6a8a 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -115,10 +115,37 @@ module_param(use_fastreg_gaps, bool, 0444); MODULE_PARM_DESC(use_fastreg_gaps, "Enable discontiguous fastreg fragment support. Expect performance drop"); -#define IBLND_DEFAULT_MAP_ON_DEMAND IBLND_MAX_RDMA_FRAGS +/* + * map_on_demand is a flag used to determine if we can use FMR or FastReg. + * This is applicable for kernels which support global memory regions. For + * later kernels this flag is always enabled, since we will always either + * use FMR or FastReg + * For kernels which support global memory regions map_on_demand defaults + * to 0 which means we will be using global memory regions exclusively. + * If it is set to a value other than 0, then we will behave as follows: + * 1. Always default the number of fragments to IBLND_MAX_RDMA_FRAGS + * 2. Create FMR/FastReg pools + * 3. Negotiate the supported number of fragments per connection + * 4. Attempt to transmit using global memory regions only if + * map-on-demand is not turned on, otherwise use FMR or FastReg + * 5. In case of transmitting tx with GAPS over FMR we will need to + * transmit it with multiple fragments. Look at the comments in + * kiblnd_fmr_map_tx() for an explanation of the behavior. + * + * For later kernels we default map_on_demand to 1 and not allow + * it to be set to 0, since there is no longer support for global memory + * regions. Behavior: + * 1. Default the number of fragments to IBLND_MAX_RDMA_FRAGS + * 2. Create FMR/FastReg pools + * 3. Negotiate the supported number of fragments per connection + * 4. 
Look at the comments in kiblnd_fmr_map_tx() for an explanation of + * the behavior when transmit with GAPS verses contiguous. + */ + +#define IBLND_DEFAULT_MAP_ON_DEMAND 1 static int map_on_demand = IBLND_DEFAULT_MAP_ON_DEMAND; module_param(map_on_demand, int, 0444); -MODULE_PARM_DESC(map_on_demand, "map on demand"); +MODULE_PARM_DESC(map_on_demand, "map on demand (obsolete)"); /* NB: this value is shared by all CPTs, it can grow at runtime */ static int fmr_pool_size = 512; @@ -234,6 +261,13 @@ int kiblnd_tunables_setup(struct lnet_ni *ni) net_tunables->lct_peer_tx_credits = net_tunables->lct_max_tx_credits; + /* + * For kernels which do not support global memory regions, always + * enable map_on_demand + */ + if (tunables->lnd_map_on_demand == 0) + tunables->lnd_map_on_demand = 1; + if (!tunables->lnd_peercredits_hiw) tunables->lnd_peercredits_hiw = peer_credits_hiw; @@ -243,19 +277,8 @@ int kiblnd_tunables_setup(struct lnet_ni *ni) if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits) tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits - 1; - if (tunables->lnd_map_on_demand <= 0 || - tunables->lnd_map_on_demand > IBLND_MAX_RDMA_FRAGS) { - /* Use the default */ - CWARN("Invalid map_on_demand (%d), expects 1 - %d. Using default of %d\n", - tunables->lnd_map_on_demand, - IBLND_MAX_RDMA_FRAGS, IBLND_DEFAULT_MAP_ON_DEMAND); - tunables->lnd_map_on_demand = IBLND_DEFAULT_MAP_ON_DEMAND; - } - - if (tunables->lnd_map_on_demand == 1) { - /* don't make sense to create map if only one fragment */ - tunables->lnd_map_on_demand = 2; - } + if (tunables->lnd_concurrent_sends == 0) + tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits; if (!tunables->lnd_concurrent_sends) { if (tunables->lnd_map_on_demand > 0 && @@ -299,7 +322,7 @@ int kiblnd_tunables_setup(struct lnet_ni *ni) void kiblnd_tunables_init(void) { default_tunables.lnd_version = 0; - default_tunables.lnd_peercredits_hiw = peer_credits_hiw, + default_tunables.lnd_peercredits_hiw = peer_credits_hiw; default_tunables.lnd_map_on_demand = map_on_demand; default_tunables.lnd_concurrent_sends = concurrent_sends; default_tunables.lnd_fmr_pool_size = fmr_pool_size; From patchwork Sun Oct 14 18:55:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640791 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E71E13AD for ; Sun, 14 Oct 2018 18:56:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E9CCE291F2 for ; Sun, 14 Oct 2018 18:56:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB613293CC; Sun, 14 Oct 2018 18:56:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8F09828B9F for ; Sun, 14 Oct 2018 18:56:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with 
ESMTP id 03D3C21FB47; Sun, 14 Oct 2018 11:55:49 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9756421F20E for ; Sun, 14 Oct 2018 11:55:37 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD8172237; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C87D32A5; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:26 -0400 Message-Id: <1539543332-28679-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/10] lustre: lnd: use less CQ entries for each connection X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Alexey Lyashkov Currently we have two work request chains per transfer, which means the OFED stack will generate only two events if a transfer fails. Reduce the number of CQ entries to avoid extra resource consumption. Signed-off-by: Alexey Lyashkov WC-bug-id: https://jira.whamcloud.com/browse/LU-9810 Seagate-bug-id: MRP-4508 Reviewed-on: https://review.whamcloud.com/28279 Reviewed-by: Dmitry Eremin Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index a6008ea..cd64cfb 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -142,7 +142,9 @@ struct kib_tunables { #define IBLND_SEND_WRS(c) \ ((c->ibc_max_frags + 1) * kiblnd_concurrent_sends(c->ibc_version, \ c->ibc_peer->ibp_ni)) -#define IBLND_CQ_ENTRIES(c) (IBLND_RECV_WRS(c) + IBLND_SEND_WRS(c)) +#define IBLND_CQ_ENTRIES(c) \ + (IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \ c->ibc_peer->ibp_ni)) struct kib_hca_dev; From patchwork Sun Oct 14 18:55:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640785 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C3CF91508 for ; Sun, 14 Oct 2018 18:55:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7BC1D28B9F for ; Sun, 14 Oct 2018 18:55:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 704A0291F2; Sun, 14 Oct 2018 18:55:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status:
No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D3E9528B9F for ; Sun, 14 Oct 2018 18:55:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5591321F958; Sun, 14 Oct 2018 11:55:43 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D674621F2FA for ; Sun, 14 Oct 2018 11:55:37 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CD1922236; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C95C82AB; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:27 -0400 Message-Id: <1539543332-28679-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/10] lustre: o2iblnd: limit cap.max_send_wr for MLX5 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Dmitry Eremin Decrease cap.max_send_wr until it is accepted by rdma_create_qp() Signed-off-by: Dmitry Eremin WC-bug-id: https://jira.whamcloud.com/browse/LU-7124 Reviewed-on: http://review.whamcloud.com/18347 Reviewed-by: Olaf Weber Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 8f984a0..99a4650 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -812,7 +812,14 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, conn->ibc_sched = sched; - rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); + do { + rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); + if (!rc || init_qp_attr->cap.max_send_wr < 16) + break; + + init_qp_attr->cap.max_send_wr -= init_qp_attr->cap.max_send_wr / 4; + } while (rc); + if (rc) { CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d, send_sge: %d, recv_sge: %d\n", rc, init_qp_attr->cap.max_send_wr, @@ -822,6 +829,10 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, goto failed_2; } + if (init_qp_attr->cap.max_send_wr != IBLND_SEND_WRS(conn)) + CDEBUG(D_NET, "original send wr %d, created with %d\n", + IBLND_SEND_WRS(conn), init_qp_attr->cap.max_send_wr); + kfree(init_qp_attr); /* 1 ref for caller and each rxmsg */ From 
patchwork Sun Oct 14 18:55:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10640779 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D83C013AD for ; Sun, 14 Oct 2018 18:55:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9084628B9F for ; Sun, 14 Oct 2018 18:55:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8260D291F2; Sun, 14 Oct 2018 18:55:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0861828B9F for ; Sun, 14 Oct 2018 18:55:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CD28921F5E8; Sun, 14 Oct 2018 11:55:39 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2556821F2FA for ; Sun, 14 Oct 2018 11:55:38 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D2E8E2238; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CEAFA2AC; Sun, 14 Oct 2018 14:55:34 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 14 Oct 2018 14:55:28 -0400 Message-Id: <1539543332-28679-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/10] lustre: lnd: calculate qp max_send_wrs properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata The maximum in-flight transfers can not exceed the negotiated queue depth. Instead of calculating the max_send_wrs to be the negotiated number of frags * concurrent sends, it should be the negotiated number of frags * queue depth. If that value is too large for successful qp creation then we reduce the queue depth in a loop until we successfully create the qp or the queue depth dips below 2. Due to the queue depth negotiation protocol it is guaranteed that the queue depth on both the active and the passive will match. This change resolves the discrepancy created by the previous code which reduces max_send_wr by a quarter. 
That could lead to: mlx5_ib_post_send:4184:(pid 26272): Failed to prepare WQE When the o2iblnd transfers a message which requires more WRs than the max that has been allocated. Signed-off-by: Amir Shehata Signed-off-by: Alexey Lyashkov WC-bug-id: https://jira.whamcloud.com/browse/LU-10213 Reviewed-on: https://review.whamcloud.com/30310 Reviewed-by: Alexey Lyashkov Reviewed-by: Dmitry Eremin Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 30 +++++++++++++++++----- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 4 +-- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 99a4650..43266d8 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -650,6 +650,19 @@ static struct kib_sched_info *kiblnd_get_scheduler(int cpt) return NULL; } +static unsigned int kiblnd_send_wrs(struct kib_conn *conn) +{ + /* + * One WR for the LNet message + * And ibc_max_frags for the transfer WRs + */ + unsigned int ret = 1 + conn->ibc_max_frags; + + /* account for a maximum of ibc_queue_depth in-flight transfers */ + ret *= conn->ibc_queue_depth; + return ret; +} + struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, struct rdma_cm_id *cmid, int state, int version) @@ -801,8 +814,6 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, init_qp_attr->event_handler = kiblnd_qp_event; init_qp_attr->qp_context = conn; - init_qp_attr->cap.max_send_wr = IBLND_SEND_WRS(conn); - init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn); init_qp_attr->cap.max_send_sge = *kiblnd_tunables.kib_wrq_sge; init_qp_attr->cap.max_recv_sge = 1; init_qp_attr->sq_sig_type = IB_SIGNAL_REQ_WR; @@ -813,11 +824,14 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, conn->ibc_sched = sched; do { + init_qp_attr->cap.max_send_wr = kiblnd_send_wrs(conn); + init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn); + rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); - if (!rc || init_qp_attr->cap.max_send_wr < 16) + if (!rc || conn->ibc_queue_depth < 2) break; - init_qp_attr->cap.max_send_wr -= init_qp_attr->cap.max_send_wr / 4; + conn->ibc_queue_depth--; } while (rc); if (rc) { @@ -829,9 +843,11 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, goto failed_2; } - if (init_qp_attr->cap.max_send_wr != IBLND_SEND_WRS(conn)) - CDEBUG(D_NET, "original send wr %d, created with %d\n", - IBLND_SEND_WRS(conn), init_qp_attr->cap.max_send_wr); + if (conn->ibc_queue_depth != peer_ni->ibp_queue_depth) + CWARN("peer %s - queue depth reduced from %u to %u to allow for qp creation\n", + libcfs_nid2str(peer_ni->ibp_nid), + peer_ni->ibp_queue_depth, + conn->ibc_queue_depth); kfree(init_qp_attr); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index cd64cfb..c6c8106 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -139,9 +139,7 @@ struct kib_tunables { /* WRs and CQEs (per connection) */ #define IBLND_RECV_WRS(c) IBLND_RX_MSGS(c) -#define IBLND_SEND_WRS(c) \ - ((c->ibc_max_frags + 1) * kiblnd_concurrent_sends(c->ibc_version, \ - c->ibc_peer->ibp_ni)) + #define IBLND_CQ_ENTRIES(c) \ (IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \ 
From patchwork Sun Oct 14 18:55:29 2018
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 10640795
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Amir Shehata, Lustre Development List
Date: Sun, 14 Oct 2018 14:55:29 -0400
Message-Id: <1539543332-28679-8-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 07/10] lustre: lnd: remove concurrent_sends tunable

From: Amir Shehata

The concurrent_sends tunable was intended to limit the number of
in-flight transfers per connection. However, the queue depth already
does exactly the same job: if, for example, the queue depth is
negotiated to 16 and concurrent_sends is set to 32, the maximum number
of in-flight transfers still does not exceed 16. There is no need to
keep concurrent_sends around, since it adds no unique functionality.
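The redundancy is easy to see in isolation. A minimal sketch of the
effective in-flight bound (illustrative names, not the driver's code):

#include <stdio.h>

/* Whatever concurrent_sends is set to, the number of posted sends is
 * already capped by the negotiated queue depth, so the tunable never
 * supplies a distinct limit of its own.
 */
static unsigned int in_flight_limit(unsigned int queue_depth,
				    unsigned int concurrent_sends)
{
	return queue_depth < concurrent_sends ? queue_depth
					      : concurrent_sends;
}

int main(void)
{
	/* depth negotiated to 16, tunable raised to 32: limit stays 16 */
	printf("%u\n", in_flight_limit(16, 32));
	return 0;
}

Since the negotiated depth already caps posted sends, dropping the
tunable removes configuration surface without changing behavior.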
Signed-off-by: Amir Shehata
WC-bug-id: https://jira.whamcloud.com/browse/LU-10291
Reviewed-on: https://review.whamcloud.com/30312
WC-bug-id: https://jira.whamcloud.com/browse/LU-10459
Reviewed-on: https://review.whamcloud.com/30751
Reviewed-by: Alexey Lyashkov
Reviewed-by: Dmitry Eremin
Reviewed-by: Doug Oucharek
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    | 23 +----------------
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |  5 ++--
 .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  | 29 +---------------------
 3 files changed, 4 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index c6c8106..c882345 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -141,8 +141,7 @@ struct kib_tunables {
 #define IBLND_RECV_WRS(c)	IBLND_RX_MSGS(c)
 
 #define IBLND_CQ_ENTRIES(c)	\
-	(IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \
-							 c->ibc_peer->ibp_ni))
+	(IBLND_RECV_WRS(c) + 2 * c->ibc_queue_depth)
 
 struct kib_hca_dev;
 
@@ -617,26 +616,6 @@ struct kib_peer_ni {
 
 int kiblnd_msg_queue_size(int version, struct lnet_ni *ni);
 
-static inline int
-kiblnd_concurrent_sends(int version, struct lnet_ni *ni)
-{
-	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
-	int concurrent_sends;
-
-	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
-	concurrent_sends = tunables->lnd_concurrent_sends;
-
-	if (version == IBLND_MSG_VERSION_1) {
-		if (concurrent_sends > IBLND_MSG_QUEUE_SIZE_V1 * 2)
-			return IBLND_MSG_QUEUE_SIZE_V1 * 2;
-
-		if (concurrent_sends < IBLND_MSG_QUEUE_SIZE_V1 / 2)
-			return IBLND_MSG_QUEUE_SIZE_V1 / 2;
-	}
-
-	return concurrent_sends;
-}
-
 static inline void
 kiblnd_hdev_addref_locked(struct kib_hca_dev *hdev)
 {

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9d30f31..1f31798 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -787,7 +787,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 {
 	struct kib_msg *msg = tx->tx_msg;
 	struct kib_peer_ni *peer_ni = conn->ibc_peer;
-	struct lnet_ni *ni = peer_ni->ibp_ni;
 	int ver = conn->ibc_version;
 	int rc;
 	int done;
@@ -803,7 +802,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LASSERT(conn->ibc_credits >= 0);
 	LASSERT(conn->ibc_credits <= conn->ibc_queue_depth);
 
-	if (conn->ibc_nsends_posted == kiblnd_concurrent_sends(ver, ni)) {
+	if (conn->ibc_nsends_posted == conn->ibc_queue_depth) {
 		/* tx completions outstanding... */
 		CDEBUG(D_NET, "%s: posted enough\n",
 		       libcfs_nid2str(peer_ni->ibp_nid));
@@ -953,7 +952,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		return;
 	}
 
-	LASSERT(conn->ibc_nsends_posted <= kiblnd_concurrent_sends(ver, ni));
+	LASSERT(conn->ibc_nsends_posted <= conn->ibc_queue_depth);
 	LASSERT(!IBLND_OOB_CAPABLE(ver) ||
 		conn->ibc_noops_posted <= IBLND_OOB_MSGS(ver));
 	LASSERT(conn->ibc_reserved_credits >= 0);

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index 7fc6a8a..47e8a60 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -109,7 +109,7 @@
 static int concurrent_sends;
 module_param(concurrent_sends, int, 0444);
-MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing");
+MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing (obsolete)");
 
 static bool use_fastreg_gaps;
 module_param(use_fastreg_gaps, bool, 0444);
@@ -277,32 +277,6 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
 	if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits)
 		tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits - 1;
 
-	if (tunables->lnd_concurrent_sends == 0)
-		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits;
-
-	if (!tunables->lnd_concurrent_sends) {
-		if (tunables->lnd_map_on_demand > 0 &&
-		    tunables->lnd_map_on_demand <= IBLND_MAX_RDMA_FRAGS / 8) {
-			tunables->lnd_concurrent_sends =
-					net_tunables->lct_peer_tx_credits * 2;
-		} else {
-			tunables->lnd_concurrent_sends =
-					net_tunables->lct_peer_tx_credits;
-		}
-	}
-
-	if (tunables->lnd_concurrent_sends > net_tunables->lct_peer_tx_credits * 2)
-		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits * 2;
-
-	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits / 2)
-		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits / 2;
-
-	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits) {
-		CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n",
-		      tunables->lnd_concurrent_sends,
-		      net_tunables->lct_peer_tx_credits);
-	}
-
 	if (!tunables->lnd_fmr_pool_size)
 		tunables->lnd_fmr_pool_size = fmr_pool_size;
 	if (!tunables->lnd_fmr_flush_trigger)
@@ -324,7 +298,6 @@ void kiblnd_tunables_init(void)
 	default_tunables.lnd_version = 0;
 	default_tunables.lnd_peercredits_hiw = peer_credits_hiw;
 	default_tunables.lnd_map_on_demand = map_on_demand;
-	default_tunables.lnd_concurrent_sends = concurrent_sends;
 	default_tunables.lnd_fmr_pool_size = fmr_pool_size;
 	default_tunables.lnd_fmr_flush_trigger = fmr_flush_trigger;
 	default_tunables.lnd_fmr_cache = fmr_cache;
From patchwork Sun Oct 14 18:55:30 2018
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 10640789
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Alexey Lyashkov, Amir Shehata, Lustre Development List
Date: Sun, 14 Oct 2018 14:55:30 -0400
Message-Id: <1539543332-28679-9-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 08/10] lustre: lnd: correct WR fast reg accounting

From: Amir Shehata

Ensure that enough WRs are allocated for the fast reg case, which needs
two additional WRs per transfer: one for memory window registration and
one for memory window invalidation. Failure to allocate these causes the
following problem:

mlx5_warn:mlx5_0:begin_wqe:4085(pid 9590): work queue overflow
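A hedged sketch of the resulting per-connection send-WR budget
(illustrative names; the actual computation is kiblnd_send_wrs() in the
diff below):

#include <stdio.h>

/* One WR for the LNet message, one per transfer fragment, and, under
 * FastReg, two more per transfer (registration plus invalidation), all
 * multiplied by the in-flight limit.
 */
static unsigned int send_wr_budget(unsigned int max_frags,
				   unsigned int queue_depth, int fastreg)
{
	unsigned int per_transfer = 1 + max_frags;

	if (fastreg)
		per_transfer += 2;	/* reg WR + invalidate WR */

	return per_transfer * queue_depth;
}

int main(void)
{
	/* example inputs: 256 frags, depth 16 */
	printf("FMR:     %u\n", send_wr_budget(256, 16, 0));	/* 4112 */
	printf("FastReg: %u\n", send_wr_budget(256, 16, 1));	/* 4144 */
	return 0;
}

With 256 frags and a depth of 16 that is 4112 WRs under FMR but 4144
under FastReg; omitting the two extra WRs per transfer is what overflowed
the work queue.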
Signed-off-by: Amir Shehata
Signed-off-by: Alexey Lyashkov
WC-bug-id: https://jira.whamcloud.com/browse/LU-9943
Reviewed-on: https://review.whamcloud.com/30311
Reviewed-by: Alexey Lyashkov
Reviewed-by: Dmitry Eremin
Reviewed-by: Doug Oucharek
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 43266d8..66aa45f 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -656,8 +656,13 @@ static unsigned int kiblnd_send_wrs(struct kib_conn *conn)
 	 * One WR for the LNet message
 	 * And ibc_max_frags for the transfer WRs
 	 */
+	u32 dev_caps = conn->ibc_hdev->ibh_dev->ibd_dev_caps;
 	unsigned int ret = 1 + conn->ibc_max_frags;
 
+	/* FastReg needs two extra WRs for map and invalidate */
+	if (dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)
+		ret += 2;
+
 	/* account for a maximum of ibc_queue_depth in-flight transfers */
 	ret *= conn->ibc_queue_depth;
 	return ret;
From patchwork Sun Oct 14 18:55:31 2018
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 10640793
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Sun, 14 Oct 2018 14:55:31 -0400
Message-Id: <1539543332-28679-10-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 09/10] lustre: o2ib: use splice in kiblnd_peer_connect_failed()

From: "John L. Hammond"

In kiblnd_peer_connect_failed(), replace a backwards list_add() and
list_del() pair with list_splice_init().

Signed-off-by: John L. Hammond
WC-bug-id: https://jira.whamcloud.com/browse/LU-10819
Reviewed-on: https://review.whamcloud.com/31643
Reviewed-by: Andreas Dilger
Reviewed-by: Dmitry Eremin
Reviewed-by: Doug Oucharek
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 1f31798..23ce59e 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2118,8 +2118,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 	peer_ni->ibp_reconnected = 0;
 	if (list_empty(&peer_ni->ibp_conns)) {
 		/* Take peer_ni's blocked transmits to complete with error */
-		list_add(&zombies, &peer_ni->ibp_tx_queue);
-		list_del_init(&peer_ni->ibp_tx_queue);
+		list_splice_init(&peer_ni->ibp_tx_queue, &zombies);
 
 		if (kiblnd_peer_active(peer_ni))
 			kiblnd_unlink_peer_locked(peer_ni);
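For readers less familiar with the list idiom: the removed
list_add(&zombies, &peer_ni->ibp_tx_queue) inserted the zombies head into
the peer's queue, with the arguments reading backwards relative to the
usual new-entry-onto-list usage, and only the follow-up list_del_init()
turned that into a working transfer of the queue. list_splice_init()
states the intent in one call. A self-contained userspace sketch of its
semantics (minimal re-implementation, not the kernel headers):

#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Move every entry of @list onto @head and leave @list empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;

		first->prev = head;
		last->next = head->next;
		head->next->prev = last;
		head->next = first;
		INIT_LIST_HEAD(list);	/* donor list ends up empty */
	}
}

int main(void)
{
	struct list_head tx_queue, zombies, a, b;

	INIT_LIST_HEAD(&tx_queue);
	INIT_LIST_HEAD(&zombies);
	/* queue two dummy nodes on tx_queue */
	a.prev = &tx_queue; a.next = &b;
	b.prev = &a; b.next = &tx_queue;
	tx_queue.next = &a; tx_queue.prev = &b;

	list_splice_init(&tx_queue, &zombies);
	printf("tx_queue empty: %d, zombies empty: %d\n",
	       list_empty(&tx_queue), list_empty(&zombies));	/* 1, 0 */
	return 0;
}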
From patchwork Sun Oct 14 18:55:32 2018
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 10640797
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: James Simmons, Lustre Development List
Date: Sun, 14 Oct 2018 14:55:32 -0400
Message-Id: <1539543332-28679-11-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 10/10] lustre: lnet: make LNET_MAX_IOV dependent on page size

The default behavior of LNet is to always use 256 pages, which is
LNET_MAX_IOV, with LNET_MAX_PAYLOAD always one megabyte. This assumes
pages are always 4K in size, which is not the case. This causes bulk
I/O errors on platforms like PowerPC or ARM, which tend to use 64K
pages.

This is resolved by first making LNET_MAX_PAYLOAD always one megabyte,
since that is what the configure script sets it to by default and no one
ever changes it. In theory it could be set as high as 16MB, but that
would cause I/O errors, since the ptlrpc layer expects packets to always
be one megabyte in size. It would also be better to make the maximum
payload a per-network configuration instead of a global one. Second,
LNET_MAX_IOV is made equal to LNET_MAX_PAYLOAD divided by PAGE_SIZE.
This way packets are always LNET_MAX_PAYLOAD in size, but the number of
pages used, LNET_MAX_IOV, varies with the page size of the platform
creating the packets.
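A quick check of the new arithmetic, as a small self-contained sketch
(the constants mirror the patch; PAGE_SHIFT normally comes from the
kernel, so it is a parameter here):

#include <stdio.h>

#define LNET_MTU_BITS		20
#define LNET_MTU		(1 << LNET_MTU_BITS)	/* 1 MiB */
#define LNET_MAX_PAYLOAD	LNET_MTU

static unsigned int lnet_max_iov(unsigned int page_shift)
{
	return LNET_MAX_PAYLOAD >> page_shift;
}

int main(void)
{
	printf("4K pages:  %u fragments\n", lnet_max_iov(12));	/* 256 */
	printf("64K pages: %u fragments\n", lnet_max_iov(16));	/* 16 */
	return 0;
}

So the wire-visible payload stays one megabyte everywhere, while the
fragment count adapts to the platform's page size.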
Signed-off-by: James Simmons
WC-bug-id: https://jira.whamcloud.com/browse/LU-10157
Reviewed-on: https://review.whamcloud.com/31559
Reviewed-by: Wang Shilong
Reviewed-by: Ruth Klundt
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/include/linux/lnet/lib-types.h       | 10 ++++------
 drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h |  3 ---
 drivers/staging/lustre/lnet/Kconfig                         | 10 ----------
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c         |  4 ++--
 drivers/staging/lustre/lustre/include/lustre_net.h          |  1 +
 5 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 19f7b11..8951a53 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -37,6 +37,7 @@
 #define __LNET_LIB_TYPES_H__
 
 #include
+#include
 #include
 #include
 #include
@@ -46,12 +47,9 @@
 #include
 
 /* Max payload size */
-#define LNET_MAX_PAYLOAD	CONFIG_LNET_MAX_PAYLOAD
-#if (LNET_MAX_PAYLOAD < LNET_MTU)
-# error "LNET_MAX_PAYLOAD too small - error in configure --with-max-payload-mb"
-#elif (LNET_MAX_PAYLOAD > (PAGE_SIZE * LNET_MAX_IOV))
-# error "LNET_MAX_PAYLOAD too large - error in configure --with-max-payload-mb"
-#endif
+#define LNET_MAX_PAYLOAD	LNET_MTU
+
+#define LNET_MAX_IOV		(LNET_MAX_PAYLOAD >> PAGE_SHIFT)
 
 /* forward refs */
 struct lnet_libmd;

diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
index 1ecf18e..e440100 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
@@ -522,9 +522,6 @@ struct lnet_md {
 #define LNET_MTU_BITS	20
 #define LNET_MTU	(1 << LNET_MTU_BITS)
 
-/** limit on the number of fragments in discontiguous MDs */
-#define LNET_MAX_IOV	256
-
 /**
  * Options for the MD structure. See lnet_md::options.
  */

diff --git a/drivers/staging/lustre/lnet/Kconfig b/drivers/staging/lustre/lnet/Kconfig
index ad049e6..6062a82 100644
--- a/drivers/staging/lustre/lnet/Kconfig
+++ b/drivers/staging/lustre/lnet/Kconfig
@@ -8,16 +8,6 @@ config LNET
 	  case of Lustre routers only the LNet layer is required. Lately other
 	  projects are also looking into using LNet as their networking API as well.
 
-config LNET_MAX_PAYLOAD
-	int "Lustre lnet max transfer payload (default 1MB)"
-	depends on LNET
-	default "1048576"
-	help
-	  This option defines the maximum size of payload in bytes that lnet
-	  can put into its transport.
-
-	  If unsure, use default.
-
 config LNET_SELFTEST
 	tristate "Lustre networking self testing"
 	depends on LNET

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 66aa45f..68a8963 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1414,7 +1414,7 @@ static void kiblnd_destroy_fmr_pool_list(struct list_head *head)
 static int kiblnd_alloc_fmr_pool(struct kib_fmr_poolset *fps, struct kib_fmr_pool *fpo)
 {
 	struct ib_fmr_pool_param param = {
-		.max_pages_per_fmr = LNET_MAX_PAYLOAD / PAGE_SIZE,
+		.max_pages_per_fmr = LNET_MAX_IOV,
 		.page_shift = PAGE_SHIFT,
 		.access = (IB_ACCESS_LOCAL_WRITE |
 			   IB_ACCESS_REMOTE_WRITE),
@@ -1474,7 +1474,7 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps,
 			frd->frd_mr = ib_alloc_mr(fpo->fpo_hdev->ibh_pd,
 						  fastreg_gaps ? IB_MR_TYPE_SG_GAPS :
								 IB_MR_TYPE_MEM_REG,
-						  LNET_MAX_PAYLOAD / PAGE_SIZE);
+						  LNET_MAX_IOV);
 			if (IS_ERR(frd->frd_mr)) {
 				rc = PTR_ERR(frd->frd_mr);
 				CERROR("Failed to allocate ib_alloc_mr: %d\n", rc);
diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index 2df72c7..ce7e98c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include