From patchwork Mon Sep 21 20:03:45 2020
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 11790829
From: Bob Pearson
To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next v6 01/12] rdma_rxe: Separate MEM into MR and MW objects.
Date: Mon, 21 Sep 2020 15:03:45 -0500 Message-Id: <20200921200356.8627-2-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org In the original rxe implementation it was intended to use a common object to represent MRs and MWs but it became clear that they are different enough to separate these into two objects. This allows replacing the mem name with mr for MRs which is more consistent with the style for the other objects and less likely to be confusing. This is a long patch that mostly changes mem to mr where it makes sense and adds a new rxe_mw struct. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_comp.c | 4 +- drivers/infiniband/sw/rxe/rxe_loc.h | 26 +-- drivers/infiniband/sw/rxe/rxe_mr.c | 264 +++++++++++++------------- drivers/infiniband/sw/rxe/rxe_pool.c | 8 +- drivers/infiniband/sw/rxe/rxe_req.c | 6 +- drivers/infiniband/sw/rxe/rxe_resp.c | 30 +-- drivers/infiniband/sw/rxe/rxe_verbs.c | 18 +- drivers/infiniband/sw/rxe/rxe_verbs.h | 51 ++--- 8 files changed, 204 insertions(+), 203 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 0a1e6393250b..5dc86c9e74c2 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -345,7 +345,7 @@ static inline enum comp_state do_read(struct rxe_qp *qp, ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &wqe->dma, payload_addr(pkt), - payload_size(pkt), to_mem_obj, NULL); + payload_size(pkt), to_mr_obj, NULL); if (ret) return COMPST_ERROR; @@ -365,7 +365,7 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp, ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &wqe->dma, &atomic_orig, - sizeof(u64), to_mem_obj, NULL); + sizeof(u64), to_mr_obj, NULL); if (ret) return COMPST_ERROR; else diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 0d758760b9ae..9ec6bff6863f 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -72,40 +72,40 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); /* rxe_mr.c */ enum copy_direction { - to_mem_obj, - from_mem_obj, + to_mr_obj, + from_mr_obj, }; -void rxe_mem_init_dma(struct rxe_pd *pd, - int access, struct rxe_mem *mem); +void rxe_mr_init_dma(struct rxe_pd *pd, + int access, struct rxe_mr *mr); -int rxe_mem_init_user(struct rxe_pd *pd, u64 start, +int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova, int access, struct ib_udata *udata, - struct rxe_mem *mr); + struct rxe_mr *mr); -int rxe_mem_init_fast(struct rxe_pd *pd, - int max_pages, struct rxe_mem *mem); +int rxe_mr_init_fast(struct rxe_pd *pd, + int max_pages, struct rxe_mr *mr); -int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, +int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, enum copy_direction dir, u32 *crcp); int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma, void *addr, int length, enum copy_direction dir, u32 *crcp); -void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length); +void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length); enum lookup_type { lookup_local, lookup_remote, }; -struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key, +struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum lookup_type type); -int mem_check_range(struct rxe_mem *mem, u64 
iova, size_t length); +int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); -void rxe_mem_cleanup(struct rxe_pool_entry *arg); +void rxe_mr_cleanup(struct rxe_pool_entry *arg); int advance_dma_data(struct rxe_dma_info *dma, unsigned int length); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 708e2dff5eaa..368012904879 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -24,17 +24,17 @@ static u8 rxe_get_key(void) return key; } -int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length) +int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) { - switch (mem->type) { - case RXE_MEM_TYPE_DMA: + switch (mr->type) { + case RXE_MR_TYPE_DMA: return 0; - case RXE_MEM_TYPE_MR: - case RXE_MEM_TYPE_FMR: - if (iova < mem->iova || - length > mem->length || - iova > mem->iova + mem->length - length) + case RXE_MR_TYPE_MR: + case RXE_MR_TYPE_FMR: + if (iova < mr->iova || + length > mr->length || + iova > mr->iova + mr->length - length) return -EFAULT; return 0; @@ -47,90 +47,90 @@ int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length) | IB_ACCESS_REMOTE_WRITE \ | IB_ACCESS_REMOTE_ATOMIC) -static void rxe_mem_init(int access, struct rxe_mem *mem) +static void rxe_mr_init(int access, struct rxe_mr *mr) { - u32 lkey = mem->pelem.index << 8 | rxe_get_key(); + u32 lkey = mr->pelem.index << 8 | rxe_get_key(); u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0; - if (mem->pelem.pool->type == RXE_TYPE_MR) { - mem->ibmr.lkey = lkey; - mem->ibmr.rkey = rkey; + if (mr->pelem.pool->type == RXE_TYPE_MR) { + mr->ibmr.lkey = lkey; + mr->ibmr.rkey = rkey; } - mem->lkey = lkey; - mem->rkey = rkey; - mem->state = RXE_MEM_STATE_INVALID; - mem->type = RXE_MEM_TYPE_NONE; - mem->map_shift = ilog2(RXE_BUF_PER_MAP); + mr->lkey = lkey; + mr->rkey = rkey; + mr->state = RXE_MEM_STATE_INVALID; + mr->type = RXE_MR_TYPE_NONE; + mr->map_shift = ilog2(RXE_BUF_PER_MAP); } -void rxe_mem_cleanup(struct rxe_pool_entry *arg) +void rxe_mr_cleanup(struct rxe_pool_entry *arg) { - struct rxe_mem *mem = container_of(arg, typeof(*mem), pelem); + struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); int i; - ib_umem_release(mem->umem); + ib_umem_release(mr->umem); - if (mem->map) { - for (i = 0; i < mem->num_map; i++) - kfree(mem->map[i]); + if (mr->map) { + for (i = 0; i < mr->num_map; i++) + kfree(mr->map[i]); - kfree(mem->map); + kfree(mr->map); } } -static int rxe_mem_alloc(struct rxe_mem *mem, int num_buf) +static int rxe_mr_alloc(struct rxe_mr *mr, int num_buf) { int i; int num_map; - struct rxe_map **map = mem->map; + struct rxe_map **map = mr->map; num_map = (num_buf + RXE_BUF_PER_MAP - 1) / RXE_BUF_PER_MAP; - mem->map = kmalloc_array(num_map, sizeof(*map), GFP_KERNEL); - if (!mem->map) + mr->map = kmalloc_array(num_map, sizeof(*map), GFP_KERNEL); + if (!mr->map) goto err1; for (i = 0; i < num_map; i++) { - mem->map[i] = kmalloc(sizeof(**map), GFP_KERNEL); - if (!mem->map[i]) + mr->map[i] = kmalloc(sizeof(**map), GFP_KERNEL); + if (!mr->map[i]) goto err2; } BUILD_BUG_ON(!is_power_of_2(RXE_BUF_PER_MAP)); - mem->map_shift = ilog2(RXE_BUF_PER_MAP); - mem->map_mask = RXE_BUF_PER_MAP - 1; + mr->map_shift = ilog2(RXE_BUF_PER_MAP); + mr->map_mask = RXE_BUF_PER_MAP - 1; - mem->num_buf = num_buf; - mem->num_map = num_map; - mem->max_buf = num_map * RXE_BUF_PER_MAP; + mr->num_buf = num_buf; + mr->num_map = num_map; + mr->max_buf = num_map * RXE_BUF_PER_MAP; return 0; err2: for (i--; i >= 0; i--) - kfree(mem->map[i]); + 
kfree(mr->map[i]); - kfree(mem->map); + kfree(mr->map); err1: return -ENOMEM; } -void rxe_mem_init_dma(struct rxe_pd *pd, - int access, struct rxe_mem *mem) +void rxe_mr_init_dma(struct rxe_pd *pd, + int access, struct rxe_mr *mr) { - rxe_mem_init(access, mem); + rxe_mr_init(access, mr); - mem->pd = pd; - mem->access = access; - mem->state = RXE_MEM_STATE_VALID; - mem->type = RXE_MEM_TYPE_DMA; + mr->pd = pd; + mr->access = access; + mr->state = RXE_MEM_STATE_VALID; + mr->type = RXE_MR_TYPE_DMA; } -int rxe_mem_init_user(struct rxe_pd *pd, u64 start, +int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova, int access, struct ib_udata *udata, - struct rxe_mem *mem) + struct rxe_mr *mr) { struct rxe_map **map; struct rxe_phys_buf *buf = NULL; @@ -148,23 +148,23 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, goto err1; } - mem->umem = umem; + mr->umem = umem; num_buf = ib_umem_num_pages(umem); - rxe_mem_init(access, mem); + rxe_mr_init(access, mr); - err = rxe_mem_alloc(mem, num_buf); + err = rxe_mr_alloc(mr, num_buf); if (err) { - pr_warn("err %d from rxe_mem_alloc\n", err); + pr_warn("err %d from rxe_mr_alloc\n", err); ib_umem_release(umem); goto err1; } - mem->page_shift = PAGE_SHIFT; - mem->page_mask = PAGE_SIZE - 1; + mr->page_shift = PAGE_SHIFT; + mr->page_mask = PAGE_SIZE - 1; num_buf = 0; - map = mem->map; + map = mr->map; if (length > 0) { buf = map[0]->buf; @@ -190,15 +190,15 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, } } - mem->pd = pd; - mem->umem = umem; - mem->access = access; - mem->length = length; - mem->iova = iova; - mem->va = start; - mem->offset = ib_umem_offset(umem); - mem->state = RXE_MEM_STATE_VALID; - mem->type = RXE_MEM_TYPE_MR; + mr->pd = pd; + mr->umem = umem; + mr->access = access; + mr->length = length; + mr->iova = iova; + mr->va = start; + mr->offset = ib_umem_offset(umem); + mr->state = RXE_MEM_STATE_VALID; + mr->type = RXE_MR_TYPE_MR; return 0; @@ -206,24 +206,24 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, return err; } -int rxe_mem_init_fast(struct rxe_pd *pd, - int max_pages, struct rxe_mem *mem) +int rxe_mr_init_fast(struct rxe_pd *pd, + int max_pages, struct rxe_mr *mr) { int err; - rxe_mem_init(0, mem); + rxe_mr_init(0, mr); /* In fastreg, we also set the rkey */ - mem->ibmr.rkey = mem->ibmr.lkey; + mr->ibmr.rkey = mr->ibmr.lkey; - err = rxe_mem_alloc(mem, max_pages); + err = rxe_mr_alloc(mr, max_pages); if (err) goto err1; - mem->pd = pd; - mem->max_buf = max_pages; - mem->state = RXE_MEM_STATE_FREE; - mem->type = RXE_MEM_TYPE_MR; + mr->pd = pd; + mr->max_buf = max_pages; + mr->state = RXE_MEM_STATE_FREE; + mr->type = RXE_MR_TYPE_MR; return 0; @@ -232,27 +232,27 @@ int rxe_mem_init_fast(struct rxe_pd *pd, } static void lookup_iova( - struct rxe_mem *mem, + struct rxe_mr *mr, u64 iova, int *m_out, int *n_out, size_t *offset_out) { - size_t offset = iova - mem->iova + mem->offset; + size_t offset = iova - mr->iova + mr->offset; int map_index; int buf_index; u64 length; - if (likely(mem->page_shift)) { - *offset_out = offset & mem->page_mask; - offset >>= mem->page_shift; - *n_out = offset & mem->map_mask; - *m_out = offset >> mem->map_shift; + if (likely(mr->page_shift)) { + *offset_out = offset & mr->page_mask; + offset >>= mr->page_shift; + *n_out = offset & mr->map_mask; + *m_out = offset >> mr->map_shift; } else { map_index = 0; buf_index = 0; - length = mem->map[map_index]->buf[buf_index].size; + length = mr->map[map_index]->buf[buf_index].size; while (offset >= length) { offset -= length; @@ -262,7 
+262,7 @@ static void lookup_iova( map_index++; buf_index = 0; } - length = mem->map[map_index]->buf[buf_index].size; + length = mr->map[map_index]->buf[buf_index].size; } *m_out = map_index; @@ -271,48 +271,48 @@ static void lookup_iova( } } -void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length) +void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length) { size_t offset; int m, n; void *addr; - if (mem->state != RXE_MEM_STATE_VALID) { - pr_warn("mem not in valid state\n"); + if (mr->state != RXE_MEM_STATE_VALID) { + pr_warn("mr not in valid state\n"); addr = NULL; goto out; } - if (!mem->map) { + if (!mr->map) { addr = (void *)(uintptr_t)iova; goto out; } - if (mem_check_range(mem, iova, length)) { + if (mr_check_range(mr, iova, length)) { pr_warn("range violation\n"); addr = NULL; goto out; } - lookup_iova(mem, iova, &m, &n, &offset); + lookup_iova(mr, iova, &m, &n, &offset); - if (offset + length > mem->map[m]->buf[n].size) { + if (offset + length > mr->map[m]->buf[n].size) { pr_warn("crosses page boundary\n"); addr = NULL; goto out; } - addr = (void *)(uintptr_t)mem->map[m]->buf[n].addr + offset; + addr = (void *)(uintptr_t)mr->map[m]->buf[n].addr + offset; out: return addr; } /* copy data from a range (vaddr, vaddr+length-1) to or from - * a mem object starting at iova. Compute incremental value of - * crc32 if crcp is not zero. caller must hold a reference to mem + * a mr object starting at iova. Compute incremental value of + * crc32 if crcp is not zero. caller must hold a reference to mr */ -int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length, +int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, enum copy_direction dir, u32 *crcp) { int err; @@ -328,43 +328,43 @@ int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length, if (length == 0) return 0; - if (mem->type == RXE_MEM_TYPE_DMA) { + if (mr->type == RXE_MR_TYPE_DMA) { u8 *src, *dest; - src = (dir == to_mem_obj) ? + src = (dir == to_mr_obj) ? addr : ((void *)(uintptr_t)iova); - dest = (dir == to_mem_obj) ? + dest = (dir == to_mr_obj) ? ((void *)(uintptr_t)iova) : addr; memcpy(dest, src, length); if (crcp) - *crcp = rxe_crc32(to_rdev(mem->pd->ibpd.device), + *crcp = rxe_crc32(to_rdev(mr->pd->ibpd.device), *crcp, dest, length); return 0; } - WARN_ON_ONCE(!mem->map); + WARN_ON_ONCE(!mr->map); - err = mem_check_range(mem, iova, length); + err = mr_check_range(mr, iova, length); if (err) { err = -EFAULT; goto err1; } - lookup_iova(mem, iova, &m, &i, &offset); + lookup_iova(mr, iova, &m, &i, &offset); - map = mem->map + m; + map = mr->map + m; buf = map[0]->buf + i; while (length > 0) { u8 *src, *dest; va = (u8 *)(uintptr_t)buf->addr + offset; - src = (dir == to_mem_obj) ? addr : va; - dest = (dir == to_mem_obj) ? va : addr; + src = (dir == to_mr_obj) ? addr : va; + dest = (dir == to_mr_obj) ? 
va : addr; bytes = buf->size - offset; @@ -374,7 +374,7 @@ int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length, memcpy(dest, src, bytes); if (crcp) - crc = rxe_crc32(to_rdev(mem->pd->ibpd.device), + crc = rxe_crc32(to_rdev(mr->pd->ibpd.device), crc, dest, bytes); length -= bytes; @@ -416,7 +416,7 @@ int copy_data( struct rxe_sge *sge = &dma->sge[dma->cur_sge]; int offset = dma->sge_offset; int resid = dma->resid; - struct rxe_mem *mem = NULL; + struct rxe_mr *mr = NULL; u64 iova; int err; @@ -429,8 +429,8 @@ int copy_data( } if (sge->length && (offset < sge->length)) { - mem = lookup_mem(pd, access, sge->lkey, lookup_local); - if (!mem) { + mr = lookup_mr(pd, access, sge->lkey, lookup_local); + if (!mr) { err = -EINVAL; goto err1; } @@ -440,9 +440,9 @@ int copy_data( bytes = length; if (offset >= sge->length) { - if (mem) { - rxe_drop_ref(mem); - mem = NULL; + if (mr) { + rxe_drop_ref(mr); + mr = NULL; } sge++; dma->cur_sge++; @@ -454,9 +454,9 @@ int copy_data( } if (sge->length) { - mem = lookup_mem(pd, access, sge->lkey, + mr = lookup_mr(pd, access, sge->lkey, lookup_local); - if (!mem) { + if (!mr) { err = -EINVAL; goto err1; } @@ -471,7 +471,7 @@ int copy_data( if (bytes > 0) { iova = sge->addr + offset; - err = rxe_mem_copy(mem, iova, addr, bytes, dir, crcp); + err = rxe_mr_copy(mr, iova, addr, bytes, dir, crcp); if (err) goto err2; @@ -485,14 +485,14 @@ int copy_data( dma->sge_offset = offset; dma->resid = resid; - if (mem) - rxe_drop_ref(mem); + if (mr) + rxe_drop_ref(mr); return 0; err2: - if (mem) - rxe_drop_ref(mem); + if (mr) + rxe_drop_ref(mr); err1: return err; } @@ -530,31 +530,31 @@ int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) return 0; } -/* (1) find the mem (mr or mw) corresponding to lkey/rkey +/* (1) find the mr corresponding to lkey/rkey * depending on lookup_type - * (2) verify that the (qp) pd matches the mem pd - * (3) verify that the mem can support the requested access - * (4) verify that mem state is valid + * (2) verify that the (qp) pd matches the mr pd + * (3) verify that the mr can support the requested access + * (4) verify that mr state is valid */ -struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key, +struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum lookup_type type) { - struct rxe_mem *mem; + struct rxe_mr *mr; struct rxe_dev *rxe = to_rdev(pd->ibpd.device); int index = key >> 8; - mem = rxe_pool_get_index(&rxe->mr_pool, index); - if (!mem) + mr = rxe_pool_get_index(&rxe->mr_pool, index); + if (!mr) return NULL; - if (unlikely((type == lookup_local && mem->lkey != key) || - (type == lookup_remote && mem->rkey != key) || - mem->pd != pd || - (access && !(access & mem->access)) || - mem->state != RXE_MEM_STATE_VALID)) { - rxe_drop_ref(mem); - mem = NULL; + if (unlikely((type == lookup_local && mr->lkey != key) || + (type == lookup_remote && mr->rkey != key) || + mr->pd != pd || + (access && !(access & mr->access)) || + mr->state != RXE_MEM_STATE_VALID)) { + rxe_drop_ref(mr); + mr = NULL; } - return mem; + return mr; } diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index b374eb53e2fe..32ba47d143f3 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -8,8 +8,6 @@ #include "rxe_loc.h" /* info about object pools - * note that mr and mw share a single index space - * so that one can map an lkey to the correct type of object */ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_UC] = { @@ 
-50,15 +48,15 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { }, [RXE_TYPE_MR] = { .name = "rxe-mr", - .size = sizeof(struct rxe_mem), - .cleanup = rxe_mem_cleanup, + .size = sizeof(struct rxe_mr), + .cleanup = rxe_mr_cleanup, .flags = RXE_POOL_INDEX, .max_index = RXE_MAX_MR_INDEX, .min_index = RXE_MIN_MR_INDEX, }, [RXE_TYPE_MW] = { .name = "rxe-mw", - .size = sizeof(struct rxe_mem), + .size = sizeof(struct rxe_mw), .flags = RXE_POOL_INDEX, .max_index = RXE_MAX_MW_INDEX, .min_index = RXE_MIN_MW_INDEX, diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index e27585ce9eb7..57236d8c2146 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -465,7 +465,7 @@ static int fill_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe, } else { err = copy_data(qp->pd, 0, &wqe->dma, payload_addr(pkt), paylen, - from_mem_obj, + from_mr_obj, &crc); if (err) return err; @@ -597,7 +597,7 @@ int rxe_requester(void *arg) if (wqe->mask & WR_REG_MASK) { if (wqe->wr.opcode == IB_WR_LOCAL_INV) { struct rxe_dev *rxe = to_rdev(qp->ibqp.device); - struct rxe_mem *rmr; + struct rxe_mr *rmr; rmr = rxe_pool_get_index(&rxe->mr_pool, wqe->wr.ex.invalidate_rkey >> 8); @@ -613,7 +613,7 @@ int rxe_requester(void *arg) wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; } else if (wqe->wr.opcode == IB_WR_REG_MR) { - struct rxe_mem *rmr = to_rmr(wqe->wr.wr.reg.mr); + struct rxe_mr *rmr = to_rmr(wqe->wr.wr.reg.mr); rmr->state = RXE_MEM_STATE_VALID; rmr->access = wqe->wr.wr.reg.access; diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index c7e3b6a4af38..69867bf39cfb 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -390,7 +390,7 @@ static enum resp_states check_length(struct rxe_qp *qp, static enum resp_states check_rkey(struct rxe_qp *qp, struct rxe_pkt_info *pkt) { - struct rxe_mem *mem = NULL; + struct rxe_mr *mr = NULL; u64 va; u32 rkey; u32 resid; @@ -429,18 +429,18 @@ static enum resp_states check_rkey(struct rxe_qp *qp, resid = qp->resp.resid; pktlen = payload_size(pkt); - mem = lookup_mem(qp->pd, access, rkey, lookup_remote); - if (!mem) { + mr = lookup_mr(qp->pd, access, rkey, lookup_remote); + if (!mr) { state = RESPST_ERR_RKEY_VIOLATION; goto err; } - if (unlikely(mem->state == RXE_MEM_STATE_FREE)) { + if (unlikely(mr->state == RXE_MEM_STATE_FREE)) { state = RESPST_ERR_RKEY_VIOLATION; goto err; } - if (mem_check_range(mem, va, resid)) { + if (mr_check_range(mr, va, resid)) { state = RESPST_ERR_RKEY_VIOLATION; goto err; } @@ -468,12 +468,12 @@ static enum resp_states check_rkey(struct rxe_qp *qp, WARN_ON_ONCE(qp->resp.mr); - qp->resp.mr = mem; + qp->resp.mr = mr; return RESPST_EXECUTE; err: - if (mem) - rxe_drop_ref(mem); + if (mr) + rxe_drop_ref(mr); return state; } @@ -483,7 +483,7 @@ static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr, int err; err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma, - data_addr, data_len, to_mem_obj, NULL); + data_addr, data_len, to_mr_obj, NULL); if (unlikely(err)) return (err == -ENOSPC) ? 
RESPST_ERR_LENGTH : RESPST_ERR_MALFORMED_WQE; @@ -498,8 +498,8 @@ static enum resp_states write_data_in(struct rxe_qp *qp, int err; int data_len = payload_size(pkt); - err = rxe_mem_copy(qp->resp.mr, qp->resp.va, payload_addr(pkt), - data_len, to_mem_obj, NULL); + err = rxe_mr_copy(qp->resp.mr, qp->resp.va, payload_addr(pkt), + data_len, to_mr_obj, NULL); if (err) { rc = RESPST_ERR_RKEY_VIOLATION; goto out; @@ -521,7 +521,7 @@ static enum resp_states process_atomic(struct rxe_qp *qp, u64 iova = atmeth_va(pkt); u64 *vaddr; enum resp_states ret; - struct rxe_mem *mr = qp->resp.mr; + struct rxe_mr *mr = qp->resp.mr; if (mr->state != RXE_MEM_STATE_VALID) { ret = RESPST_ERR_RKEY_VIOLATION; @@ -700,8 +700,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, if (!skb) return RESPST_ERR_RNR; - err = rxe_mem_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt), - payload, from_mem_obj, &icrc); + err = rxe_mr_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt), + payload, from_mr_obj, &icrc); if (err) pr_err("Failed copying memory\n"); @@ -883,7 +883,7 @@ static enum resp_states do_complete(struct rxe_qp *qp, } if (pkt->mask & RXE_IETH_MASK) { - struct rxe_mem *rmr; + struct rxe_mr *rmr; wc->wc_flags |= IB_WC_WITH_INVALIDATE; wc->ex.invalidate_rkey = ieth_rkey(pkt); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 5a4087b01757..626706f25efc 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -867,7 +867,7 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access) { struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); - struct rxe_mem *mr; + struct rxe_mr *mr; mr = rxe_alloc(&rxe->mr_pool); if (!mr) @@ -875,7 +875,7 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access) rxe_add_index(mr); rxe_add_ref(pd); - rxe_mem_init_dma(pd, access, mr); + rxe_mr_init_dma(pd, access, mr); return &mr->ibmr; } @@ -889,7 +889,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, int err; struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); - struct rxe_mem *mr; + struct rxe_mr *mr; mr = rxe_alloc(&rxe->mr_pool); if (!mr) { @@ -901,7 +901,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, rxe_add_ref(pd); - err = rxe_mem_init_user(pd, start, length, iova, + err = rxe_mr_init_user(pd, start, length, iova, access, udata, mr); if (err) goto err3; @@ -918,7 +918,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) { - struct rxe_mem *mr = to_rmr(ibmr); + struct rxe_mr *mr = to_rmr(ibmr); mr->state = RXE_MEM_STATE_ZOMBIE; rxe_drop_ref(mr->pd); @@ -932,7 +932,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, { struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); - struct rxe_mem *mr; + struct rxe_mr *mr; int err; if (mr_type != IB_MR_TYPE_MEM_REG) @@ -948,7 +948,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, rxe_add_ref(pd); - err = rxe_mem_init_fast(pd, max_num_sg, mr); + err = rxe_mr_init_fast(pd, max_num_sg, mr); if (err) goto err2; @@ -964,7 +964,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, static int rxe_set_page(struct ib_mr *ibmr, u64 addr) { - struct rxe_mem *mr = to_rmr(ibmr); + struct rxe_mr *mr = to_rmr(ibmr); struct rxe_map *map; struct rxe_phys_buf *buf; @@ -984,7 +984,7 @@ static int 
rxe_set_page(struct ib_mr *ibmr, u64 addr) static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset) { - struct rxe_mem *mr = to_rmr(ibmr); + struct rxe_mr *mr = to_rmr(ibmr); int n; mr->nbuf = 0; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 560a610bb0aa..dbc649c9c43f 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -39,7 +39,7 @@ struct rxe_ucontext { }; struct rxe_pd { - struct ib_pd ibpd; + struct ib_pd ibpd; struct rxe_pool_entry pelem; }; @@ -156,7 +156,7 @@ struct resp_res { struct sk_buff *skb; } atomic; struct { - struct rxe_mem *mr; + struct rxe_mr *mr; u64 va_org; u32 rkey; u32 length; @@ -183,7 +183,7 @@ struct rxe_resp_info { /* RDMA read / atomic only */ u64 va; - struct rxe_mem *mr; + struct rxe_mr *mr; u32 resid; u32 rkey; u32 length; @@ -269,31 +269,27 @@ enum rxe_mem_state { RXE_MEM_STATE_VALID, }; -enum rxe_mem_type { - RXE_MEM_TYPE_NONE, - RXE_MEM_TYPE_DMA, - RXE_MEM_TYPE_MR, - RXE_MEM_TYPE_FMR, - RXE_MEM_TYPE_MW, +enum rxe_mr_type { + RXE_MR_TYPE_NONE, + RXE_MR_TYPE_DMA, + RXE_MR_TYPE_MR, + RXE_MR_TYPE_FMR, }; #define RXE_BUF_PER_MAP (PAGE_SIZE / sizeof(struct rxe_phys_buf)) struct rxe_phys_buf { - u64 addr; - u64 size; + u64 addr; + u64 size; }; struct rxe_map { struct rxe_phys_buf buf[RXE_BUF_PER_MAP]; }; -struct rxe_mem { +struct rxe_mr { struct rxe_pool_entry pelem; - union { - struct ib_mr ibmr; - struct ib_mw ibmw; - }; + struct ib_mr ibmr; struct rxe_pd *pd; struct ib_umem *umem; @@ -302,7 +298,7 @@ struct rxe_mem { u32 rkey; enum rxe_mem_state state; - enum rxe_mem_type type; + enum rxe_mr_type type; u64 va; u64 iova; size_t length; @@ -323,6 +319,18 @@ struct rxe_mem { struct rxe_map **map; }; +struct rxe_mw { + struct rxe_pool_entry pelem; + struct ib_mw ibmw; + struct rxe_qp *qp; /* type 2B only */ + struct rxe_mr *mr; + spinlock_t lock; + enum rxe_mem_state state; + u32 access; + u64 addr; + u64 length; +}; + struct rxe_mc_grp { struct rxe_pool_entry pelem; spinlock_t mcg_lock; /* guard group */ @@ -428,14 +436,9 @@ static inline struct rxe_cq *to_rcq(struct ib_cq *cq) return cq ? container_of(cq, struct rxe_cq, ibcq) : NULL; } -static inline struct rxe_mem *to_rmr(struct ib_mr *mr) -{ - return mr ? container_of(mr, struct rxe_mem, ibmr) : NULL; -} - -static inline struct rxe_mem *to_rmw(struct ib_mw *mw) +static inline struct rxe_mr *to_rmr(struct ib_mr *mr) { - return mw ? container_of(mw, struct rxe_mem, ibmw) : NULL; + return mr ? 
container_of(mr, struct rxe_mr, ibmr) : NULL; } int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name);
From patchwork Mon Sep 21 20:03:46 2020
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 11790831
From: Bob Pearson
To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next v6 02/12] rdma_rxe: Enable MW objects
Date: Mon, 21 Sep 2020 15:03:46 -0500
Message-Id: <20200921200356.8627-3-rpearson@hpe.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com>
References: <20200921200356.8627-1-rpearson@hpe.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-rdma@vger.kernel.org

Change parameters in rxe_param.h so that MAX_MW is the same as MAX_MR. Set device attribute in rxe.c so max_mw = MAX_MW.

Signed-off-by: Bob Pearson
---
drivers/infiniband/sw/rxe/rxe.c | 1 + drivers/infiniband/sw/rxe/rxe_param.h | 10 ++++++---- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 43b327b53e26..fab291245366 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -52,6 +52,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe) rxe->attr.max_cq = RXE_MAX_CQ; rxe->attr.max_cqe = (1 << RXE_MAX_LOG_CQE) - 1; rxe->attr.max_mr = RXE_MAX_MR; + rxe->attr.max_mw = RXE_MAX_MW; rxe->attr.max_pd = RXE_MAX_PD; rxe->attr.max_qp_rd_atom = RXE_MAX_QP_RD_ATOM; rxe->attr.max_res_rd_atom = RXE_MAX_RES_RD_ATOM; diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h index 25ab50d9b7c2..4ebb3da8c07d 100644 --- a/drivers/infiniband/sw/rxe/rxe_param.h +++ b/drivers/infiniband/sw/rxe/rxe_param.h @@ -58,7 +58,8 @@ enum rxe_device_param { RXE_MAX_SGE_RD = 32, RXE_MAX_CQ = 16384, RXE_MAX_LOG_CQE = 15, - RXE_MAX_MR = 256 * 1024, + RXE_MAX_MR = 0x40000, + RXE_MAX_MW = 0x40000, RXE_MAX_PD = 0x7ffc, RXE_MAX_QP_RD_ATOM = 128, RXE_MAX_RES_RD_ATOM = 0x3f000, @@ -87,9 +88,10 @@ enum rxe_device_param { RXE_MAX_SRQ_INDEX = 0x00040000, RXE_MIN_MR_INDEX = 0x00000001, - RXE_MAX_MR_INDEX = 0x00040000, - RXE_MIN_MW_INDEX = 0x00040001, - RXE_MAX_MW_INDEX = 0x00060000, + RXE_MAX_MR_INDEX = RXE_MIN_MR_INDEX + RXE_MAX_MR - 1, + RXE_MIN_MW_INDEX = RXE_MIN_MR_INDEX + RXE_MAX_MR, + RXE_MAX_MW_INDEX = RXE_MIN_MW_INDEX + RXE_MAX_MW - 1, + RXE_MAX_PKT_PER_ACK = 64, RXE_MAX_UNACKED_PSNS = 128,
From patchwork Mon Sep 21 20:03:47 2020
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 11790833
From: Bob Pearson
To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next v6 03/12] rdma_rxe: Let pools support both keys and indices
Date: Mon, 21 Sep 2020 15:03:47 -0500
Message-Id: <20200921200356.8627-4-rpearson@hpe.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com>
References: <20200921200356.8627-1-rpearson@hpe.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-rdma@vger.kernel.org

Allow both indices and keys to exist for objects in pools. Previously an object was limited to one or the other. This will make it possible for the keys on MWs to change.
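As an illustration of the change (not part of the applied patch), the new layout can be sketched as follows. Field and member names follow the diff below; sketch_get_index() is a simplified stand-in for the driver's rxe_pool_get_index(), which additionally takes the pool lock and a reference. The point is that one pool entry now carries two independent rb_node anchors, so the same object can be reached either through the index tree or through the key tree.

/* Simplified sketch of the two-tree layout introduced by this patch. */
#include <linux/rbtree.h>
#include <linux/types.h>

struct rxe_pool_entry {
	struct rb_node key_node;	/* linked into pool->key.tree */
	struct rb_node index_node;	/* linked into pool->index.tree */
	u32 index;
};

struct rxe_pool {
	struct {			/* only used if indexed */
		struct rb_root tree;
	} index;
	struct {			/* only used if keyed */
		struct rb_root tree;
		size_t key_offset;
		size_t key_size;
	} key;
};

/* Lookups by index walk index.tree and pass index_node to rb_entry();
 * lookups by key walk key.tree and pass key_node, so the two orderings
 * never interfere with each other.
 */
static struct rxe_pool_entry *sketch_get_index(struct rxe_pool *pool, u32 index)
{
	struct rb_node *node = pool->index.tree.rb_node;

	while (node) {
		struct rxe_pool_entry *elem =
			rb_entry(node, struct rxe_pool_entry, index_node);

		if (elem->index > index)
			node = node->rb_left;
		else if (elem->index < index)
			node = node->rb_right;
		else
			return elem;
	}

	return NULL;
}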
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_pool.c | 73 ++++++++++++++-------------- drivers/infiniband/sw/rxe/rxe_pool.h | 32 +++++++----- 2 files changed, 58 insertions(+), 47 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index 32ba47d143f3..30b8f037ee20 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -92,18 +92,18 @@ static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min) goto out; } - pool->max_index = max; - pool->min_index = min; + pool->index.max_index = max; + pool->index.min_index = min; size = BITS_TO_LONGS(max - min + 1) * sizeof(long); - pool->table = kmalloc(size, GFP_KERNEL); - if (!pool->table) { + pool->index.table = kmalloc(size, GFP_KERNEL); + if (!pool->index.table) { err = -ENOMEM; goto out; } - pool->table_size = size; - bitmap_zero(pool->table, max - min + 1); + pool->index.table_size = size; + bitmap_zero(pool->index.table, max - min + 1); out: return err; @@ -125,7 +125,8 @@ int rxe_pool_init( pool->max_elem = max_elem; pool->elem_size = ALIGN(size, RXE_POOL_ALIGN); pool->flags = rxe_type_info[type].flags; - pool->tree = RB_ROOT; + pool->index.tree = RB_ROOT; + pool->key.tree = RB_ROOT; pool->cleanup = rxe_type_info[type].cleanup; atomic_set(&pool->num_elem, 0); @@ -143,8 +144,8 @@ int rxe_pool_init( } if (rxe_type_info[type].flags & RXE_POOL_KEY) { - pool->key_offset = rxe_type_info[type].key_offset; - pool->key_size = rxe_type_info[type].key_size; + pool->key.key_offset = rxe_type_info[type].key_offset; + pool->key.key_size = rxe_type_info[type].key_size; } pool->state = RXE_POOL_STATE_VALID; @@ -158,7 +159,7 @@ static void rxe_pool_release(struct kref *kref) struct rxe_pool *pool = container_of(kref, struct rxe_pool, ref_cnt); pool->state = RXE_POOL_STATE_INVALID; - kfree(pool->table); + kfree(pool->index.table); } static void rxe_pool_put(struct rxe_pool *pool) @@ -183,27 +184,27 @@ void rxe_pool_cleanup(struct rxe_pool *pool) static u32 alloc_index(struct rxe_pool *pool) { u32 index; - u32 range = pool->max_index - pool->min_index + 1; + u32 range = pool->index.max_index - pool->index.min_index + 1; - index = find_next_zero_bit(pool->table, range, pool->last); + index = find_next_zero_bit(pool->index.table, range, pool->index.last); if (index >= range) - index = find_first_zero_bit(pool->table, range); + index = find_first_zero_bit(pool->index.table, range); WARN_ON_ONCE(index >= range); - set_bit(index, pool->table); - pool->last = index; - return index + pool->min_index; + set_bit(index, pool->index.table); + pool->index.last = index; + return index + pool->index.min_index; } static void insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new) { - struct rb_node **link = &pool->tree.rb_node; + struct rb_node **link = &pool->index.tree.rb_node; struct rb_node *parent = NULL; struct rxe_pool_entry *elem; while (*link) { parent = *link; - elem = rb_entry(parent, struct rxe_pool_entry, node); + elem = rb_entry(parent, struct rxe_pool_entry, index_node); if (elem->index == new->index) { pr_warn("element already exists!\n"); @@ -216,25 +217,25 @@ static void insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new) link = &(*link)->rb_right; } - rb_link_node(&new->node, parent, link); - rb_insert_color(&new->node, &pool->tree); + rb_link_node(&new->index_node, parent, link); + rb_insert_color(&new->index_node, &pool->index.tree); out: return; } static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) 
{ - struct rb_node **link = &pool->tree.rb_node; + struct rb_node **link = &pool->key.tree.rb_node; struct rb_node *parent = NULL; struct rxe_pool_entry *elem; int cmp; while (*link) { parent = *link; - elem = rb_entry(parent, struct rxe_pool_entry, node); + elem = rb_entry(parent, struct rxe_pool_entry, key_node); - cmp = memcmp((u8 *)elem + pool->key_offset, - (u8 *)new + pool->key_offset, pool->key_size); + cmp = memcmp((u8 *)elem + pool->key.key_offset, + (u8 *)new + pool->key.key_offset, pool->key.key_size); if (cmp == 0) { pr_warn("key already exists!\n"); @@ -247,8 +248,8 @@ static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) link = &(*link)->rb_right; } - rb_link_node(&new->node, parent, link); - rb_insert_color(&new->node, &pool->tree); + rb_link_node(&new->key_node, parent, link); + rb_insert_color(&new->key_node, &pool->key.tree); out: return; } @@ -260,7 +261,7 @@ void rxe_add_key(void *arg, void *key) unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); - memcpy((u8 *)elem + pool->key_offset, key, pool->key_size); + memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size); insert_key(pool, elem); write_unlock_irqrestore(&pool->pool_lock, flags); } @@ -272,7 +273,7 @@ void rxe_drop_key(void *arg) unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); - rb_erase(&elem->node, &pool->tree); + rb_erase(&elem->key_node, &pool->key.tree); write_unlock_irqrestore(&pool->pool_lock, flags); } @@ -295,8 +296,8 @@ void rxe_drop_index(void *arg) unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); - clear_bit(elem->index - pool->min_index, pool->table); - rb_erase(&elem->node, &pool->tree); + clear_bit(elem->index - pool->index.min_index, pool->index.table); + rb_erase(&elem->index_node, &pool->index.tree); write_unlock_irqrestore(&pool->pool_lock, flags); } @@ -400,10 +401,10 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) if (pool->state != RXE_POOL_STATE_VALID) goto out; - node = pool->tree.rb_node; + node = pool->index.tree.rb_node; while (node) { - elem = rb_entry(node, struct rxe_pool_entry, node); + elem = rb_entry(node, struct rxe_pool_entry, index_node); if (elem->index > index) node = node->rb_left; @@ -432,13 +433,13 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key) if (pool->state != RXE_POOL_STATE_VALID) goto out; - node = pool->tree.rb_node; + node = pool->key.tree.rb_node; while (node) { - elem = rb_entry(node, struct rxe_pool_entry, node); + elem = rb_entry(node, struct rxe_pool_entry, key_node); - cmp = memcmp((u8 *)elem + pool->key_offset, - key, pool->key_size); + cmp = memcmp((u8 *)elem + pool->key.key_offset, + key, pool->key.key_size); if (cmp > 0) node = node->rb_left; diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h index 432745ffc8d4..3d722aae5f15 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.h +++ b/drivers/infiniband/sw/rxe/rxe_pool.h @@ -56,8 +56,11 @@ struct rxe_pool_entry { struct kref ref_cnt; struct list_head list; - /* only used if indexed or keyed */ - struct rb_node node; + /* only used if keyed */ + struct rb_node key_node; + + /* only used if indexed */ + struct rb_node index_node; u32 index; }; @@ -74,15 +77,22 @@ struct rxe_pool { unsigned int max_elem; atomic_t num_elem; - /* only used if indexed or keyed */ - struct rb_root tree; - unsigned long *table; - size_t table_size; - u32 max_index; - u32 min_index; - u32 last; - size_t key_offset; - size_t key_size; + /* only used if indexed */ + struct { + struct rb_root 
tree; + unsigned long *table; + size_t table_size; + u32 last; + u32 max_index; + u32 min_index; + } index; + + /* only used if keyed */ + struct { + struct rb_root tree; + size_t key_offset; + size_t key_size; + } key; }; /* initialize a pool of objects with given limit on
From patchwork Mon Sep 21 20:03:48 2020
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 11790835
From: Bob Pearson
To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Cc: Bob Pearson
Subject: [PATCH for-next
v6 04/12] rdma_rxe: Add alloc_mw and dealloc_mw verbs Date: Mon, 21 Sep 2020 15:03:48 -0500 Message-Id: <20200921200356.8627-5-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org - Add a new file focused on memory windows, rxe_mw.c. - Add alloc_mw and dealloc_mw verbs and added them to the list of supported user space verbs. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/Makefile | 1 + drivers/infiniband/sw/rxe/rxe_loc.h | 8 +++ drivers/infiniband/sw/rxe/rxe_mr.c | 77 ++++++++++----------- drivers/infiniband/sw/rxe/rxe_mw.c | 98 +++++++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_pool.c | 33 +++++---- drivers/infiniband/sw/rxe/rxe_pool.h | 2 +- drivers/infiniband/sw/rxe/rxe_req.c | 24 +++---- drivers/infiniband/sw/rxe/rxe_resp.c | 4 +- drivers/infiniband/sw/rxe/rxe_verbs.c | 52 +++++++++----- drivers/infiniband/sw/rxe/rxe_verbs.h | 8 +++ include/uapi/rdma/rdma_user_rxe.h | 10 +++ 11 files changed, 232 insertions(+), 85 deletions(-) create mode 100644 drivers/infiniband/sw/rxe/rxe_mw.c diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile index 66af72dca759..1e24673e9318 100644 --- a/drivers/infiniband/sw/rxe/Makefile +++ b/drivers/infiniband/sw/rxe/Makefile @@ -15,6 +15,7 @@ rdma_rxe-y := \ rxe_qp.o \ rxe_cq.o \ rxe_mr.o \ + rxe_mw.o \ rxe_opcode.o \ rxe_mmap.o \ rxe_icrc.o \ diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 9ec6bff6863f..65f2e4a94956 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -109,6 +109,14 @@ void rxe_mr_cleanup(struct rxe_pool_entry *arg); int advance_dma_data(struct rxe_dma_info *dma, unsigned int length); +/* rxe_mw.c */ +struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, + struct ib_udata *udata); + +int rxe_dealloc_mw(struct ib_mw *ibmw); + +void rxe_mw_cleanup(struct rxe_pool_entry *arg); + /* rxe_net.c */ void rxe_loopback(struct sk_buff *skb); int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 368012904879..4c53badfa4e9 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -7,21 +7,18 @@ #include "rxe.h" #include "rxe_loc.h" -/* - * lfsr (linear feedback shift register) with period 255 +/* choose a unique non zero random number for lkey + * use high order bit to indicate MR vs MW */ -static u8 rxe_get_key(void) +static void rxe_set_mr_lkey(struct rxe_mr *mr) { - static u32 key = 1; - - key = key << 1; - - key |= (0 != (key & 0x100)) ^ (0 != (key & 0x10)) - ^ (0 != (key & 0x80)) ^ (0 != (key & 0x40)); - - key &= 0xff; - - return key; + u32 lkey; +again: + get_random_bytes(&lkey, sizeof(lkey)); + lkey &= ~IS_MW; + if (likely(lkey && (rxe_add_key(mr, &lkey) == 0))) + return; + goto again; } int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) @@ -49,36 +46,19 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) static void rxe_mr_init(int access, struct rxe_mr *mr) { - u32 lkey = mr->pelem.index << 8 | rxe_get_key(); - u32 rkey = (access & IB_ACCESS_REMOTE) ? 
lkey : 0; - - if (mr->pelem.pool->type == RXE_TYPE_MR) { - mr->ibmr.lkey = lkey; - mr->ibmr.rkey = rkey; - } - - mr->lkey = lkey; - mr->rkey = rkey; + rxe_add_index(mr); + rxe_set_mr_lkey(mr); + if (access & IB_ACCESS_REMOTE) + mr->ibmr.rkey = mr->ibmr.lkey; + + /* TODO should not have two copies of lkey and rkey in mr */ + mr->lkey = mr->ibmr.lkey; + mr->rkey = mr->ibmr.rkey; mr->state = RXE_MEM_STATE_INVALID; mr->type = RXE_MR_TYPE_NONE; mr->map_shift = ilog2(RXE_BUF_PER_MAP); } -void rxe_mr_cleanup(struct rxe_pool_entry *arg) -{ - struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); - int i; - - ib_umem_release(mr->umem); - - if (mr->map) { - for (i = 0; i < mr->num_map; i++) - kfree(mr->map[i]); - - kfree(mr->map); - } -} - static int rxe_mr_alloc(struct rxe_mr *mr, int num_buf) { int i; @@ -541,9 +521,8 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, { struct rxe_mr *mr; struct rxe_dev *rxe = to_rdev(pd->ibpd.device); - int index = key >> 8; - mr = rxe_pool_get_index(&rxe->mr_pool, index); + mr = rxe_pool_get_key(&rxe->mr_pool, &key); if (!mr) return NULL; @@ -558,3 +537,21 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, return mr; } + +void rxe_mr_cleanup(struct rxe_pool_entry *arg) +{ + struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); + int i; + + ib_umem_release(mr->umem); + + if (mr->map) { + for (i = 0; i < mr->num_map; i++) + kfree(mr->map[i]); + + kfree(mr->map); + } + + rxe_drop_index(mr); + rxe_drop_key(mr); +} diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c new file mode 100644 index 000000000000..b818f1e869da --- /dev/null +++ b/drivers/infiniband/sw/rxe/rxe_mw.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2020 Hewlett Packard Enterprise, Inc. All rights reserved. + */ + +#include "rxe.h" +#include "rxe_loc.h" + +/* choose a unique non zero random number for rkey + * use high order bit to indicate MR vs MW + */ +static void rxe_set_mw_rkey(struct rxe_mw *mw) +{ + u32 rkey; +again: + get_random_bytes(&rkey, sizeof(rkey)); + rkey |= IS_MW; + if (likely((rkey & ~IS_MW) && + (rxe_add_key(mw, &rkey) == 0))) + return; + goto again; +} + +struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, + struct ib_udata *udata) +{ + struct rxe_pd *pd = to_rpd(ibpd); + struct rxe_dev *rxe = to_rdev(ibpd->device); + struct rxe_mw *mw; + struct rxe_alloc_mw_resp __user *uresp = NULL; + + if (udata) { + if (udata->outlen < sizeof(*uresp)) + return ERR_PTR(-EINVAL); + uresp = udata->outbuf; + } + + if (unlikely((type != IB_MW_TYPE_1) && + (type != IB_MW_TYPE_2))) + return ERR_PTR(-EINVAL); + + rxe_add_ref(pd); + + mw = rxe_alloc(&rxe->mw_pool); + if (unlikely(!mw)) { + rxe_drop_ref(pd); + return ERR_PTR(-ENOMEM); + } + + rxe_add_index(mw); + rxe_set_mw_rkey(mw); /* sets mw->ibmw.rkey */ + + spin_lock_init(&mw->lock); + mw->qp = NULL; + mw->mr = NULL; + mw->addr = 0; + mw->length = 0; + mw->ibmw.pd = ibpd; + mw->ibmw.type = type; + mw->state = (type == IB_MW_TYPE_2) ? 
+ RXE_MEM_STATE_FREE : + RXE_MEM_STATE_VALID; + + if (uresp) { + if (copy_to_user(&uresp->index, &mw->pelem.index, + sizeof(uresp->index))) { + rxe_drop_ref(mw); + rxe_drop_ref(pd); + return ERR_PTR(-EFAULT); + } + } + + return &mw->ibmw; +} + +int rxe_dealloc_mw(struct ib_mw *ibmw) +{ + struct rxe_mw *mw = to_rmw(ibmw); + struct rxe_pd *pd = to_rpd(ibmw->pd); + unsigned long flags; + + spin_lock_irqsave(&mw->lock, flags); + mw->state = RXE_MEM_STATE_INVALID; + spin_unlock_irqrestore(&mw->lock, flags); + + rxe_drop_ref(pd); + rxe_drop_ref(mw); + + return 0; +} + +void rxe_mw_cleanup(struct rxe_pool_entry *arg) +{ + struct rxe_mw *mw = container_of(arg, typeof(*mw), pelem); + + rxe_drop_index(mw); + rxe_drop_key(mw); +} diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index 30b8f037ee20..4bcb19a7b918 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -7,13 +7,12 @@ #include "rxe.h" #include "rxe_loc.h" -/* info about object pools - */ +/* info about object pools */ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_UC] = { .name = "rxe-uc", .size = sizeof(struct rxe_ucontext), - .flags = RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_NO_ALLOC, }, [RXE_TYPE_PD] = { .name = "rxe-pd", @@ -43,23 +42,30 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_CQ] = { .name = "rxe-cq", .size = sizeof(struct rxe_cq), - .flags = RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_NO_ALLOC, .cleanup = rxe_cq_cleanup, }, [RXE_TYPE_MR] = { .name = "rxe-mr", .size = sizeof(struct rxe_mr), .cleanup = rxe_mr_cleanup, - .flags = RXE_POOL_INDEX, + .flags = RXE_POOL_INDEX + | RXE_POOL_KEY, .max_index = RXE_MAX_MR_INDEX, .min_index = RXE_MIN_MR_INDEX, + .key_offset = offsetof(struct rxe_mr, ibmr.lkey), + .key_size = sizeof(u32), }, [RXE_TYPE_MW] = { .name = "rxe-mw", .size = sizeof(struct rxe_mw), - .flags = RXE_POOL_INDEX, + .cleanup = rxe_mw_cleanup, + .flags = RXE_POOL_INDEX + | RXE_POOL_KEY, .max_index = RXE_MAX_MW_INDEX, .min_index = RXE_MIN_MW_INDEX, + .key_offset = offsetof(struct rxe_mw, ibmw.rkey), + .key_size = sizeof(u32), }, [RXE_TYPE_MC_GRP] = { .name = "rxe-mc_grp", @@ -223,7 +229,7 @@ static void insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new) return; } -static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) +static int insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) { struct rb_node **link = &pool->key.tree.rb_node; struct rb_node *parent = NULL; @@ -239,7 +245,7 @@ static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) if (cmp == 0) { pr_warn("key already exists!\n"); - goto out; + return -EAGAIN; } if (cmp > 0) @@ -250,20 +256,23 @@ static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) rb_link_node(&new->key_node, parent, link); rb_insert_color(&new->key_node, &pool->key.tree); -out: - return; + + return 0; } -void rxe_add_key(void *arg, void *key) +int rxe_add_key(void *arg, void *key) { + int ret; struct rxe_pool_entry *elem = arg; struct rxe_pool *pool = elem->pool; unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size); - insert_key(pool, elem); + ret = insert_key(pool, elem); write_unlock_irqrestore(&pool->pool_lock, flags); + + return ret; } void rxe_drop_key(void *arg) diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h index 3d722aae5f15..5be975e3d5d3 100644 --- 
a/drivers/infiniband/sw/rxe/rxe_pool.h +++ b/drivers/infiniband/sw/rxe/rxe_pool.h @@ -122,7 +122,7 @@ void rxe_drop_index(void *elem); /* assign a key to a keyed object and insert object into * pool's rb tree */ -void rxe_add_key(void *elem, void *key); +int rxe_add_key(void *elem, void *key); /* remove elem from rb tree */ void rxe_drop_key(void *elem); diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 57236d8c2146..682f30bb3495 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -597,29 +597,29 @@ int rxe_requester(void *arg) if (wqe->mask & WR_REG_MASK) { if (wqe->wr.opcode == IB_WR_LOCAL_INV) { struct rxe_dev *rxe = to_rdev(qp->ibqp.device); - struct rxe_mr *rmr; + struct rxe_mr *mr; - rmr = rxe_pool_get_index(&rxe->mr_pool, - wqe->wr.ex.invalidate_rkey >> 8); - if (!rmr) { + mr = rxe_pool_get_key(&rxe->mr_pool, + &wqe->wr.ex.invalidate_rkey); + if (!mr) { pr_err("No mr for key %#x\n", wqe->wr.ex.invalidate_rkey); wqe->state = wqe_state_error; wqe->status = IB_WC_MW_BIND_ERR; goto exit; } - rmr->state = RXE_MEM_STATE_FREE; - rxe_drop_ref(rmr); + mr->state = RXE_MEM_STATE_FREE; + rxe_drop_ref(mr); wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; } else if (wqe->wr.opcode == IB_WR_REG_MR) { - struct rxe_mr *rmr = to_rmr(wqe->wr.wr.reg.mr); + struct rxe_mr *mr = to_rmr(wqe->wr.wr.reg.mr); - rmr->state = RXE_MEM_STATE_VALID; - rmr->access = wqe->wr.wr.reg.access; - rmr->lkey = wqe->wr.wr.reg.key; - rmr->rkey = wqe->wr.wr.reg.key; - rmr->iova = wqe->wr.wr.reg.mr->iova; + mr->state = RXE_MEM_STATE_VALID; + mr->access = wqe->wr.wr.reg.access; + mr->lkey = wqe->wr.wr.reg.key; + mr->rkey = wqe->wr.wr.reg.key; + mr->iova = wqe->wr.wr.reg.mr->iova; wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; } else { diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 69867bf39cfb..885b5bf6dc2e 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -888,8 +888,8 @@ static enum resp_states do_complete(struct rxe_qp *qp, wc->wc_flags |= IB_WC_WITH_INVALIDATE; wc->ex.invalidate_rkey = ieth_rkey(pkt); - rmr = rxe_pool_get_index(&rxe->mr_pool, - wc->ex.invalidate_rkey >> 8); + rmr = rxe_pool_get_key(&rxe->mr_pool, + &wc->ex.invalidate_rkey); if (unlikely(!rmr)) { pr_err("Bad rkey %#x invalidation\n", wc->ex.invalidate_rkey); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 626706f25efc..96fea64ba02d 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -869,12 +869,14 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access) struct rxe_pd *pd = to_rpd(ibpd); struct rxe_mr *mr; + rxe_add_ref(pd); + mr = rxe_alloc(&rxe->mr_pool); - if (!mr) + if (!mr) { + rxe_drop_ref(pd); return ERR_PTR(-ENOMEM); + } - rxe_add_index(mr); - rxe_add_ref(pd); rxe_mr_init_dma(pd, access, mr); return &mr->ibmr; @@ -890,6 +892,17 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); struct rxe_mr *mr; + struct rxe_reg_mr_resp __user *uresp = NULL; + + if (udata) { + if (udata->outlen < sizeof(*uresp)) { + err = -EINVAL; + goto err1; + } + uresp = udata->outbuf; + } + + rxe_add_ref(pd); mr = rxe_alloc(&rxe->mr_pool); if (!mr) { @@ -897,22 +910,25 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, goto err2; } - rxe_add_index(mr); - - rxe_add_ref(pd); - err = 
rxe_mr_init_user(pd, start, length, iova, - access, udata, mr); + access, udata, mr); if (err) goto err3; - return &mr->ibmr; + if (uresp) { + if (copy_to_user(&uresp->index, &mr->pelem.index, + sizeof(uresp->index))) { + err = -EFAULT; + goto err3; + } + } + return &mr->ibmr; err3: - rxe_drop_ref(pd); - rxe_drop_index(mr); rxe_drop_ref(mr); err2: + rxe_drop_ref(pd); +err1: return ERR_PTR(err); } @@ -922,7 +938,6 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) mr->state = RXE_MEM_STATE_ZOMBIE; rxe_drop_ref(mr->pd); - rxe_drop_index(mr); rxe_drop_ref(mr); return 0; } @@ -938,16 +953,14 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, if (mr_type != IB_MR_TYPE_MEM_REG) return ERR_PTR(-EINVAL); + rxe_add_ref(pd); + mr = rxe_alloc(&rxe->mr_pool); if (!mr) { err = -ENOMEM; goto err1; } - rxe_add_index(mr); - - rxe_add_ref(pd); - err = rxe_mr_init_fast(pd, max_num_sg, mr); if (err) goto err2; @@ -955,10 +968,9 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, return &mr->ibmr; err2: - rxe_drop_ref(pd); - rxe_drop_index(mr); rxe_drop_ref(mr); err1: + rxe_drop_ref(pd); return ERR_PTR(err); } @@ -1105,6 +1117,8 @@ static const struct ib_device_ops rxe_dev_ops = { .reg_user_mr = rxe_reg_user_mr, .req_notify_cq = rxe_req_notify_cq, .resize_cq = rxe_resize_cq, + .alloc_mw = rxe_alloc_mw, + .dealloc_mw = rxe_dealloc_mw, INIT_RDMA_OBJ_SIZE(ib_ah, rxe_ah, ibah), INIT_RDMA_OBJ_SIZE(ib_cq, rxe_cq, ibcq), @@ -1166,6 +1180,8 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_AH) | BIT_ULL(IB_USER_VERBS_CMD_ATTACH_MCAST) | BIT_ULL(IB_USER_VERBS_CMD_DETACH_MCAST) + | BIT_ULL(IB_USER_VERBS_CMD_ALLOC_MW) + | BIT_ULL(IB_USER_VERBS_CMD_DEALLOC_MW) ; ib_set_device_ops(dev, &rxe_dev_ops); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index dbc649c9c43f..2233630fea7f 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -319,6 +319,9 @@ struct rxe_mr { struct rxe_map **map; }; +/* use high order bit to separate MW and MR rkeys */ +#define IS_MW (1 << 31) + struct rxe_mw { struct rxe_pool_entry pelem; struct ib_mw ibmw; @@ -441,6 +444,11 @@ static inline struct rxe_mr *to_rmr(struct ib_mr *mr) return mr ? container_of(mr, struct rxe_mr, ibmr) : NULL; } +static inline struct rxe_mw *to_rmw(struct ib_mw *mw) +{ + return mw ? 
container_of(mw, struct rxe_mw, ibmw) : NULL; +} + int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name); void rxe_mc_cleanup(struct rxe_pool_entry *arg); diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h index d8f2e0e46dab..4ad0fa0b2ab9 100644 --- a/include/uapi/rdma/rdma_user_rxe.h +++ b/include/uapi/rdma/rdma_user_rxe.h @@ -175,4 +175,14 @@ struct rxe_modify_srq_cmd { __aligned_u64 mmap_info_addr; }; +struct rxe_reg_mr_resp { + __u32 index; + __u32 reserved; +}; + +struct rxe_alloc_mw_resp { + __u32 index; + __u32 reserved; +}; + #endif /* RDMA_USER_RXE_H */ From patchwork Mon Sep 21 20:03:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790837 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D075E59D for ; Mon, 21 Sep 2020 20:04:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AAB28218AC for ; Mon, 21 Sep 2020 20:04:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ROqDHxtu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726810AbgIUUEM (ORCPT ); Mon, 21 Sep 2020 16:04:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUEM (ORCPT ); Mon, 21 Sep 2020 16:04:12 -0400 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76082C061755 for ; Mon, 21 Sep 2020 13:04:12 -0700 (PDT) Received: by mail-ot1-x343.google.com with SMTP id n61so13485647ota.10 for ; Mon, 21 Sep 2020 13:04:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dCA8ffeEmbUKdEeQabMKRtMOY8BqmEV20DYGleD0gCk=; b=ROqDHxtu2GG/Z/s9GEn95SQLQncKTboFw+8vkMnMMAySoNIyYTyrSHThmw9cln0Rhg 9vxwGqj1UgpdnsXmtWNYZKqhO9oroujdkz+qz7QesaYJKSagNrLtGeS15tklpjIWJPDt lcofuqJS/gzcpoEc1jSzapcCE/+ako0WASYpIgMhAVJ+lLOqH6Ehlab3MuUOO0kBw3+H S4+oczZCM3MEOa4IlSSmKRU73ENP5dNd1Pl7YxGYBZi/m4A5NSFJrO1z2qSWvxew9cas p/GJRBxFD2GR25sBzM8qroAaNo9DeH931PYommgft3Y/wGeuh2b/ozsLs/20eCAWFWBY Z3eA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dCA8ffeEmbUKdEeQabMKRtMOY8BqmEV20DYGleD0gCk=; b=lZSW1BFGYSikKSOrq1wLlP1mYpuIJFwEyUUJ9a7fTTpIo6L+DLAyzk1mrBWMcXuWb2 Tu0y0mO1drg1BQUuJjgWxMKu1HDIHh/7YlWUdcaHhZkU2DhBzMqTKqwM0wAyMi0qDHeS 8CJQmLLiUAVV+kQHVB0rHGhxDPNi/M36wuZWCamzofiV+AkRAwMmYNYPtVA/0gyYdnz2 5dJj0LuYZfux+XB67PjZdCDHygRSZjl+neQJPdbb870csEJDWgvUeRo8iuEKGE4ZG/HU mrgOjWxHAKsfpCbuGCgCIIyb0TtI8ridbHMuVJCxmEaOtWH5eZ+VmJ8D92ii7Cj/eSGN xDOQ== X-Gm-Message-State: AOAM532uSyi7hTgjd4OUkimTAhBrp+ijjqv9MXnuSM1bIqCGugqFkYdX Fshryi5l2MldiSTpnDCpO2Q= X-Google-Smtp-Source: ABdhPJzIIBO5bPpKdhuOF43yNbbdVi80sK5tPOePv+uvmyiNUy9CZyt30T3/dcSDXI41klgO2oHZaQ== X-Received: by 2002:a05:6830:14c:: with SMTP id j12mr682364otp.146.1600718651680; Mon, 21 Sep 2020 13:04:11 -0700 (PDT) Received: from localhost 
([2605:6000:8b03:f000:9211:f781:8595:d131]) by smtp.gmail.com with ESMTPSA id g7sm6114167otl.59.2020.09.21.13.04.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Sep 2020 13:04:11 -0700 (PDT) From: Bob Pearson X-Google-Original-From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 05/12] rdma_rxe: Add bind_mw and invalidate_mw verbs Date: Mon, 21 Sep 2020 15:03:49 -0500 Message-Id: <20200921200356.8627-6-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org - Add code to implement ibv_bind_mw (for type 1 MWs) and post send queue bind_mw (for type 2 MWs). - Add code to implement local (post send) and remote (send with invalidate) invalidate operations. - Add rules checking for MW operations from IBA. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_comp.c | 1 + drivers/infiniband/sw/rxe/rxe_loc.h | 6 + drivers/infiniband/sw/rxe/rxe_mr.c | 13 +- drivers/infiniband/sw/rxe/rxe_mw.c | 289 ++++++++++++++++++++++++- drivers/infiniband/sw/rxe/rxe_opcode.c | 11 +- drivers/infiniband/sw/rxe/rxe_opcode.h | 1 - drivers/infiniband/sw/rxe/rxe_req.c | 126 ++++++++--- drivers/infiniband/sw/rxe/rxe_resp.c | 81 +++++-- drivers/infiniband/sw/rxe/rxe_verbs.c | 2 +- drivers/infiniband/sw/rxe/rxe_verbs.h | 7 + include/uapi/rdma/rdma_user_rxe.h | 34 ++- 11 files changed, 508 insertions(+), 63 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 5dc86c9e74c2..8b81d3b24a8a 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -103,6 +103,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode) case IB_WR_RDMA_READ_WITH_INV: return IB_WC_RDMA_READ; case IB_WR_LOCAL_INV: return IB_WC_LOCAL_INV; case IB_WR_REG_MR: return IB_WC_REG_MR; + case IB_WR_BIND_MW: return IB_WC_BIND_MW; default: return 0xff; diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 65f2e4a94956..1ee6270d3f2a 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -109,6 +109,8 @@ void rxe_mr_cleanup(struct rxe_pool_entry *arg); int advance_dma_data(struct rxe_dma_info *dma, unsigned int length); +int rxe_invalidate_mr(struct rxe_qp *qp, struct rxe_mr *mr); + /* rxe_mw.c */ struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, struct ib_udata *udata); @@ -117,6 +119,10 @@ int rxe_dealloc_mw(struct ib_mw *ibmw); void rxe_mw_cleanup(struct rxe_pool_entry *arg); +int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe); + +int rxe_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw); + /* rxe_net.c */ void rxe_loopback(struct sk_buff *skb); int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 4c53badfa4e9..3f7c9b84f99b 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -538,12 +538,23 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, return mr; } +int rxe_invalidate_mr(struct rxe_qp *qp, struct rxe_mr *mr) +{ + /* TODO there are API rules being ignored here + * cleanup later. 
Current project is not trying + * to fix MR + */ + mr->state = RXE_MEM_STATE_FREE; + return 0; +} + void rxe_mr_cleanup(struct rxe_pool_entry *arg) { struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); int i; - ib_umem_release(mr->umem); + if (mr->umem) + ib_umem_release(mr->umem); if (mr->map) { for (i = 0; i < mr->num_map; i++) diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c index b818f1e869da..51bc71c98654 100644 --- a/drivers/infiniband/sw/rxe/rxe_mw.c +++ b/drivers/infiniband/sw/rxe/rxe_mw.c @@ -30,7 +30,7 @@ struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, struct rxe_alloc_mw_resp __user *uresp = NULL; if (udata) { - if (udata->outlen < sizeof(*uresp)) + if (unlikely(udata->outlen < sizeof(*uresp))) return ERR_PTR(-EINVAL); uresp = udata->outbuf; } @@ -62,10 +62,9 @@ struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, RXE_MEM_STATE_VALID; if (uresp) { - if (copy_to_user(&uresp->index, &mw->pelem.index, - sizeof(uresp->index))) { + if (unlikely(copy_to_user(&uresp->index, &mw->pelem.index, + sizeof(uresp->index)))) { rxe_drop_ref(mw); - rxe_drop_ref(pd); return ERR_PTR(-EFAULT); } } @@ -73,22 +72,298 @@ struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, return &mw->ibmw; } +/* cleanup mw in case someone is still holding a ref */ +static void do_dealloc_mw(struct rxe_mw *mw) +{ + if (mw->mr) { + rxe_drop_ref(mw->mr); + atomic_dec(&mw->mr->num_mw); + mw->mr = NULL; + } + + mw->qp = NULL; + mw->access = 0; + mw->addr = 0; + mw->length = 0; + mw->state = RXE_MEM_STATE_INVALID; +} + int rxe_dealloc_mw(struct ib_mw *ibmw) { struct rxe_mw *mw = to_rmw(ibmw); - struct rxe_pd *pd = to_rpd(ibmw->pd); unsigned long flags; spin_lock_irqsave(&mw->lock, flags); - mw->state = RXE_MEM_STATE_INVALID; + + do_dealloc_mw(mw); + + spin_unlock_irqrestore(&mw->lock, flags); + + rxe_drop_ref(mw); + + return 0; +} + +/* Check the rules for bind MW oepration. */ +static int check_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe, + struct rxe_mw *mw, struct rxe_mr *mr) +{ + /* check to see if bind operation came through + * ibv_bind_mw verbs API. 
+ */ + switch (mw->ibmw.type) { + case IB_MW_TYPE_1: + /* o10-37.2.34 */ + if (unlikely(!(wqe->wr.wr.umw.flags & RXE_BIND_MW))) { + pr_err_once("attempt to bind type 1 MW with send WR\n"); + return -EINVAL; + } + break; + case IB_MW_TYPE_2: + /* o10-37.2.35 */ + if (unlikely(wqe->wr.wr.umw.flags & RXE_BIND_MW)) { + pr_err_once("attempt to bind type 2 MW with verbs API\n"); + return -EINVAL; + } + + /* C10-72 */ + if (unlikely(qp->pd != to_rpd(mw->ibmw.pd))) { + pr_err_once("attempt to bind type 2 MW with qp with different PD\n"); + return -EINVAL; + } + + /* o10-37.2.40 */ + if (unlikely(wqe->wr.wr.umw.length == 0)) { + pr_err_once("attempt to invalidate type 2 MW by binding with zero length\n"); + return -EINVAL; + } + + if (unlikely(!mr)) { + pr_err_once("attempt to bind MW to a NULL mr\n"); + return -EINVAL; + } + break; + default: + return -EINVAL; + } + + if (unlikely((mw->ibmw.type == IB_MW_TYPE_1) && + (mw->state != RXE_MEM_STATE_VALID))) { + pr_err_once("attempt to bind a type 1 MW not in the valid state\n"); + return -EINVAL; + } + + /* o10-36.2.2 */ + if (unlikely((mw->access & IB_ZERO_BASED) && + (mw->ibmw.type == IB_MW_TYPE_1))) { + pr_err_once("attempt to bind a zero based type 1 MW\n"); + return -EINVAL; + } + + if (unlikely((wqe->wr.wr.umw.rkey & 0xff) == (mw->ibmw.rkey & 0xff))) { + pr_err_once("attempt to bind MW with same key\n"); + return -EINVAL; + } + + /* remaining checks only apply to a nonzero MR */ + if (!mr) + return 0; + + if (unlikely(mr->access & IB_ZERO_BASED)) { + pr_err_once("attempt to bind MW to zero based MR\n"); + return -EINVAL; + } + + /* o10-37.2.30 */ + if (unlikely((mw->ibmw.type == IB_MW_TYPE_2) && + (mw->state != RXE_MEM_STATE_FREE))) { + pr_err_once("attempt to bind a type 2 MW not in the free state\n"); + return -EINVAL; + } + + /* C10-73 */ + if (unlikely(!(mr->access & IB_ACCESS_MW_BIND))) { + pr_err_once("attempt to bind an MW to an MR without bind access\n"); + return -EINVAL; + } + + /* C10-74 */ + if (unlikely((mw->access & (IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_ATOMIC)) && + !(mr->access & IB_ACCESS_LOCAL_WRITE))) { + pr_err_once("attempt to bind an writeable MW to an MR without local write access\n"); + return -EINVAL; + } + + /* C10-75 */ + if (mw->access & IB_ZERO_BASED) { + if (unlikely(wqe->wr.wr.umw.length > mr->length)) { + pr_err_once("attempt to bind a ZB MW outside of the MR\n"); + return -EINVAL; + } + } else { + if (unlikely((wqe->wr.wr.umw.addr < mr->iova) || + ((wqe->wr.wr.umw.addr + wqe->wr.wr.umw.length) > + (mr->iova + mr->length)))) { + pr_err_once("attempt to bind a VA MW outside of the MR\n"); + return -EINVAL; + } + } + + return 0; +} + +static int do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe, + struct rxe_mw *mw, struct rxe_mr *mr) +{ + u32 rkey; + u32 new_rkey; + struct rxe_mw *duplicate_mw; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + + /* key part of new rkey is provided by user for type 2 + * and ibv_bind_mw() for type 1 MWs + * there is a very rare chance that the new rkey will + * collide with an existing MW. 
Return an error if this + * occurs + */ + rkey = mw->ibmw.rkey; + new_rkey = (rkey & 0xffffff00) | (wqe->wr.wr.umw.rkey & 0x000000ff); + duplicate_mw = rxe_pool_get_key(&rxe->mw_pool, &new_rkey); + if (duplicate_mw) { + pr_err_once("new MW key is a duplicate, try another\n"); + rxe_drop_ref(duplicate_mw); + return -EINVAL; + } + + rxe_drop_key(mw); + rxe_add_key(mw, &new_rkey); + + mw->access = wqe->wr.wr.umw.access; + mw->state = RXE_MEM_STATE_VALID; + mw->addr = wqe->wr.wr.umw.addr; + mw->length = wqe->wr.wr.umw.length; + + if (mw->mr) { + rxe_drop_ref(mw->mr); + atomic_dec(&mw->mr->num_mw); + mw->mr = NULL; + } + + if (mw->length) { + mw->mr = mr; + atomic_inc(&mr->num_mw); + rxe_add_ref(mr); + } + + if (mw->ibmw.type == IB_MW_TYPE_2) + mw->qp = qp; + + return 0; +} + +int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe) +{ + int ret; + struct rxe_mw *mw; + struct rxe_mr *mr; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + unsigned long flags; + + if (qp->is_user) { + mw = rxe_pool_get_index(&rxe->mw_pool, + wqe->wr.wr.umw.mw_index); + if (!mw) { + pr_err_once("mw with index = %d not found\n", + wqe->wr.wr.umw.mw_index); + ret = -EINVAL; + goto err1; + } + mr = rxe_pool_get_index(&rxe->mr_pool, + wqe->wr.wr.umw.mr_index); + if (!mr && wqe->wr.wr.umw.length) { + pr_err_once("mr with index = %d not found\n", + wqe->wr.wr.umw.mr_index); + ret = -EINVAL; + goto err2; + } + } else { + mw = to_rmw(wqe->wr.wr.kmw.mw); + rxe_add_ref(mw); + if (wqe->wr.wr.kmw.mr) { + mr = to_rmr(wqe->wr.wr.kmw.mr); + rxe_add_ref(mr); + } else { + mr = NULL; + } + } + + spin_lock_irqsave(&mw->lock, flags); + + ret = check_bind_mw(qp, wqe, mw, mr); + if (ret) + goto err3; + + ret = do_bind_mw(qp, wqe, mw, mr); +err3: spin_unlock_irqrestore(&mw->lock, flags); - rxe_drop_ref(pd); + if (mr) + rxe_drop_ref(mr); +err2: rxe_drop_ref(mw); +err1: + return ret; +} + +static int check_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw) +{ + if (unlikely(mw->state != RXE_MEM_STATE_VALID)) { + pr_err_once("attempt to invalidate a MW that is not valid\n"); + return -EINVAL; + } + + /* o10-37.2.26 */ + if (unlikely(mw->ibmw.type == IB_MW_TYPE_1)) { + pr_err_once("attempt to invalidate a type 1 MW\n"); + return -EINVAL; + } return 0; } +static void do_invalidate_mw(struct rxe_mw *mw) +{ + mw->qp = NULL; + + rxe_drop_ref(mw->mr); + atomic_dec(&mw->mr->num_mw); + mw->mr = NULL; + + mw->access = 0; + mw->addr = 0; + mw->length = 0; + mw->state = RXE_MEM_STATE_FREE; +} + +int rxe_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw) +{ + int ret; + unsigned long flags; + + spin_lock_irqsave(&mw->lock, flags); + + ret = check_invalidate_mw(qp, mw); + if (ret) + goto err; + + do_invalidate_mw(mw); +err: + spin_unlock_irqrestore(&mw->lock, flags); + + return ret; +} + void rxe_mw_cleanup(struct rxe_pool_entry *arg) { struct rxe_mw *mw = container_of(arg, typeof(*mw), pelem); diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c index 0cb4b01fd910..5532f01ae5a3 100644 --- a/drivers/infiniband/sw/rxe/rxe_opcode.c +++ b/drivers/infiniband/sw/rxe/rxe_opcode.c @@ -87,13 +87,20 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = { [IB_WR_LOCAL_INV] = { .name = "IB_WR_LOCAL_INV", .mask = { - [IB_QPT_RC] = WR_REG_MASK, + [IB_QPT_RC] = WR_LOCAL_MASK, }, }, [IB_WR_REG_MR] = { .name = "IB_WR_REG_MR", .mask = { - [IB_QPT_RC] = WR_REG_MASK, + [IB_QPT_RC] = WR_LOCAL_MASK, + }, + }, + [IB_WR_BIND_MW] = { + .name = "IB_WR_BIND_MW", + .mask = { + [IB_QPT_RC] = WR_LOCAL_MASK, + [IB_QPT_UC] = 
WR_LOCAL_MASK, }, }, }; diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h index 1041ac9a9233..440e34f446bd 100644 --- a/drivers/infiniband/sw/rxe/rxe_opcode.h +++ b/drivers/infiniband/sw/rxe/rxe_opcode.h @@ -20,7 +20,6 @@ enum rxe_wr_mask { WR_READ_MASK = BIT(3), WR_WRITE_MASK = BIT(4), WR_LOCAL_MASK = BIT(5), - WR_REG_MASK = BIT(6), WR_READ_OR_WRITE_MASK = WR_READ_MASK | WR_WRITE_MASK, WR_READ_WRITE_OR_SEND_MASK = WR_READ_OR_WRITE_MASK | WR_SEND_MASK, diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 682f30bb3495..1944d2bbc43b 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -524,9 +524,9 @@ static void save_state(struct rxe_send_wqe *wqe, struct rxe_send_wqe *rollback_wqe, u32 *rollback_psn) { - rollback_wqe->state = wqe->state; + rollback_wqe->state = wqe->state; rollback_wqe->first_psn = wqe->first_psn; - rollback_wqe->last_psn = wqe->last_psn; + rollback_wqe->last_psn = wqe->last_psn; *rollback_psn = qp->req.psn; } @@ -556,9 +556,38 @@ static void update_state(struct rxe_qp *qp, struct rxe_send_wqe *wqe, jiffies + qp->qp_timeout_jiffies); } +static int invalidate_key(struct rxe_qp *qp, u32 key) +{ + int ret; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mw *mw; + struct rxe_mr *mr; + + if (key & IS_MW) { + mw = rxe_pool_get_key(&rxe->mw_pool, &key); + if (!mw) { + pr_err("No mw for key %#x\n", key); + return -EINVAL; + } + ret = rxe_invalidate_mw(qp, mw); + rxe_drop_ref(mw); + } else { + mr = rxe_pool_get_key(&rxe->mr_pool, &key); + if (!mr) { + pr_err("No mr for key %#x\n", key); + return -EINVAL; + } + ret = rxe_invalidate_mr(qp, mr); + rxe_drop_ref(mr); + } + + return ret; +} + int rxe_requester(void *arg) { struct rxe_qp *qp = (struct rxe_qp *)arg; + struct rxe_mr *mr; struct rxe_pkt_info pkt; struct sk_buff *skb; struct rxe_send_wqe *wqe; @@ -569,6 +598,7 @@ int rxe_requester(void *arg) int ret; struct rxe_send_wqe rollback_wqe; u32 rollback_psn; + u32 key; rxe_add_ref(qp); @@ -594,42 +624,47 @@ int rxe_requester(void *arg) if (unlikely(!wqe)) goto exit; - if (wqe->mask & WR_REG_MASK) { - if (wqe->wr.opcode == IB_WR_LOCAL_INV) { - struct rxe_dev *rxe = to_rdev(qp->ibqp.device); - struct rxe_mr *mr; - - mr = rxe_pool_get_key(&rxe->mr_pool, - &wqe->wr.ex.invalidate_rkey); - if (!mr) { - pr_err("No mr for key %#x\n", - wqe->wr.ex.invalidate_rkey); - wqe->state = wqe_state_error; - wqe->status = IB_WC_MW_BIND_ERR; - goto exit; + if (wqe->mask & WR_LOCAL_MASK) { + switch (wqe->wr.opcode) { + case IB_WR_LOCAL_INV: + key = wqe->wr.ex.invalidate_rkey; + ret = invalidate_key(qp, key); + if (ret) { + wqe->status = IB_WC_LOC_QP_OP_ERR; + goto err; } - mr->state = RXE_MEM_STATE_FREE; - rxe_drop_ref(mr); - wqe->state = wqe_state_done; - wqe->status = IB_WC_SUCCESS; - } else if (wqe->wr.opcode == IB_WR_REG_MR) { - struct rxe_mr *mr = to_rmr(wqe->wr.wr.reg.mr); - + break; + case IB_WR_REG_MR: + mr = to_rmr(wqe->wr.wr.reg.mr); mr->state = RXE_MEM_STATE_VALID; mr->access = wqe->wr.wr.reg.access; mr->lkey = wqe->wr.wr.reg.key; mr->rkey = wqe->wr.wr.reg.key; mr->iova = wqe->wr.wr.reg.mr->iova; - wqe->state = wqe_state_done; - wqe->status = IB_WC_SUCCESS; - } else { - goto exit; + break; + case IB_WR_BIND_MW: + ret = rxe_bind_mw(qp, wqe); + if (ret) { + wqe->state = wqe_state_done; + wqe->status = IB_WC_MW_BIND_ERR; + goto err; + } + break; + default: + pr_err_once("unexpected LOCAL WR opcode = %d\n", + wqe->wr.opcode); + goto err; } + + wqe->state = 
wqe_state_done; + wqe->status = IB_WC_SUCCESS; + qp->req.wqe_index = next_index(qp->sq.queue, + qp->req.wqe_index); + if ((wqe->wr.send_flags & IB_SEND_SIGNALED) || qp->sq_sig_type == IB_SIGNAL_ALL_WR) rxe_run_task(&qp->comp.task, 1); - qp->req.wqe_index = next_index(qp->sq.queue, - qp->req.wqe_index); + goto next_wqe; } @@ -649,6 +684,7 @@ int rxe_requester(void *arg) opcode = next_opcode(qp, wqe, wqe->wr.opcode); if (unlikely(opcode < 0)) { wqe->status = IB_WC_LOC_QP_OP_ERR; + /* TODO this should be goto err */ goto exit; } @@ -678,8 +714,7 @@ int rxe_requester(void *arg) wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; __rxe_do_task(&qp->comp.task); - rxe_drop_ref(qp); - return 0; + goto again; } payload = mtu; } @@ -687,12 +722,14 @@ int rxe_requester(void *arg) skb = init_req_packet(qp, wqe, opcode, payload, &pkt); if (unlikely(!skb)) { pr_err("qp#%d Failed allocating skb\n", qp_num(qp)); + wqe->status = IB_WC_LOC_PROT_ERR; goto err; } if (fill_packet(qp, wqe, &pkt, skb, payload)) { pr_debug("qp#%d Error during fill packet\n", qp_num(qp)); kfree_skb(skb); + wqe->status = IB_WC_LOC_PROT_ERR; goto err; } @@ -716,6 +753,7 @@ int rxe_requester(void *arg) goto exit; } + wqe->status = IB_WC_LOC_PROT_ERR; goto err; } @@ -724,11 +762,35 @@ int rxe_requester(void *arg) goto next_wqe; err: - wqe->status = IB_WC_LOC_PROT_ERR; + /* we come here if an error occurred while processing + * a send wqe. The completer will put the qp in error + * state and no more wqes will be processed unless + * the qp is cleaned up and restarted. We do not want + * to be called again + */ wqe->state = wqe_state_error; __rxe_do_task(&qp->comp.task); + ret = -EAGAIN; + goto done; exit: + /* we come here if either there are no more wqes in the send + * queue or we are blocked waiting for some resource or event. + * The current wqe will be restarted or new wqe started when + * there is work to do or we can complete the current wqe. + */ + ret = -EAGAIN; + goto done; + +again: + /* we come here if we are done with the current wqe but want to + * get called again. 
Mostly we loop back to next wqe so should + * be all one way or the other + */ + ret = 0; + goto done; + +done: rxe_drop_ref(qp); - return -EAGAIN; + return ret; } diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 885b5bf6dc2e..4d688a50d301 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -35,6 +35,7 @@ enum resp_states { RESPST_ERR_TOO_MANY_RDMA_ATM_REQ, RESPST_ERR_RNR, RESPST_ERR_RKEY_VIOLATION, + RESPST_ERR_INVALIDATE_RKEY, RESPST_ERR_LENGTH, RESPST_ERR_CQ_OVERFLOW, RESPST_ERROR, @@ -68,6 +69,7 @@ static char *resp_state_name[] = { [RESPST_ERR_TOO_MANY_RDMA_ATM_REQ] = "ERR_TOO_MANY_RDMA_ATM_REQ", [RESPST_ERR_RNR] = "ERR_RNR", [RESPST_ERR_RKEY_VIOLATION] = "ERR_RKEY_VIOLATION", + [RESPST_ERR_INVALIDATE_RKEY] = "ERR_INVALIDATE_RKEY_VIOLATION", [RESPST_ERR_LENGTH] = "ERR_LENGTH", [RESPST_ERR_CQ_OVERFLOW] = "ERR_CQ_OVERFLOW", [RESPST_ERROR] = "ERROR", @@ -751,6 +753,39 @@ static void build_rdma_network_hdr(union rdma_network_hdr *hdr, memcpy(&hdr->ibgrh, ipv6_hdr(skb), sizeof(hdr->ibgrh)); } +static int invalidate_rkey(struct rxe_qp *qp, u32 rkey) +{ + int ret; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mw *mw; + struct rxe_mr *mr; + + if (rkey & IS_MW) { + mw = rxe_pool_get_key(&rxe->mw_pool, &rkey); + if (!mw) { + pr_err("No mw for rkey %#x\n", rkey); + goto err; + } + ret = rxe_invalidate_mw(qp, mw); + rxe_drop_ref(mw); + } else { + mr = rxe_pool_get_key(&rxe->mr_pool, &rkey); + if (!mr || mr->ibmr.rkey != rkey) { + pr_err("No mr for rkey %#x\n", rkey); + goto err; + } + ret = rxe_invalidate_mr(qp, mr); + rxe_drop_ref(mr); + } + + if (ret) + goto err; + + return 0; +err: + return RESPST_ERR_INVALIDATE_RKEY; +} + /* Executes a new request. A retried request never reach that function (send * and writes are discarded, and reads and atomics are retried elsewhere. 
*/ @@ -790,6 +825,14 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt) WARN_ON_ONCE(1); } + if (pkt->mask & RXE_IETH_MASK) { + u32 rkey = ieth_rkey(pkt); + + err = invalidate_rkey(qp, rkey); + if (err) + return err; + } + /* next expected psn, read handles this separately */ qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK; qp->resp.ack_psn = qp->resp.psn; @@ -822,15 +865,20 @@ static enum resp_states do_complete(struct rxe_qp *qp, memset(&cqe, 0, sizeof(cqe)); if (qp->rcq->is_user) { - uwc->status = qp->resp.status; - uwc->qp_num = qp->ibqp.qp_num; - uwc->wr_id = wqe->wr_id; + uwc->status = qp->resp.status; + uwc->qp_num = qp->ibqp.qp_num; + uwc->wr_id = wqe->wr_id; } else { - wc->status = qp->resp.status; - wc->qp = &qp->ibqp; - wc->wr_id = wqe->wr_id; + wc->status = qp->resp.status; + wc->qp = &qp->ibqp; + wc->wr_id = wqe->wr_id; } + /* TODO nothing is returned for error WQEs but + * at least some of these have important things + * to report (for example send with invalidate but + * rkey fails) Fix this when I clean up MR logic + */ if (wc->status == IB_WC_SUCCESS) { rxe_counter_inc(rxe, RXE_CNT_RDMA_RECV); wc->opcode = (pkt->mask & RXE_IMMDT_MASK && @@ -883,20 +931,8 @@ static enum resp_states do_complete(struct rxe_qp *qp, } if (pkt->mask & RXE_IETH_MASK) { - struct rxe_mr *rmr; - wc->wc_flags |= IB_WC_WITH_INVALIDATE; wc->ex.invalidate_rkey = ieth_rkey(pkt); - - rmr = rxe_pool_get_key(&rxe->mr_pool, - &wc->ex.invalidate_rkey); - if (unlikely(!rmr)) { - pr_err("Bad rkey %#x invalidation\n", - wc->ex.invalidate_rkey); - return RESPST_ERROR; - } - rmr->state = RXE_MEM_STATE_FREE; - rxe_drop_ref(rmr); } wc->qp = &qp->ibqp; @@ -1314,6 +1350,15 @@ int rxe_responder(void *arg) } break; + case RESPST_ERR_INVALIDATE_RKEY: + /* RC Only - Class C. 
*/ + /* Class J */ + qp->resp.goto_error = 1; + /* is there a better choice */ + qp->resp.status = IB_WC_REM_INV_REQ_ERR; + state = RESPST_COMPLETE; + break; + case RESPST_ERR_LENGTH: if (qp_type(qp) == IB_QPT_RC) { /* Class C */ diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 96fea64ba02d..21582507ed32 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -577,7 +577,7 @@ static int init_send_wqe(struct rxe_qp *qp, const struct ib_send_wr *ibwr, p += sge->length; } - } else if (mask & WR_REG_MASK) { + } else if (mask & WR_LOCAL_MASK) { wqe->mask = mask; wqe->state = wqe_state_posted; return 0; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 2233630fea7f..2fb5581edd8a 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -316,9 +316,16 @@ struct rxe_mr { u32 max_buf; u32 num_map; + atomic_t num_mw; + struct rxe_map **map; }; +enum rxe_send_flags { + /* flag indicaes bind call came through verbs API */ + RXE_BIND_MW = (1 << 0), +}; + /* use high order bit to separate MW and MR rkeys */ #define IS_MW (1 << 31) diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h index 4ad0fa0b2ab9..d49125682359 100644 --- a/include/uapi/rdma/rdma_user_rxe.h +++ b/include/uapi/rdma/rdma_user_rxe.h @@ -93,7 +93,39 @@ struct rxe_send_wr { __u32 remote_qkey; __u16 pkey_index; } ud; - /* reg is only used by the kernel and is not part of the uapi */ + struct { + __aligned_u64 addr; + __aligned_u64 length; + union { + __u32 mr_index; + __aligned_u64 reserved1; + }; + union { + __u32 mw_index; + __aligned_u64 reserved2; + }; + __u32 rkey; + __u32 access; + __u32 flags; + } umw; + /* The following are only used by the kernel + * and are not part of the uapi + */ + struct { + __aligned_u64 addr; + __aligned_u64 length; + union { + struct ib_mr *mr; + __aligned_u64 reserved1; + }; + union { + struct ib_mw *mw; + __aligned_u64 reserved2; + }; + __u32 rkey; + __u32 access; + __u32 flags; + } kmw; struct { union { struct ib_mr *mr; From patchwork Mon Sep 21 20:03:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790839 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A0F07139A for ; Mon, 21 Sep 2020 20:04:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7DF8D21BE5 for ; Mon, 21 Sep 2020 20:04:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="N7xd07PG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726818AbgIUUEN (ORCPT ); Mon, 21 Sep 2020 16:04:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUEN (ORCPT ); Mon, 21 Sep 2020 16:04:13 -0400 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 321FAC061755 for ; Mon, 21 Sep 2020 13:04:13 -0700 (PDT) Received: by mail-oi1-x244.google.com with SMTP id x14so18363471oic.9 for ; Mon, 21 Sep 2020 13:04:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 06/12] Add memory access through MWs Date: Mon, 21 Sep 2020 15:03:50 -0500 Message-Id: <20200921200356.8627-7-rpearson@hpe.com> In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> Implement memory access through MWs. Add rules checks from IBA.
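
For context, the access path these checks protect can be exercised from user space roughly as follows. This is an illustrative sketch only, not part of the patch: it assumes the standard libibverbs memory window API, the helper name export_window is hypothetical, and error handling is omitted.

/* export a window over part of a registered MR and hand its rkey to a peer */
#include <stdint.h>
#include <infiniband/verbs.h>

static uint32_t export_window(struct ibv_qp *qp, struct ibv_pd *pd,
			      struct ibv_mr *mr, void *buf, size_t len)
{
	/* mr must have been registered with IBV_ACCESS_MW_BIND */
	struct ibv_mw *mw = ibv_alloc_mw(pd, IBV_MW_TYPE_1);
	struct ibv_mw_bind bind = {
		.wr_id = 1,
		.send_flags = IBV_SEND_SIGNALED,
		.bind_info = {
			.mr = mr,
			.addr = (uint64_t)(uintptr_t)buf,
			.length = len,
			.mw_access_flags = IBV_ACCESS_REMOTE_WRITE,
		},
	};

	if (!mw || ibv_bind_mw(qp, mw, &bind))
		return 0;

	/* the peer now issues RDMA WRITEs against mw->rkey; the checks
	 * added in this patch are what the responder applies to that rkey
	 */
	return mw->rkey;
}

A type 2 window would instead be bound by posting a bind_mw work request (IB_WR_BIND_MW in the kernel), which is the path covered by check_bind_mw() earlier in the series.
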
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_loc.h | 19 +++--- drivers/infiniband/sw/rxe/rxe_mr.c | 82 +++++++++++++++----------- drivers/infiniband/sw/rxe/rxe_mw.c | 57 +++++++++++++++--- drivers/infiniband/sw/rxe/rxe_req.c | 15 ++--- drivers/infiniband/sw/rxe/rxe_resp.c | 83 ++++++++++++++++++++------- drivers/infiniband/sw/rxe/rxe_verbs.h | 1 + 6 files changed, 177 insertions(+), 80 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 1ee6270d3f2a..b824aeeb8b6b 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -100,29 +100,30 @@ enum lookup_type { lookup_remote, }; -struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, - enum lookup_type type); - -int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length); - -void rxe_mr_cleanup(struct rxe_pool_entry *arg); - int advance_dma_data(struct rxe_dma_info *dma, unsigned int length); int rxe_invalidate_mr(struct rxe_qp *qp, struct rxe_mr *mr); +int rxe_mr_check_access(struct rxe_qp *qp, struct rxe_mr *mr, + int access, u64 va, u32 resid); + +void rxe_mr_cleanup(struct rxe_pool_entry *arg); + /* rxe_mw.c */ struct ib_mw *rxe_alloc_mw(struct ib_pd *ibpd, enum ib_mw_type type, struct ib_udata *udata); int rxe_dealloc_mw(struct ib_mw *ibmw); -void rxe_mw_cleanup(struct rxe_pool_entry *arg); - int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe); int rxe_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw); +int rxe_mw_check_access(struct rxe_qp *qp, struct rxe_mw *mw, + int access, u64 va, u32 resid); + +void rxe_mw_cleanup(struct rxe_pool_entry *arg); + /* rxe_net.c */ void rxe_loopback(struct sk_buff *skb); int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 3f7c9b84f99b..3a7f814f4b81 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -21,7 +21,7 @@ static void rxe_set_mr_lkey(struct rxe_mr *mr) goto again; } -int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) +static int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) { switch (mr->type) { case RXE_MR_TYPE_DMA: @@ -380,6 +380,25 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, return err; } +static struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 lkey) +{ + struct rxe_mr *mr; + struct rxe_dev *rxe = to_rdev(pd->ibpd.device); + + mr = rxe_pool_get_key(&rxe->mr_pool, &lkey); + if (!mr) + return NULL; + + if (unlikely((mr->ibmr.lkey != lkey) || (mr->pd != pd) || + (access && !(access & mr->access)) || + (mr->state != RXE_MEM_STATE_VALID))) { + rxe_drop_ref(mr); + return NULL; + } + + return mr; +} + /* copy data in or out of a wqe, i.e. 
sg list * under the control of a dma descriptor */ @@ -409,7 +428,7 @@ int copy_data( } if (sge->length && (offset < sge->length)) { - mr = lookup_mr(pd, access, sge->lkey, lookup_local); + mr = lookup_mr(pd, access, sge->lkey); if (!mr) { err = -EINVAL; goto err1; @@ -434,8 +453,7 @@ int copy_data( } if (sge->length) { - mr = lookup_mr(pd, access, sge->lkey, - lookup_local); + mr = lookup_mr(pd, access, sge->lkey); if (!mr) { err = -EINVAL; goto err1; @@ -510,34 +528,6 @@ int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) return 0; } -/* (1) find the mr corresponding to lkey/rkey - * depending on lookup_type - * (2) verify that the (qp) pd matches the mr pd - * (3) verify that the mr can support the requested access - * (4) verify that mr state is valid - */ -struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, - enum lookup_type type) -{ - struct rxe_mr *mr; - struct rxe_dev *rxe = to_rdev(pd->ibpd.device); - - mr = rxe_pool_get_key(&rxe->mr_pool, &key); - if (!mr) - return NULL; - - if (unlikely((type == lookup_local && mr->lkey != key) || - (type == lookup_remote && mr->rkey != key) || - mr->pd != pd || - (access && !(access & mr->access)) || - mr->state != RXE_MEM_STATE_VALID)) { - rxe_drop_ref(mr); - mr = NULL; - } - - return mr; -} - int rxe_invalidate_mr(struct rxe_qp *qp, struct rxe_mr *mr) { /* TODO there are API rules being ignored here @@ -548,6 +538,34 @@ int rxe_invalidate_mr(struct rxe_qp *qp, struct rxe_mr *mr) return 0; } +int rxe_mr_check_access(struct rxe_qp *qp, struct rxe_mr *mr, + int access, u64 va, u32 resid) +{ + int ret; + struct rxe_pd *pd = to_rpd(mr->ibmr.pd); + + if (unlikely(mr->state != RXE_MEM_STATE_VALID)) { + pr_err("attempt to access a MR that is not in the valid state\n"); + return -EINVAL; + } + + /* C10-56 */ + if (unlikely(pd != qp->pd)) { + pr_err("attempt to access a MR with a different PD than the QP\n"); + return -EINVAL; + } + + /* C10-57 */ + if (unlikely(access && !(access & mr->access))) { + pr_err("attempt to access a MR without required access rights\n"); + return -EINVAL; + } + + ret = mr_check_range(mr, va, resid); + + return ret; +} + void rxe_mr_cleanup(struct rxe_pool_entry *arg) { struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c index 51bc71c98654..50f3152d3b57 100644 --- a/drivers/infiniband/sw/rxe/rxe_mw.c +++ b/drivers/infiniband/sw/rxe/rxe_mw.c @@ -318,11 +318,6 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe) static int check_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw) { - if (unlikely(mw->state != RXE_MEM_STATE_VALID)) { - pr_err_once("attempt to invalidate a MW that is not valid\n"); - return -EINVAL; - } - /* o10-37.2.26 */ if (unlikely(mw->ibmw.type == IB_MW_TYPE_1)) { pr_err_once("attempt to invalidate a type 1 MW\n"); @@ -336,9 +331,11 @@ static void do_invalidate_mw(struct rxe_mw *mw) { mw->qp = NULL; - rxe_drop_ref(mw->mr); - atomic_dec(&mw->mr->num_mw); - mw->mr = NULL; + if (mw->mr) { + atomic_dec(&mw->mr->num_mw); + mw->mr = NULL; + rxe_drop_ref(mw->mr); + } mw->access = 0; mw->addr = 0; @@ -364,6 +361,50 @@ int rxe_invalidate_mw(struct rxe_qp *qp, struct rxe_mw *mw) return ret; } +int rxe_mw_check_access(struct rxe_qp *qp, struct rxe_mw *mw, + int access, u64 va, u32 resid) +{ + struct rxe_pd *pd = to_rpd(mw->ibmw.pd); + + if (unlikely(mw->state != RXE_MEM_STATE_VALID)) { + pr_err_once("attempt to access a MW that is not valid\n"); + return -EINVAL; + } + + /* C10-76.2.1 */ 
+ if (unlikely((mw->ibmw.type == IB_MW_TYPE_1) && (pd != qp->pd))) { + pr_err_once("attempt to access a type 1 MW with a different PD than the QP\n"); + return -EINVAL; + } + + /* o10-37.2.43 */ + if (unlikely((mw->ibmw.type == IB_MW_TYPE_2) && (mw->qp != qp))) { + pr_err_once("attempt to access a type 2 MW that is associated with a different QP\n"); + return -EINVAL; + } + + /* C10-77 */ + if (unlikely(access && !(access & mw->access))) { + pr_err_once("attempt to access a MW without sufficient access\n"); + return -EINVAL; + } + + if (mw->access & IB_ZERO_BASED) { + if (unlikely((va + resid) > mw->length)) { + pr_err_once("attempt to access a ZB MW out of bounds\n"); + return -EINVAL; + } + } else { + if (unlikely((va < mw->addr) || + ((va + resid) > (mw->addr + mw->length)))) { + pr_err_once("attempt to access a VA MW out of bounds\n"); + return -EINVAL; + } + } + + return 0; +} + void rxe_mw_cleanup(struct rxe_pool_entry *arg) { struct rxe_mw *mw = container_of(arg, typeof(*mw), pelem); diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 1944d2bbc43b..e4de0f4a3d69 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -645,7 +645,6 @@ int rxe_requester(void *arg) case IB_WR_BIND_MW: ret = rxe_bind_mw(qp, wqe); if (ret) { - wqe->state = wqe_state_done; wqe->status = IB_WC_MW_BIND_ERR; goto err; } @@ -653,6 +652,7 @@ int rxe_requester(void *arg) default: pr_err_once("unexpected LOCAL WR opcode = %d\n", wqe->wr.opcode); + wqe->status = IB_WC_LOC_QP_OP_ERR; goto err; } @@ -698,13 +698,7 @@ int rxe_requester(void *arg) payload = (mask & RXE_WRITE_OR_SEND) ? wqe->dma.resid : 0; if (payload > mtu) { if (qp_type(qp) == IB_QPT_UD) { - /* C10-93.1.1: If the total sum of all the buffer lengths specified for a - * UD message exceeds the MTU of the port as returned by QueryHCA, the CI - * shall not emit any packets for this message. Further, the CI shall not - * generate an error due to this condition. - */ - - /* fake a successful UD send */ + /* C10-93.1.1: fake a successful UD send */ wqe->first_psn = qp->req.psn; wqe->last_psn = qp->req.psn; qp->req.psn = (qp->req.psn + 1) & BTH_PSN_MASK; @@ -769,6 +763,8 @@ int rxe_requester(void *arg) * to be called again */ wqe->state = wqe_state_error; + qp->req.wqe_index = next_index(qp->sq.queue, + qp->req.wqe_index); __rxe_do_task(&qp->comp.task); ret = -EAGAIN; goto done; @@ -784,8 +780,7 @@ int rxe_requester(void *arg) again: /* we come here if we are done with the current wqe but want to - * get called again. Mostly we loop back to next wqe so should - * be all one way or the other + * get called again. 
*/ ret = 0; goto done; diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 4d688a50d301..91595c23bc16 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -393,6 +393,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp, struct rxe_pkt_info *pkt) { struct rxe_mr *mr = NULL; + struct rxe_mw *mw = NULL; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); u64 va; u32 rkey; u32 resid; @@ -400,6 +402,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, int mtu = qp->mtu; enum resp_states state; int access; + unsigned long flags; if (pkt->mask & (RXE_READ_MASK | RXE_WRITE_MASK)) { if (pkt->mask & RXE_RETH_MASK) { @@ -407,6 +410,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, qp->resp.rkey = reth_rkey(pkt); qp->resp.resid = reth_len(pkt); qp->resp.length = reth_len(pkt); + qp->resp.offset = 0; } access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ : IB_ACCESS_REMOTE_WRITE; @@ -414,6 +418,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, qp->resp.va = atmeth_va(pkt); qp->resp.rkey = atmeth_rkey(pkt); qp->resp.resid = sizeof(u64); + qp->resp.offset = 0; access = IB_ACCESS_REMOTE_ATOMIC; } else { return RESPST_EXECUTE; @@ -431,20 +436,46 @@ static enum resp_states check_rkey(struct rxe_qp *qp, resid = qp->resp.resid; pktlen = payload_size(pkt); - mr = lookup_mr(qp->pd, access, rkey, lookup_remote); - if (!mr) { - state = RESPST_ERR_RKEY_VIOLATION; - goto err; - } + /* check rkey on each packet because someone could + * have invalidated, deallocated or unregistered it + * since the last packet + */ + if (rkey & IS_MW) { + mw = rxe_pool_get_key(&rxe->mw_pool, &rkey); + if (!mw) { + pr_err_once("no MW found with rkey = 0x%08x\n", rkey); + state = RESPST_ERR_RKEY_VIOLATION; + goto err; + } - if (unlikely(mr->state == RXE_MEM_STATE_FREE)) { - state = RESPST_ERR_RKEY_VIOLATION; - goto err; - } + spin_lock_irqsave(&mw->lock, flags); + if (rxe_mw_check_access(qp, mw, access, va, resid)) { + spin_unlock_irqrestore(&mw->lock, flags); + rxe_drop_ref(mw); + state = RESPST_ERR_RKEY_VIOLATION; + goto err; + } - if (mr_check_range(mr, va, resid)) { - state = RESPST_ERR_RKEY_VIOLATION; - goto err; + mr = mw->mr; + rxe_add_ref(mr); + + if (mw->access & IB_ZERO_BASED) + qp->resp.offset = mw->addr; + + spin_unlock_irqrestore(&mw->lock, flags); + rxe_drop_ref(mw); + } else { + mr = rxe_pool_get_key(&rxe->mr_pool, &rkey); + if (!mr || (mr->rkey != rkey)) { + pr_err_once("no MR found with rkey = 0x%08x\n", rkey); + state = RESPST_ERR_RKEY_VIOLATION; + goto err; + } + + if (rxe_mr_check_access(qp, mr, access, va, resid)) { + state = RESPST_ERR_RKEY_VIOLATION; + goto err; + } } if (pkt->mask & RXE_WRITE_MASK) { @@ -500,8 +531,8 @@ static enum resp_states write_data_in(struct rxe_qp *qp, int err; int data_len = payload_size(pkt); - err = rxe_mr_copy(qp->resp.mr, qp->resp.va, payload_addr(pkt), - data_len, to_mr_obj, NULL); + err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset, + payload_addr(pkt), data_len, to_mr_obj, NULL); if (err) { rc = RESPST_ERR_RKEY_VIOLATION; goto out; @@ -520,7 +551,6 @@ static DEFINE_SPINLOCK(atomic_ops_lock); static enum resp_states process_atomic(struct rxe_qp *qp, struct rxe_pkt_info *pkt) { - u64 iova = atmeth_va(pkt); u64 *vaddr; enum resp_states ret; struct rxe_mr *mr = qp->resp.mr; @@ -530,7 +560,7 @@ static enum resp_states process_atomic(struct rxe_qp *qp, goto out; } - vaddr = iova_to_vaddr(mr, iova, sizeof(u64)); + vaddr = iova_to_vaddr(mr, qp->resp.va + 
qp->resp.offset, sizeof(u64)); /* check vaddr is 8 bytes aligned. */ if (!vaddr || (uintptr_t)vaddr & 7) { @@ -655,8 +685,10 @@ static enum resp_states read_reply(struct rxe_qp *qp, res->type = RXE_READ_MASK; res->replay = 0; - res->read.va = qp->resp.va; - res->read.va_org = qp->resp.va; + res->read.va = qp->resp.va + + qp->resp.offset; + res->read.va_org = qp->resp.va + + qp->resp.offset; res->first_psn = req_pkt->psn; @@ -1336,7 +1368,10 @@ int rxe_responder(void *arg) /* Class C */ do_class_ac_error(qp, AETH_NAK_REM_ACC_ERR, IB_WC_REM_ACCESS_ERR); - state = RESPST_COMPLETE; + if (qp->resp.wqe) + state = RESPST_COMPLETE; + else + state = RESPST_ACKNOWLEDGE; } else { qp->resp.drop_msg = 1; if (qp->srq) { @@ -1364,7 +1399,10 @@ int rxe_responder(void *arg) /* Class C */ do_class_ac_error(qp, AETH_NAK_INVALID_REQ, IB_WC_REM_INV_REQ_ERR); - state = RESPST_COMPLETE; + if (qp->resp.wqe) + state = RESPST_COMPLETE; + else + state = RESPST_ACKNOWLEDGE; } else if (qp->srq) { /* UC/UD - class E */ qp->resp.status = IB_WC_REM_INV_REQ_ERR; @@ -1380,7 +1418,10 @@ int rxe_responder(void *arg) /* All, Class A. */ do_class_ac_error(qp, AETH_NAK_REM_OP_ERR, IB_WC_LOC_QP_OP_ERR); - state = RESPST_COMPLETE; + if (qp->resp.wqe) + state = RESPST_COMPLETE; + else + state = RESPST_ACKNOWLEDGE; break; case RESPST_ERR_CQ_OVERFLOW: diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 2fb5581edd8a..b24a9a0878c2 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -183,6 +183,7 @@ struct rxe_resp_info { /* RDMA read / atomic only */ u64 va; + u64 offset; struct rxe_mr *mr; u32 resid; u32 rkey; From patchwork Mon Sep 21 20:03:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790841 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F04A139A for ; Mon, 21 Sep 2020 20:04:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4DDE721BE5 for ; Mon, 21 Sep 2020 20:04:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Faq/QW59" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726886AbgIUUEO (ORCPT ); Mon, 21 Sep 2020 16:04:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUEO (ORCPT ); Mon, 21 Sep 2020 16:04:14 -0400 Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 128A3C061755 for ; Mon, 21 Sep 2020 13:04:14 -0700 (PDT) Received: by mail-ot1-x332.google.com with SMTP id m13so9023365otl.9 for ; Mon, 21 Sep 2020 13:04:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0aLyHV/ABoDUMEneY5/wFXBDJz3SZ7nw1wdXFy4o2xI=; b=Faq/QW595ZVhZvDugqrgcF1WxIZFtKzc9ND4MB0VMf6ZjScOJmoSI/EG5/GkNzL8Dp CmJDmcuT+Z16/ERIgOHSIh2elkmz3cjWAjWpJOULqLoU3uR5uBe1AaJWhclIPD6PPiz5 LinIn0q7JuytXs3x8ZDAZsY3PT9mz0x5zWjyDTFj3Vtu62obS7dufI6WhJZKjEojIxz/ rjxMALFCPV8I1mLY0ZXmstu8hCAB3fRA0OVheLLUQgqKFQ5wn136GXokXBBuAPqY7+YO 
From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 07/12] rdma_rxe: Add support for ibv_query_device_ex Date: Mon, 21 Sep 2020 15:03:51 -0500 Message-Id: <20200921200356.8627-8-rpearson@hpe.com> In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> Add code to initialize new struct members in ib_device_attr as place holders.
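
For context, these attributes surface to applications through the extended query verb. The sketch below is illustrative only and not part of the patch; it assumes the standard libibverbs ibv_query_device_ex() interface and the function name print_rxe_caps is hypothetical.

#include <stdio.h>
#include <infiniband/verbs.h>

static int print_rxe_caps(struct ibv_context *ctx)
{
	struct ibv_device_attr_ex attr;

	/* a NULL input asks for everything the provider reports */
	if (ibv_query_device_ex(ctx, NULL, &attr))
		return -1;

	printf("max_mr %d max_mw %d max_qp %d\n",
	       attr.orig_attr.max_mr, attr.orig_attr.max_mw,
	       attr.orig_attr.max_qp);
	printf("completion timestamp mask 0x%llx\n",
	       (unsigned long long)attr.completion_timestamp_mask);
	return 0;
}
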
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe.c | 101 ++++++++++++++++++-------- drivers/infiniband/sw/rxe/rxe_verbs.c | 7 +- 2 files changed, 75 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index fab291245366..8d2be78e72ef 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -38,40 +38,77 @@ void rxe_dealloc(struct ib_device *ib_dev) /* initialize rxe device parameters */ static void rxe_init_device_param(struct rxe_dev *rxe) { - rxe->max_inline_data = RXE_MAX_INLINE_DATA; - - rxe->attr.vendor_id = RXE_VENDOR_ID; - rxe->attr.max_mr_size = RXE_MAX_MR_SIZE; - rxe->attr.page_size_cap = RXE_PAGE_SIZE_CAP; - rxe->attr.max_qp = RXE_MAX_QP; - rxe->attr.max_qp_wr = RXE_MAX_QP_WR; - rxe->attr.device_cap_flags = RXE_DEVICE_CAP_FLAGS; - rxe->attr.max_send_sge = RXE_MAX_SGE; - rxe->attr.max_recv_sge = RXE_MAX_SGE; - rxe->attr.max_sge_rd = RXE_MAX_SGE_RD; - rxe->attr.max_cq = RXE_MAX_CQ; - rxe->attr.max_cqe = (1 << RXE_MAX_LOG_CQE) - 1; - rxe->attr.max_mr = RXE_MAX_MR; - rxe->attr.max_mw = RXE_MAX_MW; - rxe->attr.max_pd = RXE_MAX_PD; - rxe->attr.max_qp_rd_atom = RXE_MAX_QP_RD_ATOM; - rxe->attr.max_res_rd_atom = RXE_MAX_RES_RD_ATOM; - rxe->attr.max_qp_init_rd_atom = RXE_MAX_QP_INIT_RD_ATOM; - rxe->attr.atomic_cap = IB_ATOMIC_HCA; - rxe->attr.max_mcast_grp = RXE_MAX_MCAST_GRP; - rxe->attr.max_mcast_qp_attach = RXE_MAX_MCAST_QP_ATTACH; - rxe->attr.max_total_mcast_qp_attach = RXE_MAX_TOT_MCAST_QP_ATTACH; - rxe->attr.max_ah = RXE_MAX_AH; - rxe->attr.max_srq = RXE_MAX_SRQ; - rxe->attr.max_srq_wr = RXE_MAX_SRQ_WR; - rxe->attr.max_srq_sge = RXE_MAX_SRQ_SGE; - rxe->attr.max_fast_reg_page_list_len = RXE_MAX_FMR_PAGE_LIST_LEN; - rxe->attr.max_pkeys = RXE_MAX_PKEYS; - rxe->attr.local_ca_ack_delay = RXE_LOCAL_CA_ACK_DELAY; - addrconf_addr_eui48((unsigned char *)&rxe->attr.sys_image_guid, - rxe->ndev->dev_addr); + struct ib_device_attr *a = &rxe->attr; + rxe->max_inline_data = RXE_MAX_INLINE_DATA; rxe->max_ucontext = RXE_MAX_UCONTEXT; + + a->atomic_cap = IB_ATOMIC_HCA; + a->cq_caps.max_cq_moderation_count = 0; + a->cq_caps.max_cq_moderation_period = 0; + a->device_cap_flags = RXE_DEVICE_CAP_FLAGS; + a->fw_ver = 0; + a->hca_core_clock = 0; + a->hw_ver = 0; + a->local_ca_ack_delay = RXE_LOCAL_CA_ACK_DELAY; + a->masked_atomic_cap = 0; + a->max_ah = RXE_MAX_AH; + a->max_cqe = (1 << RXE_MAX_LOG_CQE) - 1; + a->max_cq = RXE_MAX_CQ; + a->max_dm_size = 0; + a->max_ee_init_rd_atom = 0; + a->max_ee = 0; + a->max_ee_rd_atom = 0; + a->max_fast_reg_page_list_len = RXE_MAX_FMR_PAGE_LIST_LEN; + a->max_mcast_grp = RXE_MAX_MCAST_GRP; + a->max_mcast_qp_attach = RXE_MAX_MCAST_QP_ATTACH; + a->max_mr = RXE_MAX_MR; + a->max_mr_size = RXE_MAX_MR_SIZE; + a->max_mw = RXE_MAX_MW; + a->max_pd = RXE_MAX_PD; + a->max_pi_fast_reg_page_list_len = 0; + a->max_pkeys = RXE_MAX_PKEYS; + a->max_qp_init_rd_atom = RXE_MAX_QP_INIT_RD_ATOM; + a->max_qp_rd_atom = RXE_MAX_QP_RD_ATOM; + a->max_qp = RXE_MAX_QP; + a->max_qp_wr = RXE_MAX_QP_WR; + a->max_raw_ethy_qp = 0; + a->max_raw_ipv6_qp = 0; + a->max_rdd = 0; + a->max_recv_sge = RXE_MAX_SGE; + a->max_res_rd_atom = RXE_MAX_RES_RD_ATOM; + a->max_send_sge = RXE_MAX_SGE; + a->max_sge_rd = RXE_MAX_SGE_RD; + a->max_sgl_rd = 0; + a->max_srq = RXE_MAX_SRQ; + a->max_srq_sge = RXE_MAX_SRQ_SGE; + a->max_srq_wr = RXE_MAX_SRQ_WR; + a->max_total_mcast_qp_attach = RXE_MAX_TOT_MCAST_QP_ATTACH; + a->max_wq_type_rq = 0; + a->odp_caps.general_caps = 0; + a->odp_caps.per_transport_caps.rc_odp_caps = 0; + 
a->odp_caps.per_transport_caps.uc_odp_caps = 0; + a->odp_caps.per_transport_caps.ud_odp_caps = 0; + a->odp_caps.per_transport_caps.xrc_odp_caps = 0; + a->page_size_cap = RXE_PAGE_SIZE_CAP; + a->raw_packet_caps = 0; + a->rss_caps.supported_qpts = 0; + a->rss_caps.max_rwq_indirection_tables = 0; + a->rss_caps.max_rwq_indirection_table_size = 0; + a->sig_guard_cap = 0; + a->sig_prot_cap = 0; + a->sys_image_guid = 0; + a->timestamp_mask = 0; + a->tm_caps.max_rndv_hdr_size = 0; + a->tm_caps.max_num_tags = 0; + a->tm_caps.flags = 0; + a->tm_caps.max_ops = 0; + a->tm_caps.max_sge = 0; + a->vendor_id = RXE_VENDOR_ID; + a->vendor_part_id = 0; + + addrconf_addr_eui48((unsigned char *)&a->sys_image_guid, + rxe->ndev->dev_addr); } /* initialize port attributes */ diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 21582507ed32..a77f2e0ef68f 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1149,7 +1149,8 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) dma_coerce_mask_and_coherent(&dev->dev, dma_get_required_mask(&dev->dev)); - dev->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT) + dev->uverbs_cmd_mask = + BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_DEVICE) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_PORT) @@ -1184,6 +1185,10 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) | BIT_ULL(IB_USER_VERBS_CMD_DEALLOC_MW) ; + dev->uverbs_ex_cmd_mask = + BIT_ULL(IB_USER_VERBS_EX_CMD_QUERY_DEVICE) + ; + ib_set_device_ops(dev, &rxe_dev_ops); err = ib_device_set_netdev(&rxe->ib_dev, rxe->ndev, 1); if (err) From patchwork Mon Sep 21 20:03:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 00B1359D for ; Mon, 21 Sep 2020 20:04:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D4A2B21789 for ; Mon, 21 Sep 2020 20:04:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Rejvmj6j" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727014AbgIUUEP (ORCPT ); Mon, 21 Sep 2020 16:04:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUEP (ORCPT ); Mon, 21 Sep 2020 16:04:15 -0400 Received: from mail-oo1-xc42.google.com (mail-oo1-xc42.google.com [IPv6:2607:f8b0:4864:20::c42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1582C061755 for ; Mon, 21 Sep 2020 13:04:14 -0700 (PDT) Received: by mail-oo1-xc42.google.com with SMTP id h9so3570880ooo.10 for ; Mon, 21 Sep 2020 13:04:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LykXjfSYW0X/ICRvJluBO74d7icLwRIQMOvc3amphfo=; b=Rejvmj6jMo2dSf0om+Vlmr6T4ZOL2Zjbg23SSlwCQmO1dUAV/koE4s5htD15mTiY+o iPUg3Nmt1Gk+S/TuR47xAKf2I0UkvVriBXkIG7OYZVAm/5O3u2pJm0vUU6bpCIedcRfp e2SBbZSya7nlMETMw721wnXNPXu/6yCENs++cmtp36gFZ3PcXdNhDkfbgl3x/fF3xIog 
dtTbYK5elMbZ4lguYuvKHvCaaasLG+J0B8NCfxMKOPCtiIfEHCwQAT1+hmI7ZCiJPDUs U9ReiDhuuhDrjN4sWDkmCmCnlzRuv9Ho26RwhG4JE+DTU1gWlxU7Z5iqEDT6kTF3YeY0 Et4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LykXjfSYW0X/ICRvJluBO74d7icLwRIQMOvc3amphfo=; b=c2QDO30pPLKSYhNs7udVdKsOVgV7xJaQ6fA3X2J3892G32peIERURsh03FfL83/ww8 NouA73eGVLbFpUaNNxolRIfag/uWj1HoWXISyLxzDBKCunRC+4K7RFrHmeB4xDbUyc98 dvHHH7+VHmomy+Xo5m3ny720NwIyz1asceM3GyA2CMtguTlGcHZ+cXGjEqsqyqc8k22Z m2M5xT4mxisrIAxsqn3wh/pfM7LGI3b5vDmNnsDfeLUu4WsZ+VcfKrfYu1JGiC245hKp v5+7y62Xmcos+hQdAdKH+IEk3tn1em6tGmoSY+OjHyCjoj7tSaHW9/vSwtXTpxG2Urs1 mAYg== X-Gm-Message-State: AOAM5319d+VEJkuSRltnQhwH/Z/m1+C34KH3YexydMJQk0dD51WkYhWp nmbixjkurOpn2avQpIEDP3g= X-Google-Smtp-Source: ABdhPJxDXnO5I7V40NKmV3rDvOekRt1DzfVY8KsLv+7L7S3ARC/U7VjkHk5vYV6zTcrbMiCpgPkXLg== X-Received: by 2002:a4a:de4b:: with SMTP id z11mr721689oot.34.1600718654332; Mon, 21 Sep 2020 13:04:14 -0700 (PDT) Received: from localhost ([2605:6000:8b03:f000:9211:f781:8595:d131]) by smtp.gmail.com with ESMTPSA id h35sm6432201otb.81.2020.09.21.13.04.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Sep 2020 13:04:13 -0700 (PDT) From: Bob Pearson X-Google-Original-From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 08/12] rdma_rxe: Add support for extended CQ operations Date: Mon, 21 Sep 2020 15:03:52 -0500 Message-Id: <20200921200356.8627-9-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Add private members to user/kernel wc struct to carry extensions used by cq_ex. Add timestamps on completion. Add ignore overrun support. Add commands to user API bitmasks. 
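For context only (an editorial illustration, not part of the patch): the intended consumer of the new timestamp/realtime fields is the extended CQ interface in rdma-core, used roughly like this (sketch, error handling omitted; how the provider maps the raw ktime values onto ibv_wc_read_completion_ts() is up to the matching userspace changes):

    struct ibv_cq_init_attr_ex cq_attr = {
        .cqe = 256,
        .wc_flags = IBV_WC_EX_WITH_COMPLETION_TIMESTAMP,
    };
    struct ibv_cq_ex *cq = ibv_create_cq_ex(ctx, &cq_attr);
    struct ibv_poll_cq_attr poll_attr = {};

    if (cq && !ibv_start_poll(cq, &poll_attr)) {
        uint64_t ts = ibv_wc_read_completion_ts(cq);
        /* ... use ts ... */
        ibv_end_poll(cq);
    }

The change in rxe_create_cq() below, which stores attr->flags instead of rejecting any non-zero flags, is what allows such a CQ to be created against rxe at all.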
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_comp.c | 7 ++++- drivers/infiniband/sw/rxe/rxe_resp.c | 8 +++++- drivers/infiniband/sw/rxe/rxe_verbs.c | 10 ++++--- drivers/infiniband/sw/rxe/rxe_verbs.h | 3 ++- include/uapi/rdma/rdma_user_rxe.h | 38 ++++++++++++++++++++++----- 5 files changed, 52 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 8b81d3b24a8a..72745ffcf118 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -390,7 +390,7 @@ static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe, wc->byte_len = wqe->dma.length; wc->qp = &qp->ibqp; } else { - struct ib_uverbs_wc *uwc = &cqe->uibwc; + struct rxe_uverbs_wc *uwc = &cqe->ruwc; uwc->wr_id = wqe->wr.wr_id; uwc->status = wqe->status; @@ -400,6 +400,11 @@ static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe, uwc->wc_flags = IB_WC_WITH_IMM; uwc->byte_len = wqe->dma.length; uwc->qp_num = qp->ibqp.qp_num; + if (qp->scq->flags & + IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION) { + uwc->timestamp = (u64)ktime_get(); + uwc->realtime = (u64)ktime_get_real(); + } } } diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 91595c23bc16..846baeec61be 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -887,7 +887,7 @@ static enum resp_states do_complete(struct rxe_qp *qp, { struct rxe_cqe cqe; struct ib_wc *wc = &cqe.ibwc; - struct ib_uverbs_wc *uwc = &cqe.uibwc; + struct rxe_uverbs_wc *uwc = &cqe.ruwc; struct rxe_recv_wqe *wqe = qp->resp.wqe; struct rxe_dev *rxe = to_rdev(qp->ibqp.device); @@ -943,6 +943,12 @@ static enum resp_states do_complete(struct rxe_qp *qp, uwc->src_qp = deth_sqp(pkt); uwc->port_num = qp->attr.port_num; + + if (qp->rcq->flags & + IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION) { + uwc->timestamp = (u64)ktime_get(); + uwc->realtime = (u64)ktime_get_real(); + } } else { struct sk_buff *skb = PKT_TO_SKB(pkt); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index a77f2e0ef68f..594d8353600a 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -749,7 +749,8 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, return err; } -static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, +static int rxe_create_cq(struct ib_cq *ibcq, + const struct ib_cq_init_attr *attr, struct ib_udata *udata) { int err; @@ -764,13 +765,12 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, uresp = udata->outbuf; } - if (attr->flags) - return -EINVAL; - err = rxe_cq_chk_attr(rxe, NULL, attr->cqe, attr->comp_vector); if (err) return err; + cq->flags = attr->flags; + err = rxe_cq_from_init(rxe, cq, attr->cqe, attr->comp_vector, udata, uresp); if (err) @@ -1187,6 +1187,8 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) dev->uverbs_ex_cmd_mask = BIT_ULL(IB_USER_VERBS_EX_CMD_QUERY_DEVICE) + | BIT_ULL(IB_USER_VERBS_EX_CMD_CREATE_CQ) + | BIT_ULL(IB_USER_VERBS_EX_CMD_MODIFY_CQ) ; ib_set_device_ops(dev, &rxe_dev_ops); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index b24a9a0878c2..784ae4102265 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -53,7 +53,7 @@ struct rxe_ah { struct rxe_cqe { union { struct ib_wc ibwc; - struct ib_uverbs_wc uibwc; + 
struct rxe_uverbs_wc ruwc; }; }; @@ -62,6 +62,7 @@ struct rxe_cq { struct rxe_pool_entry pelem; struct rxe_queue *queue; spinlock_t cq_lock; + u32 flags; u8 notify; bool is_dying; int is_user; diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h index d49125682359..95352e050ab4 100644 --- a/include/uapi/rdma/rdma_user_rxe.h +++ b/include/uapi/rdma/rdma_user_rxe.h @@ -98,29 +98,27 @@ struct rxe_send_wr { __aligned_u64 length; union { __u32 mr_index; - __aligned_u64 reserved1; + __aligned_u64 pad1; }; union { __u32 mw_index; - __aligned_u64 reserved2; + __aligned_u64 pad2; }; __u32 rkey; __u32 access; __u32 flags; } umw; - /* The following are only used by the kernel - * and are not part of the uapi - */ + /* below are only used by the kernel */ struct { __aligned_u64 addr; __aligned_u64 length; union { struct ib_mr *mr; - __aligned_u64 reserved1; + __aligned_u64 reserved1; }; union { struct ib_mw *mw; - __aligned_u64 reserved2; + __aligned_u64 reserved2; }; __u32 rkey; __u32 access; @@ -184,6 +182,32 @@ struct rxe_recv_wqe { struct rxe_dma_info dma; }; +struct rxe_uverbs_wc { + /* keep these the same as ib_uverbs_wc */ + __aligned_u64 wr_id; + __u32 status; + __u32 opcode; + __u32 vendor_err; + __u32 byte_len; + union { + __be32 imm_data; + __u32 invalidate_rkey; + } ex; + __u32 qp_num; + __u32 src_qp; + __u32 wc_flags; + __u16 pkey_index; + __u16 slid; + __u8 sl; + __u8 dlid_path_bits; + __u8 port_num; + __u8 reserved; + + /* any extras go here */ + __aligned_u64 timestamp; + __aligned_u64 realtime; +}; + struct rxe_create_cq_resp { struct mminfo mi; }; From patchwork Mon Sep 21 20:03:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 12BA159D for ; Mon, 21 Sep 2020 20:04:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EC2A4218AC for ; Mon, 21 Sep 2020 20:04:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Whw81BzM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727030AbgIUUEQ (ORCPT ); Mon, 21 Sep 2020 16:04:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUEP (ORCPT ); Mon, 21 Sep 2020 16:04:15 -0400 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2F70C061755 for ; Mon, 21 Sep 2020 13:04:15 -0700 (PDT) Received: by mail-ot1-x342.google.com with SMTP id 95so6232375ota.13 for ; Mon, 21 Sep 2020 13:04:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=M7BIC7Z7C3jI/XKbqemRcZf71vYA9NpYcp+JzUeoZek=; b=Whw81BzMdTA3ams+Ctj55j5YUR5F/v3gQsF4NZlReU0qVfbROjIbCWNiJmvefmbm90 lOtSX9g9G6ks1Gil/7wHVN4ph8Y2zzo5JHxZMsbw26xOPvZlzeYSD+xDtawxRSOEvBxs M6bL2SLIQLA/0Sdp/nm5oGdwDThpTaFVPTTkuMBkXW+6Zsr1lrrnvr+oU4CmP+mpe/Ay Q3eFoUKbtJMhiYb/qiKXFFVAu2rcwkuRkTT8XzHsypWAG91wMNAwQrzYrOVOCXDBraVP ZRRQZQFfRII5MUgYLZjctFRrxCV7+JbwThahAySD7CBw8nVVPRehLXJ8RcGDhISc2qYa 
QiLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=M7BIC7Z7C3jI/XKbqemRcZf71vYA9NpYcp+JzUeoZek=; b=hFcq0CKNm3wqJ8JjSHM5JYElWnQovAFh3hqhvQ+rKoF6Pf59iehnm9ZB3kfFMrg7Mb 22/GzkTGUhNfZs/yM4VALKCh9EI8em8E1KcHaT1HUHf9GIpnhk7cngVEHk0L1zaiIW9k 1iOF/dSLYaly21oifMAeqAZ4xiJVwH2h+S/o3vkB2WITls37IY5zTqgn3VZBgf52Dx4N q1lVGhBZnL4ojyjbPhbWdNMyd+9Pfq5aSyTzNSP2zPvw9El2WkAWAUGvGF468brUzZ55 Atqp8Moo1XNEC3o89OSb9BN26AYVULTKEtNA192LQ7N0sYYPkWgC+26dcVacMrneLy/p 5tgA== X-Gm-Message-State: AOAM531YB/7Vuz/Fq3axztdrC3xC7cnTc2dygGn/tNFA0Ou1LmEUF4Hf uWTymwD5/0+T4YVbwlaNNyA= X-Google-Smtp-Source: ABdhPJzqR8d3kbFeoBWh/V8UywO0FdwR3kmfDwHIGaELktsff+NdT5sxR1HgmTZcrnunolrHMt7ytA== X-Received: by 2002:a05:6830:1be9:: with SMTP id k9mr386638otb.315.1600718655181; Mon, 21 Sep 2020 13:04:15 -0700 (PDT) Received: from localhost ([2605:6000:8b03:f000:9211:f781:8595:d131]) by smtp.gmail.com with ESMTPSA id l3sm6384054oth.36.2020.09.21.13.04.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Sep 2020 13:04:14 -0700 (PDT) From: Bob Pearson X-Google-Original-From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 09/12] rdma_rxe: Add support for extended QP operations Date: Mon, 21 Sep 2020 15:03:53 -0500 Message-Id: <20200921200356.8627-10-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Add bits to user api command bitmask. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 594d8353600a..7849d8d72d4c 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1187,6 +1187,8 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) dev->uverbs_ex_cmd_mask = BIT_ULL(IB_USER_VERBS_EX_CMD_QUERY_DEVICE) + | BIT_ULL(IB_USER_VERBS_EX_CMD_CREATE_QP) + | BIT_ULL(IB_USER_VERBS_EX_CMD_MODIFY_QP) | BIT_ULL(IB_USER_VERBS_EX_CMD_CREATE_CQ) | BIT_ULL(IB_USER_VERBS_EX_CMD_MODIFY_CQ) ; From patchwork Mon Sep 21 20:03:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 426BF139A for ; Mon, 21 Sep 2020 20:04:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1DB8F218AC for ; Mon, 21 Sep 2020 20:04:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="eoAEVg0o" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727062AbgIUUER (ORCPT ); Mon, 21 Sep 2020 16:04:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUER (ORCPT ); Mon, 21 Sep 2020 16:04:17 -0400 Received: from mail-oo1-xc42.google.com (mail-oo1-xc42.google.com 
[IPv6:2607:f8b0:4864:20::c42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E154AC061755 for ; Mon, 21 Sep 2020 13:04:16 -0700 (PDT) Received: by mail-oo1-xc42.google.com with SMTP id o20so3579411ook.1 for ; Mon, 21 Sep 2020 13:04:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WOle5EDgVolusyFyWQIjzJSZO/Rl2MDCZ7vDNXAZnjQ=; b=eoAEVg0oojqeZQW76EVk4yZ3IEUW3WPsPq+nZ6iQumbUPXso5Ur3jNOsicJAYFAYqE dylMGaa2NbOnpNqDgfKIUF/idNSiP7RoWeiRyMx7J9ogsvWQxcNsMSMzLIZbAnMf30B1 — wait
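A sketch of the intended calling pattern for that case, using the split helpers added by this patch (create_new_grp here is a stand-in for a caller-supplied constructor; compare the reworked rxe_mcast_get_grp later in the series):

    write_lock_irqsave(&pool->pool_lock, flags);
    obj = __get_key(pool, key);
    if (!obj)
        obj = create_new_grp(pool, key);  /* built on __alloc() and add_key() */
    write_unlock_irqrestore(&pool->pool_lock, flags);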
By letting the caller take the pool lock and then combine unlocked operations this problem is fixed. - Replace calls to pool APIs with the typesafe versions. - Replaced a few calls to pr_warn with pr_err_once. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_cq.c | 12 +- drivers/infiniband/sw/rxe/rxe_mcast.c | 4 +- drivers/infiniband/sw/rxe/rxe_mr.c | 2 +- drivers/infiniband/sw/rxe/rxe_mw.c | 18 ++- drivers/infiniband/sw/rxe/rxe_pool.c | 216 ++++++++++++++++++-------- drivers/infiniband/sw/rxe/rxe_pool.h | 75 ++++++--- drivers/infiniband/sw/rxe/rxe_recv.c | 4 +- drivers/infiniband/sw/rxe/rxe_req.c | 4 +- drivers/infiniband/sw/rxe/rxe_resp.c | 8 +- drivers/infiniband/sw/rxe/rxe_verbs.c | 10 +- 10 files changed, 239 insertions(+), 114 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c index 43394c3f29d4..68b0753ad63f 100644 --- a/drivers/infiniband/sw/rxe/rxe_cq.c +++ b/drivers/infiniband/sw/rxe/rxe_cq.c @@ -14,21 +14,21 @@ int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq, int count; if (cqe <= 0) { - pr_warn("cqe(%d) <= 0\n", cqe); + pr_err_once("%s: cqe(%d) <= 0\n", __func__, cqe); goto err1; } if (cqe > rxe->attr.max_cqe) { - pr_warn("cqe(%d) > max_cqe(%d)\n", - cqe, rxe->attr.max_cqe); + pr_err_once("%s: cqe(%d) > max_cqe(%d)\n", + __func__, cqe, rxe->attr.max_cqe); goto err1; } if (cq) { count = queue_count(cq->queue); if (cqe < count) { - pr_warn("cqe(%d) < current # elements in queue (%d)", - cqe, count); + pr_err_once("%s: cqe(%d) < current # elements in queue (%d)", + __func__, cqe, count); goto err1; } } @@ -63,7 +63,7 @@ int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe, cq->queue = rxe_queue_init(rxe, &cqe, sizeof(struct rxe_cqe)); if (!cq->queue) { - pr_warn("unable to create cq\n"); + pr_err_once("%s: unable to create cq\n", __func__); return -ENOMEM; } diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c index c02315aed8d1..b09c6594045a 100644 --- a/drivers/infiniband/sw/rxe/rxe_mcast.c +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c @@ -18,7 +18,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, goto err1; } - grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid); + grp = rxe_get_key(&rxe->mc_grp_pool, mgid); if (grp) goto done; @@ -98,7 +98,7 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_mc_grp *grp; struct rxe_mc_elem *elem, *tmp; - grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid); + grp = rxe_get_key(&rxe->mc_grp_pool, mgid); if (!grp) goto err1; diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 3a7f814f4b81..0765a3ea1a75 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -385,7 +385,7 @@ static struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 lkey) struct rxe_mr *mr; struct rxe_dev *rxe = to_rdev(pd->ibpd.device); - mr = rxe_pool_get_key(&rxe->mr_pool, &lkey); + mr = rxe_get_key(&rxe->mr_pool, &lkey); if (!mr) return NULL; diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c index 50f3152d3b57..c58edf51a119 100644 --- a/drivers/infiniband/sw/rxe/rxe_mw.c +++ b/drivers/infiniband/sw/rxe/rxe_mw.c @@ -219,8 +219,10 @@ static int do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe, u32 rkey; u32 new_rkey; struct rxe_mw *duplicate_mw; - struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_pool *pool = mw->pelem.pool; + unsigned long flags; + write_lock_irqsave(&pool->pool_lock, 
flags); /* key part of new rkey is provided by user for type 2 * and ibv_bind_mw() for type 1 MWs * there is a very rare chance that the new rkey will @@ -229,15 +231,17 @@ static int do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe, */ rkey = mw->ibmw.rkey; new_rkey = (rkey & 0xffffff00) | (wqe->wr.wr.umw.rkey & 0x000000ff); - duplicate_mw = rxe_pool_get_key(&rxe->mw_pool, &new_rkey); + duplicate_mw = __get_key(pool, &new_rkey); if (duplicate_mw) { + write_unlock_irqrestore(&pool->pool_lock, flags); pr_err_once("new MW key is a duplicate, try another\n"); rxe_drop_ref(duplicate_mw); return -EINVAL; } - rxe_drop_key(mw); - rxe_add_key(mw, &new_rkey); + drop_key(mw); + add_key(mw, &new_rkey); + write_unlock_irqrestore(&pool->pool_lock, flags); mw->access = wqe->wr.wr.umw.access; mw->state = RXE_MEM_STATE_VALID; @@ -271,16 +275,14 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe) unsigned long flags; if (qp->is_user) { - mw = rxe_pool_get_index(&rxe->mw_pool, - wqe->wr.wr.umw.mw_index); + mw = rxe_get_index(&rxe->mw_pool, wqe->wr.wr.umw.mw_index); if (!mw) { pr_err_once("mw with index = %d not found\n", wqe->wr.wr.umw.mw_index); ret = -EINVAL; goto err1; } - mr = rxe_pool_get_index(&rxe->mr_pool, - wqe->wr.wr.umw.mr_index); + mr = rxe_get_index(&rxe->mr_pool, wqe->wr.wr.umw.mr_index); if (!mr && wqe->wr.wr.umw.length) { pr_err_once("mr with index = %d not found\n", wqe->wr.wr.umw.mr_index); diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index 4bcb19a7b918..1a079657e8d0 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -12,21 +12,25 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_UC] = { .name = "rxe-uc", .size = sizeof(struct rxe_ucontext), + .elem_offset = offsetof(struct rxe_ucontext, pelem), .flags = RXE_POOL_NO_ALLOC, }, [RXE_TYPE_PD] = { .name = "rxe-pd", .size = sizeof(struct rxe_pd), + .elem_offset = offsetof(struct rxe_pd, pelem), .flags = RXE_POOL_NO_ALLOC, }, [RXE_TYPE_AH] = { .name = "rxe-ah", .size = sizeof(struct rxe_ah), + .elem_offset = offsetof(struct rxe_ah, pelem), .flags = RXE_POOL_ATOMIC | RXE_POOL_NO_ALLOC, }, [RXE_TYPE_SRQ] = { .name = "rxe-srq", .size = sizeof(struct rxe_srq), + .elem_offset = offsetof(struct rxe_srq, pelem), .flags = RXE_POOL_INDEX | RXE_POOL_NO_ALLOC, .min_index = RXE_MIN_SRQ_INDEX, .max_index = RXE_MAX_SRQ_INDEX, @@ -34,6 +38,7 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_QP] = { .name = "rxe-qp", .size = sizeof(struct rxe_qp), + .elem_offset = offsetof(struct rxe_qp, pelem), .cleanup = rxe_qp_cleanup, .flags = RXE_POOL_INDEX, .min_index = RXE_MIN_QP_INDEX, @@ -42,12 +47,14 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_CQ] = { .name = "rxe-cq", .size = sizeof(struct rxe_cq), + .elem_offset = offsetof(struct rxe_cq, pelem), .flags = RXE_POOL_NO_ALLOC, .cleanup = rxe_cq_cleanup, }, [RXE_TYPE_MR] = { .name = "rxe-mr", .size = sizeof(struct rxe_mr), + .elem_offset = offsetof(struct rxe_mr, pelem), .cleanup = rxe_mr_cleanup, .flags = RXE_POOL_INDEX | RXE_POOL_KEY, @@ -59,6 +66,7 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_MW] = { .name = "rxe-mw", .size = sizeof(struct rxe_mw), + .elem_offset = offsetof(struct rxe_mw, pelem), .cleanup = rxe_mw_cleanup, .flags = RXE_POOL_INDEX | RXE_POOL_KEY, @@ -70,6 +78,7 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_MC_GRP] = { .name = "rxe-mc_grp", .size = sizeof(struct rxe_mc_grp), + .elem_offset = 
offsetof(struct rxe_mc_grp, pelem), .cleanup = rxe_mc_cleanup, .flags = RXE_POOL_KEY, .key_offset = offsetof(struct rxe_mc_grp, mgid), @@ -78,6 +87,7 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = { [RXE_TYPE_MC_ELEM] = { .name = "rxe-mc_elem", .size = sizeof(struct rxe_mc_elem), + .elem_offset = offsetof(struct rxe_mc_elem, pelem), .flags = RXE_POOL_ATOMIC, }, }; @@ -241,7 +251,8 @@ static int insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) elem = rb_entry(parent, struct rxe_pool_entry, key_node); cmp = memcmp((u8 *)elem + pool->key.key_offset, - (u8 *)new + pool->key.key_offset, pool->key.key_size); + (u8 *)new + pool->key.key_offset, + pool->key.key_size); if (cmp == 0) { pr_warn("key already exists!\n"); @@ -260,70 +271,91 @@ static int insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new) return 0; } -int rxe_add_key(void *arg, void *key) +/* add/drop index and key are called through macros */ +int __add_key(struct rxe_pool_entry *elem, void *key) { int ret; - struct rxe_pool_entry *elem = arg; struct rxe_pool *pool = elem->pool; - unsigned long flags; - write_lock_irqsave(&pool->pool_lock, flags); memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size); ret = insert_key(pool, elem); - write_unlock_irqrestore(&pool->pool_lock, flags); return ret; } -void rxe_drop_key(void *arg) +int __rxe_add_key(struct rxe_pool_entry *elem, void *key) { - struct rxe_pool_entry *elem = arg; + int ret; struct rxe_pool *pool = elem->pool; unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); - rb_erase(&elem->key_node, &pool->key.tree); + ret = __add_key(elem, key); write_unlock_irqrestore(&pool->pool_lock, flags); + + return ret; } -void rxe_add_index(void *arg) +void __drop_key(struct rxe_pool_entry *elem) +{ + struct rxe_pool *pool = elem->pool; + + rb_erase(&elem->key_node, &pool->key.tree); +} + + +void __rxe_drop_key(struct rxe_pool_entry *elem) { - struct rxe_pool_entry *elem = arg; struct rxe_pool *pool = elem->pool; unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); + __drop_key(elem); + write_unlock_irqrestore(&pool->pool_lock, flags); +} + +void __add_index(struct rxe_pool_entry *elem) +{ + struct rxe_pool *pool = elem->pool; + elem->index = alloc_index(pool); insert_index(pool, elem); - write_unlock_irqrestore(&pool->pool_lock, flags); } -void rxe_drop_index(void *arg) +void __rxe_add_index(struct rxe_pool_entry *elem) { - struct rxe_pool_entry *elem = arg; struct rxe_pool *pool = elem->pool; unsigned long flags; write_lock_irqsave(&pool->pool_lock, flags); + __add_index(elem); + write_unlock_irqrestore(&pool->pool_lock, flags); +} + +void __drop_index(struct rxe_pool_entry *elem) +{ + struct rxe_pool *pool = elem->pool; + clear_bit(elem->index - pool->index.min_index, pool->index.table); rb_erase(&elem->index_node, &pool->index.tree); - write_unlock_irqrestore(&pool->pool_lock, flags); } -void *rxe_alloc(struct rxe_pool *pool) +void __rxe_drop_index(struct rxe_pool_entry *elem) { - struct rxe_pool_entry *elem; + struct rxe_pool *pool = elem->pool; unsigned long flags; - might_sleep_if(!(pool->flags & RXE_POOL_ATOMIC)); + write_lock_irqsave(&pool->pool_lock, flags); + __drop_index(elem); + write_unlock_irqrestore(&pool->pool_lock, flags); +} + +int __add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem) +{ + if (pool->state != RXE_POOL_STATE_VALID) + return -EINVAL; - read_lock_irqsave(&pool->pool_lock, flags); - if (pool->state != RXE_POOL_STATE_VALID) { - read_unlock_irqrestore(&pool->pool_lock, flags); - return 
NULL; - } kref_get(&pool->ref_cnt); - read_unlock_irqrestore(&pool->pool_lock, flags); if (!ib_device_try_get(&pool->rxe->ib_dev)) goto out_put_pool; @@ -331,38 +363,39 @@ void *rxe_alloc(struct rxe_pool *pool) if (atomic_inc_return(&pool->num_elem) > pool->max_elem) goto out_cnt; - elem = kzalloc(rxe_type_info[pool->type].size, - (pool->flags & RXE_POOL_ATOMIC) ? - GFP_ATOMIC : GFP_KERNEL); - if (!elem) - goto out_cnt; - elem->pool = pool; kref_init(&elem->ref_cnt); - return elem; - + return 0; out_cnt: atomic_dec(&pool->num_elem); ib_device_put(&pool->rxe->ib_dev); out_put_pool: rxe_pool_put(pool); - return NULL; + return -EINVAL; } -int rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem) +int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem) { + int err; unsigned long flags; - might_sleep_if(!(pool->flags & RXE_POOL_ATOMIC)); + write_lock_irqsave(&pool->pool_lock, flags); + err = __add_to_pool(pool, elem); + write_unlock_irqrestore(&pool->pool_lock, flags); + + return err; +} + +void *__alloc(struct rxe_pool *pool) +{ + void *obj; + struct rxe_pool_entry *elem; + + if (pool->state != RXE_POOL_STATE_VALID) + return NULL; - read_lock_irqsave(&pool->pool_lock, flags); - if (pool->state != RXE_POOL_STATE_VALID) { - read_unlock_irqrestore(&pool->pool_lock, flags); - return -EINVAL; - } kref_get(&pool->ref_cnt); - read_unlock_irqrestore(&pool->pool_lock, flags); if (!ib_device_try_get(&pool->rxe->ib_dev)) goto out_put_pool; @@ -370,17 +403,52 @@ int rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem) if (atomic_inc_return(&pool->num_elem) > pool->max_elem) goto out_cnt; + obj = kzalloc(rxe_type_info[pool->type].size, GFP_ATOMIC); + if (!obj) + goto out_cnt; + + elem = (struct rxe_pool_entry *)((u8 *)obj + + rxe_type_info[pool->type].elem_offset); + elem->pool = pool; kref_init(&elem->ref_cnt); - return 0; + return obj; out_cnt: atomic_dec(&pool->num_elem); ib_device_put(&pool->rxe->ib_dev); out_put_pool: rxe_pool_put(pool); - return -EINVAL; + return NULL; +} + +void *rxe_alloc(struct rxe_pool *pool) +{ + void *obj; + struct rxe_pool_entry *elem; + unsigned long flags; + int err; + + /* allocate object outside of lock in case it sleeps */ + obj = kzalloc(rxe_type_info[pool->type].size, + (pool->flags & RXE_POOL_ATOMIC) ? + GFP_ATOMIC : GFP_KERNEL); + if (!obj) + goto out; + + elem = (struct rxe_pool_entry *)((u8 *)obj + + rxe_type_info[pool->type].elem_offset); + + write_lock_irqsave(&pool->pool_lock, flags); + err = __add_to_pool(pool, elem); + write_unlock_irqrestore(&pool->pool_lock, flags); + if (err) { + kfree(obj); + obj = NULL; + } +out: + return obj; } void rxe_elem_release(struct kref *kref) @@ -394,21 +462,20 @@ void rxe_elem_release(struct kref *kref) if (!(pool->flags & RXE_POOL_NO_ALLOC)) kfree(elem); + atomic_dec(&pool->num_elem); ib_device_put(&pool->rxe->ib_dev); rxe_pool_put(pool); } -void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) +void *__get_index(struct rxe_pool *pool, u32 index) { - struct rb_node *node = NULL; + struct rb_node *node; struct rxe_pool_entry *elem = NULL; - unsigned long flags; - - read_lock_irqsave(&pool->pool_lock, flags); + void *obj; if (pool->state != RXE_POOL_STATE_VALID) - goto out; + return NULL; node = pool->index.tree.rb_node; @@ -425,22 +492,32 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) } } -out: - read_unlock_irqrestore(&pool->pool_lock, flags); - return node ? elem : NULL; + obj = (u8 *)elem - rxe_type_info[pool->type].elem_offset; + + return node ? 
obj : NULL; } -void *rxe_pool_get_key(struct rxe_pool *pool, void *key) +void *rxe_get_index(struct rxe_pool *pool, u32 index) { - struct rb_node *node = NULL; - struct rxe_pool_entry *elem = NULL; - int cmp; + void *obj; unsigned long flags; read_lock_irqsave(&pool->pool_lock, flags); + obj = __get_index(pool, index); + read_unlock_irqrestore(&pool->pool_lock, flags); + + return obj; +} + +void *__get_key(struct rxe_pool *pool, void *key) +{ + struct rb_node *node; + struct rxe_pool_entry *elem = NULL; + int cmp; + void *obj; if (pool->state != RXE_POOL_STATE_VALID) - goto out; + return NULL; node = pool->key.tree.rb_node; @@ -450,18 +527,29 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key) cmp = memcmp((u8 *)elem + pool->key.key_offset, key, pool->key.key_size); - if (cmp > 0) + if (cmp > 0) { node = node->rb_left; - else if (cmp < 0) + } else if (cmp < 0) { node = node->rb_right; - else + } else { + kref_get(&elem->ref_cnt); break; + } } - if (node) - kref_get(&elem->ref_cnt); + obj = (u8 *)elem - rxe_type_info[pool->type].elem_offset; -out: + return node ? obj : NULL; +} + +void *rxe_get_key(struct rxe_pool *pool, void *key) +{ + void *obj; + unsigned long flags; + + read_lock_irqsave(&pool->pool_lock, flags); + obj = __get_key(pool, key); read_unlock_irqrestore(&pool->pool_lock, flags); - return node ? elem : NULL; + + return obj; } diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h index 5be975e3d5d3..446880026805 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.h +++ b/drivers/infiniband/sw/rxe/rxe_pool.h @@ -36,6 +36,7 @@ struct rxe_pool_entry; struct rxe_type_info { const char *name; size_t size; + size_t elem_offset; void (*cleanup)(struct rxe_pool_entry *obj); enum rxe_pool_flags flags; u32 max_index; @@ -105,35 +106,69 @@ int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, /* free resources from object pool */ void rxe_pool_cleanup(struct rxe_pool *pool); -/* allocate an object from pool */ -void *rxe_alloc(struct rxe_pool *pool); +/* in the following rxe_xxx() take pool->pool_lock + * and xxx() do not. 
Sequences of xxx() calls should be + * protected by taking the pool->pool_lock by caller + */ +void __add_index(struct rxe_pool_entry *elem); -/* connect already allocated object to pool */ -int rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem); +#define add_index(obj) __add_index(&(obj)->pelem) -/* assign an index to an indexed object and insert object into - * pool's rb tree - */ -void rxe_add_index(void *elem); +void __rxe_add_index(struct rxe_pool_entry *elem); + +#define rxe_add_index(obj) __rxe_add_index(&(obj)->pelem) + +void __drop_index(struct rxe_pool_entry *elem); + +#define drop_index(obj) __drop_index(&(obj)->pelem) + +void __rxe_drop_index(struct rxe_pool_entry *elem); + +#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->pelem) + +int __add_key(struct rxe_pool_entry *elem, void *key); + +#define add_key(obj, key) __add_key(&(obj)->pelem, key) + +int __rxe_add_key(struct rxe_pool_entry *elem, void *key); -/* drop an index and remove object from rb tree */ -void rxe_drop_index(void *elem); +#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->pelem, key) -/* assign a key to a keyed object and insert object into - * pool's rb tree +void __drop_key(struct rxe_pool_entry *elem); + +#define drop_key(obj) __drop_key(&(obj)->pelem) + +void __rxe_drop_key(struct rxe_pool_entry *elem); + +#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->pelem) + +int __add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem); + +#define add_to_pool(pool, obj) __add_to_pool(pool, &(obj)->pelem) + +int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem); + +#define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->pelem) + +/* the following routines allocate new objects or + * lookup objects from an index or key and return + * the address if found. the rxe_XXX() routines take the + * pool->pool_lock the __xxx() do not. Sequences of + * unprotected pool operations should be protected by + * taking the pool->pool_lock by the caller */ -int rxe_add_key(void *elem, void *key); +void *__alloc(struct rxe_pool *pool); + +void *rxe_alloc(struct rxe_pool *pool); + +void *__get_index(struct rxe_pool *pool, u32 index); -/* remove elem from rb tree */ -void rxe_drop_key(void *elem); +void *rxe_get_index(struct rxe_pool *pool, u32 index); -/* lookup an indexed object from index. takes a reference on object */ -void *rxe_pool_get_index(struct rxe_pool *pool, u32 index); +void *__get_key(struct rxe_pool *pool, void *key); -/* lookup keyed object from key. takes a reference on the object */ -void *rxe_pool_get_key(struct rxe_pool *pool, void *key); +void *rxe_get_key(struct rxe_pool *pool, void *key); -/* cleanup an object when all references are dropped */ void rxe_elem_release(struct kref *kref); /* take a reference on an object */ diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index a3eed4da1540..50411b0069ba 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -185,7 +185,7 @@ static int hdr_check(struct rxe_pkt_info *pkt) if (qpn != IB_MULTICAST_QPN) { index = (qpn == 1) ? 
port->qp_gsi_index : qpn; - qp = rxe_pool_get_index(&rxe->qp_pool, index); + qp = rxe_get_index(&rxe->qp_pool, index); if (unlikely(!qp)) { pr_warn_ratelimited("no qp matches qpn 0x%x\n", qpn); goto err1; @@ -242,7 +242,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid)); /* lookup mcast group corresponding to mgid, takes a ref */ - mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid); + mcg = rxe_get_key(&rxe->mc_grp_pool, &dgid); if (!mcg) goto err1; /* mcast group not registered */ diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index e4de0f4a3d69..0428965c664b 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -564,7 +564,7 @@ static int invalidate_key(struct rxe_qp *qp, u32 key) struct rxe_mr *mr; if (key & IS_MW) { - mw = rxe_pool_get_key(&rxe->mw_pool, &key); + mw = rxe_get_key(&rxe->mw_pool, &key); if (!mw) { pr_err("No mw for key %#x\n", key); return -EINVAL; @@ -572,7 +572,7 @@ static int invalidate_key(struct rxe_qp *qp, u32 key) ret = rxe_invalidate_mw(qp, mw); rxe_drop_ref(mw); } else { - mr = rxe_pool_get_key(&rxe->mr_pool, &key); + mr = rxe_get_key(&rxe->mr_pool, &key); if (!mr) { pr_err("No mr for key %#x\n", key); return -EINVAL; diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 846baeec61be..58d3783063bb 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -441,7 +441,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, * since the last packet */ if (rkey & IS_MW) { - mw = rxe_pool_get_key(&rxe->mw_pool, &rkey); + mw = rxe_get_key(&rxe->mw_pool, &rkey); if (!mw) { pr_err_once("no MW found with rkey = 0x%08x\n", rkey); state = RESPST_ERR_RKEY_VIOLATION; @@ -465,7 +465,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp, spin_unlock_irqrestore(&mw->lock, flags); rxe_drop_ref(mw); } else { - mr = rxe_pool_get_key(&rxe->mr_pool, &rkey); + mr = rxe_get_key(&rxe->mr_pool, &rkey); if (!mr || (mr->rkey != rkey)) { pr_err_once("no MR found with rkey = 0x%08x\n", rkey); state = RESPST_ERR_RKEY_VIOLATION; @@ -793,7 +793,7 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey) struct rxe_mr *mr; if (rkey & IS_MW) { - mw = rxe_pool_get_key(&rxe->mw_pool, &rkey); + mw = rxe_get_key(&rxe->mw_pool, &rkey); if (!mw) { pr_err("No mw for rkey %#x\n", rkey); goto err; @@ -801,7 +801,7 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey) ret = rxe_invalidate_mw(qp, mw); rxe_drop_ref(mw); } else { - mr = rxe_pool_get_key(&rxe->mr_pool, &rkey); + mr = rxe_get_key(&rxe->mr_pool, &rkey); if (!mr || mr->ibmr.rkey != rkey) { pr_err("No mr for rkey %#x\n", rkey); goto err; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 7849d8d72d4c..8f18b992983f 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -111,7 +111,7 @@ static int rxe_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata) struct rxe_dev *rxe = to_rdev(uctx->device); struct rxe_ucontext *uc = to_ruc(uctx); - return rxe_add_to_pool(&rxe->uc_pool, &uc->pelem); + return rxe_add_to_pool(&rxe->uc_pool, uc); } static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc) @@ -145,7 +145,7 @@ static int rxe_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) struct rxe_dev *rxe = to_rdev(ibpd->device); struct rxe_pd *pd = to_rpd(ibpd); - return rxe_add_to_pool(&rxe->pd_pool, &pd->pelem); 
+ return rxe_add_to_pool(&rxe->pd_pool, pd); } static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) @@ -169,7 +169,7 @@ static int rxe_create_ah(struct ib_ah *ibah, if (err) return err; - err = rxe_add_to_pool(&rxe->ah_pool, &ah->pelem); + err = rxe_add_to_pool(&rxe->ah_pool, ah); if (err) return err; @@ -275,7 +275,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init, if (err) goto err1; - err = rxe_add_to_pool(&rxe->srq_pool, &srq->pelem); + err = rxe_add_to_pool(&rxe->srq_pool, srq); if (err) goto err1; @@ -776,7 +776,7 @@ static int rxe_create_cq(struct ib_cq *ibcq, if (err) return err; - return rxe_add_to_pool(&rxe->cq_pool, &cq->pelem); + return rxe_add_to_pool(&rxe->cq_pool, cq); } static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) From patchwork Mon Sep 21 20:03:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 950CA1668 for ; Mon, 21 Sep 2020 20:04:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7A2AA21789 for ; Mon, 21 Sep 2020 20:04:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="AGvrRMzb" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727070AbgIUUER (ORCPT ); Mon, 21 Sep 2020 16:04:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUER (ORCPT ); Mon, 21 Sep 2020 16:04:17 -0400 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 92616C061755 for ; Mon, 21 Sep 2020 13:04:17 -0700 (PDT) Received: by mail-ot1-x342.google.com with SMTP id m12so13554600otr.0 for ; Mon, 21 Sep 2020 13:04:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=FYwKkNRp9uCds/YjAuYWixK/yGwzj9/mfnhHuXBMfrg=; b=AGvrRMzbp+14ikjHWEL/alfwHBHcjbL5C9TqTO+UNMbtxLapEUA2jsCXPZbh2+6TiO j7MQevaSTtdoQbGAYsxPwnEHvYUrfjTiqtVSc1urtfm1jCSAIA04gG3bLJgPRR6xdKRl IMRoo3c39eUcZik2c5Cew2x/IvTMqiaAk39zVMj6VAPNzyKjcBqGQMzS3HpGoFZwXHwf xtCNwe1OlbAnWEAAYPgTV4+Vn8LyDL3S5ZZ1IZBdXwPY+3eaHXyGRUMXYzHZaf3FsOJ3 RQmT8GYBFiwY397ZXFC49HgFvvmhoUrEzAZA/QOqHzDEzsm1hEgDIZBgqvvg3PP3f93f yCUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FYwKkNRp9uCds/YjAuYWixK/yGwzj9/mfnhHuXBMfrg=; b=aqqYV6OnlklLmAA+AGc9f881Dp+/UmemRzVk8AdkImecE+5SkaC8arCFM2MeWMe5ud Ryc616CX6CnjLzIzvbe6G1EdXAaj0o3a1oAQ1efYEpR192uI/M4ST1hWXZK6px2ZtJIJ /YkjaJi5GuqnoOwjzJM1/QifchNo+Um7Lu6IMZ2UTJnWI+X/3js41eAGrZsGN70aamYC 19fcZs3qe9vrpkm3yl4tFAvBWrL2e5/7f2KczSEJLWUlLez8SDzQdXEplnu5/Q15+iyt //xco1OKQvp23pNpqvA4NAHm5JECnFpvu9q03zcIwooRGvgFMGu1LMF7ilfdUj6xPeRH BNFQ== X-Gm-Message-State: AOAM531IrTcih9fNl83sFTsW4sIjNCmma1Z2Jm05ck5E2go6bIepXfSj 4D0gAmgUb5nu2S9Fpg6Dcew= X-Google-Smtp-Source: 
ABdhPJy0JiKllYfCHkExIIJ9lqh2xHMzWCoS+czXuWCBk6RGYRvsndaWdtZ9JpIPa2mYnVVRcNFDNQ== X-Received: by 2002:a05:6830:1bf9:: with SMTP id k25mr730415otb.310.1600718656936; Mon, 21 Sep 2020 13:04:16 -0700 (PDT) Received: from localhost ([2605:6000:8b03:f000:9211:f781:8595:d131]) by smtp.gmail.com with ESMTPSA id 187sm7007095oie.42.2020.09.21.13.04.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Sep 2020 13:04:16 -0700 (PDT) From: Bob Pearson X-Google-Original-From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 11/12] rdma_rxe: Fix mcast group allocation bug Date: Mon, 21 Sep 2020 15:03:55 -0500 Message-Id: <20200921200356.8627-12-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This patch does the following: - Cleans up multicast group create to use an atomic sequence of lookup followed by allocate. This fixes an error that occurred when two QPs were attempting to attach to the same mcast address at the same time. - Fixes a bug in rxe_mcast_get_grp not initializing err = 0. If the group is found in get_key the routine can return an uninitialized value to caller. - Changes the variable elem to mce (for mcast elem). This is less likely to confuse readers because of the ambiguity with elem as pool element which is also used. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_mcast.c | 108 ++++++++++++++------------ 1 file changed, 60 insertions(+), 48 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c index b09c6594045a..541cc68a8a94 100644 --- a/drivers/infiniband/sw/rxe/rxe_mcast.c +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c @@ -7,44 +7,56 @@ #include "rxe.h" #include "rxe_loc.h" -int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, - struct rxe_mc_grp **grp_p) +/* caller should hold mc_grp_pool->pool_lock */ +static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe, + struct rxe_pool *pool, + union ib_gid *mgid) { int err; struct rxe_mc_grp *grp; - if (rxe->attr.max_mcast_qp_attach == 0) { - err = -EINVAL; - goto err1; - } - - grp = rxe_get_key(&rxe->mc_grp_pool, mgid); - if (grp) - goto done; - - grp = rxe_alloc(&rxe->mc_grp_pool); - if (!grp) { - err = -ENOMEM; - goto err1; - } + grp = __alloc(&rxe->mc_grp_pool); + if (unlikely(!grp)) + return NULL; INIT_LIST_HEAD(&grp->qp_list); spin_lock_init(&grp->mcg_lock); grp->rxe = rxe; - - rxe_add_key(grp, mgid); + add_key(grp, mgid); err = rxe_mcast_add(rxe, mgid); - if (err) - goto err2; + if (unlikely(err)) { + drop_key(grp); + rxe_drop_ref(grp); + return NULL; + } + return grp; +} + +/* atomically lookup or create mc group */ +int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, + struct rxe_mc_grp **grp_p) +{ + int err = 0; + struct rxe_mc_grp *grp; + struct rxe_pool *pool = &rxe->mc_grp_pool; + unsigned long flags; + + if (unlikely(rxe->attr.max_mcast_qp_attach == 0)) + return -EINVAL; + + write_lock_irqsave(&pool->pool_lock, flags); + grp = __get_key(pool, mgid); + if (grp) + goto done; + + grp = create_grp(rxe, pool, mgid); + if (unlikely(!grp)) + err = -ENOMEM; done: + write_unlock_irqrestore(&pool->pool_lock, flags); *grp_p = grp; - return 0; - -err2: - rxe_drop_ref(grp); -err1: return err; } @@ -52,13 +64,13 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct 
rxe_qp *qp, struct rxe_mc_grp *grp) { int err; - struct rxe_mc_elem *elem; + struct rxe_mc_elem *mce; /* check to see of the qp is already a member of the group */ spin_lock_bh(&qp->grp_lock); spin_lock_bh(&grp->mcg_lock); - list_for_each_entry(elem, &grp->qp_list, qp_list) { - if (elem->qp == qp) { + list_for_each_entry(mce, &grp->qp_list, qp_list) { + if (mce->qp == qp) { err = 0; goto out; } @@ -69,8 +81,8 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, goto out; } - elem = rxe_alloc(&rxe->mc_elem_pool); - if (!elem) { + mce = rxe_alloc(&rxe->mc_elem_pool); + if (!mce) { err = -ENOMEM; goto out; } @@ -79,11 +91,11 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, rxe_add_ref(grp); grp->num_qp++; - elem->qp = qp; - elem->grp = grp; + mce->qp = qp; + mce->grp = grp; - list_add(&elem->qp_list, &grp->qp_list); - list_add(&elem->grp_list, &qp->grp_list); + list_add(&mce->qp_list, &grp->qp_list); + list_add(&mce->grp_list, &qp->grp_list); err = 0; out: @@ -96,7 +108,7 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, union ib_gid *mgid) { struct rxe_mc_grp *grp; - struct rxe_mc_elem *elem, *tmp; + struct rxe_mc_elem *mce, *tmp; grp = rxe_get_key(&rxe->mc_grp_pool, mgid); if (!grp) @@ -105,15 +117,15 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, spin_lock_bh(&qp->grp_lock); spin_lock_bh(&grp->mcg_lock); - list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) { - if (elem->qp == qp) { - list_del(&elem->qp_list); - list_del(&elem->grp_list); + list_for_each_entry_safe(mce, tmp, &grp->qp_list, qp_list) { + if (mce->qp == qp) { + list_del(&mce->qp_list); + list_del(&mce->grp_list); grp->num_qp--; spin_unlock_bh(&grp->mcg_lock); spin_unlock_bh(&qp->grp_lock); - rxe_drop_ref(elem); + rxe_drop_ref(mce); rxe_drop_ref(grp); /* ref held by QP */ rxe_drop_ref(grp); /* ref from get_key */ return 0; @@ -130,7 +142,7 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, void rxe_drop_all_mcast_groups(struct rxe_qp *qp) { struct rxe_mc_grp *grp; - struct rxe_mc_elem *elem; + struct rxe_mc_elem *mce; while (1) { spin_lock_bh(&qp->grp_lock); @@ -138,24 +150,24 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp) spin_unlock_bh(&qp->grp_lock); break; } - elem = list_first_entry(&qp->grp_list, struct rxe_mc_elem, + mce = list_first_entry(&qp->grp_list, struct rxe_mc_elem, grp_list); - list_del(&elem->grp_list); + list_del(&mce->grp_list); spin_unlock_bh(&qp->grp_lock); - grp = elem->grp; + grp = mce->grp; spin_lock_bh(&grp->mcg_lock); - list_del(&elem->qp_list); + list_del(&mce->qp_list); grp->num_qp--; spin_unlock_bh(&grp->mcg_lock); rxe_drop_ref(grp); - rxe_drop_ref(elem); + rxe_drop_ref(mce); } } -void rxe_mc_cleanup(struct rxe_pool_entry *arg) +void rxe_mc_cleanup(struct rxe_pool_entry *pelem) { - struct rxe_mc_grp *grp = container_of(arg, typeof(*grp), pelem); + struct rxe_mc_grp *grp = container_of(pelem, struct rxe_mc_grp, pelem); struct rxe_dev *rxe = grp->rxe; rxe_drop_key(grp); From patchwork Mon Sep 21 20:03:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 11790851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6FF0859D for ; Mon, 21 Sep 2020 20:04:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 
54D9621BE5 for ; Mon, 21 Sep 2020 20:04:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="B3bYwBm8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727113AbgIUUES (ORCPT ); Mon, 21 Sep 2020 16:04:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726456AbgIUUES (ORCPT ); Mon, 21 Sep 2020 16:04:18 -0400 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C780C061755 for ; Mon, 21 Sep 2020 13:04:18 -0700 (PDT) Received: by mail-oi1-x244.google.com with SMTP id a3so18408750oib.4 for ; Mon, 21 Sep 2020 13:04:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jp7ZZYgIOdVyNmoMY+S8yOvgoAMhSFL6qtsTnPRp9iQ=; b=B3bYwBm8COFdgMgZj4UgpbkwHO4DiX3Kd3cV9rosSdPyp13/uvB0Rt3/YFxvtNtT81 2FkO6KMd4sz+BBD2n9RQRGEmltmxXqqNIvToGwUjIfWtpTFWNDQchcn2AiaaRswalln3 yuG5f72MRDkgLID1iwl7dTySWiHLAZuuLtw4iZclxL2mjO0n42p1ZAiQphIEm15aT478 EUSWrQqI/oAkEXshHbTuzZKkFKlRp0l8JeMS4Xb8gZpiuVTfRDvBwUv/qvYh/ZcUGoY5 uDcDMVyrroJflliszoLESoEfR76pxDmPB/fkUDA3AijzEIe5+/RSrxCjSXf4cI9yAqNq c3Zw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jp7ZZYgIOdVyNmoMY+S8yOvgoAMhSFL6qtsTnPRp9iQ=; b=dxhet1ldbttYay3nUpxv/fACW4AvAAtHQQZOfywn2G0PtdY4P0pLcBT18zZ/4hIOou 7k38uVI+zzGVVrykDJ5mUdze7lOvLfRAvrnnsJunqO41MpdruMjjH6AhOV1+Zgv8ZcMX 9CgKj6UOaZzm8oiar2NvPBywknIEpodTaPJkq76dtUpopMK1OCGrQuPEmT9ok2FxLNgg KwZBzMwPAFdWwAvSivC/7G9VqnrldTIvWC1E4lb9/AfvC9dACGLrkqWPcon9DeMGJE7o icpASvfHmJtpPVdSNWMcmnee068irVufJPxaruct3qLq3BO+7VlT5NFKMVNiVLgutX7W J7lA== X-Gm-Message-State: AOAM532AYnWUYGzRQzpRD4wfL1gjG01zgACmrPNEzFaZeNP7wCvqZFGl hRkFBAFfWI9kDmMXZRSm0lw= X-Google-Smtp-Source: ABdhPJzDZCELtW8kT62Qd0XrNbl0xvcMY+KiEYqPQ3rE+xohP+DgCRN29lN56r/jmydirjYCYARijA== X-Received: by 2002:a05:6808:8f3:: with SMTP id d19mr636421oic.36.1600718657819; Mon, 21 Sep 2020 13:04:17 -0700 (PDT) Received: from localhost ([2605:6000:8b03:f000:9211:f781:8595:d131]) by smtp.gmail.com with ESMTPSA id u68sm24285otb.9.2020.09.21.13.04.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Sep 2020 13:04:17 -0700 (PDT) From: Bob Pearson X-Google-Original-From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v6 12/12] rdma_rxe: Fix bugs in the multicast receive path Date: Mon, 21 Sep 2020 15:03:56 -0500 Message-Id: <20200921200356.8627-13-rpearson@hpe.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200921200356.8627-1-rpearson@hpe.com> References: <20200921200356.8627-1-rpearson@hpe.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This patch does the following: - Fix a bug in rxe_rcv. The current code calls rxe_match_dgid, which checks to see if the destination IP address (dgid) matches one of the addresses in the gid table. This is OK for unicast addresses but not for mcast addresses. Because of this, all mcast packets were previously dropped. - Fix a bug in rxe_rcv_mcast_pkt. The current code is just wrong. 
It assumed that it could pass the same skb to rxe_rcv_pkt changing the qp pointer as it went when multiple QPs were attached to the same mcast address. In fact each QP needs a separate clone of the skb which it will delete later. Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++----------- 1 file changed, 36 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index 50411b0069ba..bc86ebbd2c8c 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -233,6 +233,8 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) struct rxe_mc_elem *mce; struct rxe_qp *qp; union ib_gid dgid; + struct sk_buff *per_qp_skb; + struct rxe_pkt_info *per_qp_pkt; int err; if (skb->protocol == htons(ETH_P_IP)) @@ -261,42 +263,37 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) if (err) continue; - /* if *not* the last qp in the list - * increase the users of the skb then post to the next qp + /* for all but the last qp create a new clone of the + * skb and pass to the qp. + * This effectively reverts an earlier change + * which did not work. The pkt struct is contained + * in the skb so each time you changed pkt you also + * changed all the earlier pkts as well. Caused a mess. */ if (mce->qp_list.next != &mcg->qp_list) - skb_get(skb); + per_qp_skb = skb_clone(skb, GFP_ATOMIC); + else + per_qp_skb = skb; - pkt->qp = qp; + per_qp_pkt = SKB_TO_PKT(per_qp_skb); + per_qp_pkt->qp = qp; rxe_add_ref(qp); - rxe_rcv_pkt(pkt, skb); + rxe_rcv_pkt(per_qp_pkt, per_qp_skb); } spin_unlock_bh(&mcg->mcg_lock); - rxe_drop_ref(mcg); /* drop ref from rxe_pool_get_key. */ + return; err1: kfree_skb(skb); + return; } -static int rxe_match_dgid(struct rxe_dev *rxe, struct sk_buff *skb) +static int rxe_match_dgid(struct rxe_dev *rxe, struct sk_buff *skb, + union ib_gid *pdgid) { - struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); const struct ib_gid_attr *gid_attr; - union ib_gid dgid; - union ib_gid *pdgid; - - if (pkt->mask & RXE_LOOPBACK_MASK) - return 0; - - if (skb->protocol == htons(ETH_P_IP)) { - ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr, - (struct in6_addr *)&dgid); - pdgid = &dgid; - } else { - pdgid = (union ib_gid *)&ipv6_hdr(skb)->daddr; - } gid_attr = rdma_find_gid_by_port(&rxe->ib_dev, pdgid, IB_GID_TYPE_ROCE_UDP_ENCAP, @@ -314,17 +311,32 @@ void rxe_rcv(struct sk_buff *skb) int err; struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); struct rxe_dev *rxe = pkt->rxe; + union ib_gid dgid; + union ib_gid *pdgid; __be32 *icrcp; u32 calc_icrc, pack_icrc; + int is_mc; pkt->offset = 0; if (unlikely(skb->len < pkt->offset + RXE_BTH_BYTES)) goto drop; - if (rxe_match_dgid(rxe, skb) < 0) { - pr_warn_ratelimited("failed matching dgid\n"); - goto drop; + if (skb->protocol == htons(ETH_P_IP)) { + ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr, + (struct in6_addr *)&dgid); + pdgid = &dgid; + } else { + pdgid = (union ib_gid *)&ipv6_hdr(skb)->daddr; + } + + is_mc = rdma_is_multicast_addr((struct in6_addr *)pdgid); + + if (!(pkt->mask & RXE_LOOPBACK_MASK) && !is_mc) { + if (rxe_match_dgid(rxe, skb, pdgid) < 0) { + pr_warn_ratelimited("failed matching dgid\n"); + goto drop; + } } pkt->opcode = bth_opcode(pkt);