From patchwork Fri Oct 23 16:39:58 2020
X-Patchwork-Id: 11853933
From: Jianxin Xiong <jianxin.xiong@intel.com>
To: linux-rdma@vger.kernel.org, dri-devel@lists.freedesktop.org
Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
    Christian Koenig, Jianxin Xiong
Subject: [PATCH v6 1/4] RDMA/umem: Support importing dma-buf as user memory region
Date: Fri, 23 Oct 2020 09:39:58 -0700
Message-Id: <1603471201-32588-2-git-send-email-jianxin.xiong@intel.com>
In-Reply-To: <1603471201-32588-1-git-send-email-jianxin.xiong@intel.com>

Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices. Device memory
exported via dma-buf is associated with a file descriptor, which is
passed to user space as a property of the buffer allocation. When the
buffer is registered as a memory region, the file descriptor is passed
to the RDMA driver along with other parameters.

Implement the common code for importing a dma-buf object and mapping
its pages.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty
Acked-by: Michael J. Ruhl
Acked-by: Christian Koenig
Acked-by: Daniel Vetter
---
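A minimal sketch of the intended call pattern for driver writers, using
only the helpers added by this patch; the my_* names are hypothetical
placeholders (the real wiring for mlx5 is in patch 4 of this series):

static void my_move_notify(struct dma_buf_attachment *attach)
{
	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;

	/* Hypothetical driver step: stop device access first. */
	my_zap_hw_mapping(umem_dmabuf->device_context);
	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

/* Called from the driver's MR registration path. */
static struct ib_umem *my_import(struct ib_device *dev, u64 offset,
				 u64 length, int fd, int access)
{
	return ib_umem_dmabuf_get(dev, offset, length, fd, access,
				  &my_attach_ops);
}

The exporter invokes move_notify with the reservation lock held, which
is why ib_umem_dmabuf_unmap_pages() below asserts that lock.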
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/umem.c        |   4 +
 drivers/infiniband/core/umem_dmabuf.c | 197 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h |  11 ++
 include/rdma/ib_umem.h                |  35 +++++-
 5 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index ccf2670..8ab4eea 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -40,5 +40,5 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
 				uverbs_std_types_qp.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
+ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e9fecbd..2c45525 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005 Topspin Communications. All rights reserved.
  * Copyright (c) 2005 Cisco Systems. All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -43,6 +44,7 @@
 #include <rdma/ib_umem_odp.h>

 #include "uverbs.h"
+#include "umem_dmabuf.h"

 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
@@ -269,6 +271,8 @@ void ib_umem_release(struct ib_umem *umem)
 {
	if (!umem)
		return;
+	if (umem->is_dmabuf)
+		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
	if (umem->is_odp)
		return ib_umem_odp_release(to_ib_umem_odp(umem));

diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
new file mode 100644
index 0000000..66b234d
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-mapping.h>
+
+#include "uverbs.h"
+#include "umem_dmabuf.h"
+
+/*
+ * Generate a new dma sg list from a sub range of an existing dma sg list.
+ * Both the input and output have their entries page aligned.
+ */
+static int ib_umem_dmabuf_sgt_slice(struct sg_table *sgt, u64 offset,
+				    u64 length, struct sg_table *new_sgt)
+{
+	struct scatterlist *sg, *new_sg;
+	u64 start, end, off, addr, len;
+	unsigned int new_nents;
+	int err;
+	int i;
+
+	start = ALIGN_DOWN(offset, PAGE_SIZE);
+	end = ALIGN(offset + length, PAGE_SIZE);
+
+	offset = start;
+	length = end - start;
+	new_nents = 0;
+	for_each_sgtable_dma_sg(sgt, sg, i) {
+		len = sg_dma_len(sg);
+		off = min(len, offset);
+		len -= off;
+		len = min(len, length);
+		if (len)
+			new_nents++;
+		length -= len;
+		offset -= off;
+	}
+
+	err = sg_alloc_table(new_sgt, new_nents, GFP_KERNEL);
+	if (err)
+		return err;
+
+	offset = start;
+	length = end - start;
+	new_sg = new_sgt->sgl;
+	for_each_sgtable_dma_sg(sgt, sg, i) {
+		addr = sg_dma_address(sg);
+		len = sg_dma_len(sg);
+		off = min(len, offset);
+		addr += off;
+		len -= off;
+		len = min(len, length);
+		if (len) {
+			sg_dma_address(new_sg) = addr;
+			sg_dma_len(new_sg) = len;
+			new_sg = sg_next(new_sg);
+		}
+		length -= len;
+		offset -= off;
+	}
+
+	return 0;
+}
+
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct sg_table *sgt;
+	struct dma_fence *fence;
+	int err;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
+				     DMA_BIDIRECTIONAL);
+
+	if (IS_ERR(sgt))
+		return PTR_ERR(sgt);
+
+	err = ib_umem_dmabuf_sgt_slice(sgt, umem_dmabuf->umem.address,
+				       umem_dmabuf->umem.length,
+				       &umem_dmabuf->umem.sg_head);
+	if (err) {
+		dma_buf_unmap_attachment(umem_dmabuf->attach, sgt,
+					 DMA_BIDIRECTIONAL);
+		return err;
+	}
+
+	umem_dmabuf->umem.nmap = umem_dmabuf->umem.sg_head.nents;
+	umem_dmabuf->sgt = sgt;
+
+	/*
+	 * Although the sg list is valid now, the content of the pages
+	 * may not be up-to-date. Wait for the exporter to finish
+	 * the migration.
+	 */
+	fence = dma_resv_get_excl(umem_dmabuf->attach->dmabuf->resv);
+	if (fence)
+		dma_fence_wait(fence, false);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
+
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	sg_free_table(&umem_dmabuf->umem.sg_head);
+	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
+				 DMA_BIDIRECTIONAL);
+	umem_dmabuf->sgt = NULL;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_unmap_pages);
+
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long offset, size_t size,
+				   int fd, int access,
+				   const struct dma_buf_attach_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	struct ib_umem *umem;
+	unsigned long end;
+	long ret;
+
+	if (check_add_overflow(offset, (unsigned long)size, &end))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(!ops || !ops->move_notify))
+		return ERR_PTR(-EINVAL);
+
+#ifdef CONFIG_DMA_VIRT_OPS
+	if (device->dma_device->dma_ops == &dma_virt_ops)
+		return ERR_PTR(-EINVAL);
+#endif
+
+	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
+	if (!umem_dmabuf)
+		return ERR_PTR(-ENOMEM);
+
+	umem = &umem_dmabuf->umem;
+	umem->ibdev = device;
+	umem->length = size;
+	umem->address = offset;
+	umem->iova = offset;
+	umem->writable = ib_access_writable(access);
+	umem->is_dmabuf = 1;
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto out_free_umem;
+	}
+
+	umem_dmabuf->attach = dma_buf_dynamic_attach(
+					dmabuf,
+					device->dma_device,
+					ops,
+					umem_dmabuf);
+	if (IS_ERR(umem_dmabuf->attach)) {
+		ret = PTR_ERR(umem_dmabuf->attach);
+		goto out_release_dmabuf;
+	}
+
+	return umem;
+
+out_release_dmabuf:
+	dma_buf_put(dmabuf);
+
+out_free_umem:
+	kfree(umem_dmabuf);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_get);
+
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
+
+	dma_resv_lock(dmabuf->resv, NULL);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+	dma_resv_unlock(dmabuf->resv);
+
+	dma_buf_detach(dmabuf, umem_dmabuf->attach);
+	dma_buf_put(dmabuf);
+	kfree(umem_dmabuf);
+}

diff --git a/drivers/infiniband/core/umem_dmabuf.h b/drivers/infiniband/core/umem_dmabuf.h
new file mode 100644
index 0000000..13acf55
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#ifndef UMEM_DMABUF_H
+#define UMEM_DMABUF_H
+
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf);
+
+#endif /* UMEM_DMABUF_H */

diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 7059750..73a7b19 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2007 Cisco Systems. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  */

 #ifndef IB_UMEM_H
@@ -13,6 +14,7 @@
 struct ib_ucontext;
 struct ib_umem_odp;
+struct dma_buf_attach_ops;

 struct ib_umem {
	struct ib_device       *ibdev;
@@ -22,12 +24,25 @@ struct ib_umem {
	unsigned long		address;
	u32 writable : 1;
	u32 is_odp : 1;
+	u32 is_dmabuf : 1;
	struct work_struct	work;
	struct sg_table sg_head;
	int             nmap;
	unsigned int    sg_nents;
 };

+struct ib_umem_dmabuf {
+	struct ib_umem umem;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	void *device_context;
+};
+
+static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
+{
+	return container_of(umem, struct ib_umem_dmabuf, umem);
+}
+
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
@@ -79,6 +94,12 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
				     unsigned long pgsz_bitmap,
				     unsigned long virt);
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long offset, size_t size,
+				   int fd, int access,
+				   const struct dma_buf_attach_ops *ops);
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf);
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf);

 #else /* CONFIG_INFINIBAND_USER_MEM */

@@ -101,7 +122,19 @@ static inline unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 {
	return 0;
 }
+static inline struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+						 unsigned long offset,
+						 size_t size, int fd,
+						 int access,
+						 struct dma_buf_attach_ops *ops)
+{
+	return ERR_PTR(-EINVAL);
+}
+static inline int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	return -EINVAL;
+}
+static inline void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf) { }

 #endif /* CONFIG_INFINIBAND_USER_MEM */
-
 #endif /* IB_UMEM_H */
From patchwork Fri Oct 23 16:39:59 2020
X-Patchwork-Id: 11853937
From: Jianxin Xiong <jianxin.xiong@intel.com>
To: linux-rdma@vger.kernel.org, dri-devel@lists.freedesktop.org
Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
    Christian Koenig, Jianxin Xiong
Subject: [PATCH v6 2/4] RDMA/core: Add device method for registering dma-buf based memory region
Date: Fri, 23 Oct 2020 09:39:59 -0700
Message-Id: <1603471201-32588-3-git-send-email-jianxin.xiong@intel.com>
In-Reply-To: <1603471201-32588-1-git-send-email-jianxin.xiong@intel.com>

Dma-buf based memory regions require one extra parameter (the file
descriptor) and are processed quite differently. Adding a separate
device method allows clean separation from regular memory regions.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty
Acked-by: Michael J. Ruhl
Acked-by: Christian Koenig
Acked-by: Daniel Vetter
---
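How a provider opts in, in outline: implement the new method and list it
in the device's ib_device_ops, which ib_set_device_ops() then copies
like any other op. A hypothetical sketch (the my_* names are
placeholders; the real implementation for mlx5 is patch 4 of this
series):

static struct ib_mr *my_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
					   u64 length, u64 virt_addr, int fd,
					   int mr_access_flags,
					   struct ib_udata *udata)
{
	/* Import the fd with ib_umem_dmabuf_get() and build the MR here. */
	return ERR_PTR(-EOPNOTSUPP);	/* stub */
}

static const struct ib_device_ops my_dev_ops = {
	/* ... existing methods ... */
	.reg_user_mr_dmabuf = my_reg_user_mr_dmabuf,
};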
 drivers/infiniband/core/device.c | 1 +
 include/rdma/ib_verbs.h          | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index feaec8d..d6cd0ac 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2653,6 +2653,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
	SET_DEVICE_OP(dev_ops, read_counters);
	SET_DEVICE_OP(dev_ops, reg_dm_mr);
	SET_DEVICE_OP(dev_ops, reg_user_mr);
+	SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
	SET_DEVICE_OP(dev_ops, req_ncomp_notif);
	SET_DEVICE_OP(dev_ops, req_notify_cq);
	SET_DEVICE_OP(dev_ops, rereg_user_mr);

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9bf6c31..5f0f8be 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2,7 +2,7 @@
 /*
  * Copyright (c) 2004 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2004 Infinicon Corporation. All rights reserved.
- * Copyright (c) 2004 Intel Corporation. All rights reserved.
+ * Copyright (c) 2004, 2020 Intel Corporation. All rights reserved.
  * Copyright (c) 2004 Topspin Corporation. All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation. All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
@@ -2429,6 +2429,10 @@ struct ib_device_ops {
	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
				     u64 virt_addr, int mr_access_flags,
				     struct ib_udata *udata);
+	struct ib_mr *(*reg_user_mr_dmabuf)(struct ib_pd *pd, u64 offset,
+					    u64 length, u64 virt_addr, int fd,
+					    int mr_access_flags,
+					    struct ib_udata *udata);
	int (*rereg_user_mr)(struct ib_mr *mr, int flags, u64 start, u64 length,
			     u64 virt_addr, int mr_access_flags,
			     struct ib_pd *pd, struct ib_udata *udata);

From patchwork Fri Oct 23 16:40:00 2020
X-Patchwork-Id: 11853939
From: Jianxin Xiong <jianxin.xiong@intel.com>
To: linux-rdma@vger.kernel.org, dri-devel@lists.freedesktop.org
Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
    Christian Koenig, Jianxin Xiong
Subject: [PATCH v6 3/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
Date: Fri, 23 Oct 2020 09:40:00 -0700
Message-Id: <1603471201-32588-4-git-send-email-jianxin.xiong@intel.com>
In-Reply-To: <1603471201-32588-1-git-send-email-jianxin.xiong@intel.com>

Implement a new uverbs ioctl method for memory registration with a file
descriptor as an extra parameter.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty
Acked-by: Michael J. Ruhl
Acked-by: Christian Koenig
Acked-by: Daniel Vetter
---
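From user space this surfaces through the ioctl uAPI; the companion
rdma-core series wraps it as ibv_reg_dmabuf_mr(). A sketch against that
proposed (not yet merged) helper, with illustrative access flags:

#include <infiniband/verbs.h>

/* Register a whole dma-buf (offset 0) for local and remote access. */
static struct ibv_mr *register_dmabuf_mr(struct ibv_pd *pd, int dmabuf_fd,
					 size_t length, uint64_t iova)
{
	return ibv_reg_dmabuf_mr(pd, 0 /* offset */, length, iova, dmabuf_fd,
				 IBV_ACCESS_LOCAL_WRITE |
				 IBV_ACCESS_REMOTE_READ |
				 IBV_ACCESS_REMOTE_WRITE);
}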
 drivers/infiniband/core/uverbs_std_types_mr.c | 112 ++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 ++++
 2 files changed, 126 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 9b22bb5..f22e58b 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -178,6 +179,85 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
	return IS_UVERBS_COPY_ERR(ret) ? ret : 0;
 }

+static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
+	struct ib_device *ib_dev = pd->device;
+
+	u64 offset, length, virt_addr;
+	u32 fd, access_flags;
+	struct ib_mr *mr;
+	int ret;
+
+	if (!ib_dev->ops.reg_user_mr_dmabuf)
+		return -EOPNOTSUPP;
+
+	ret = uverbs_copy_from(&offset, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_OFFSET);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&length, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&virt_addr, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_IOVA);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&fd, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_FD);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&access_flags, attrs,
+				 UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+				 IB_ACCESS_SUPPORTED);
+	if (ret)
+		return ret;
+
+	ret = ib_check_mr_access(access_flags);
+	if (ret)
+		return ret;
+
+	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
+						fd, access_flags,
+						&attrs->driver_udata);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device = pd->device;
+	mr->pd = pd;
+	mr->type = IB_MR_TYPE_USER;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mr;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			     &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		goto err_dereg;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			     &mr->rkey, sizeof(mr->rkey));
+	if (ret)
+		goto err_dereg;
+
+	return 0;
+
+err_dereg:
+	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
+
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
	UVERBS_METHOD_ADVISE_MR,
	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -243,6 +323,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
			    UVERBS_ATTR_TYPE(u32),
			    UA_MANDATORY));

+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_DMABUF_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_FD,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
	UVERBS_METHOD_MR_DESTROY,
	UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MR_HANDLE,
@@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 DECLARE_UVERBS_NAMED_OBJECT(
	UVERBS_OBJECT_MR,
	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),

diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 7968a18..dafc7eb 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -251,6 +252,7 @@ enum uverbs_methods_mr {
	UVERBS_METHOD_MR_DESTROY,
	UVERBS_METHOD_ADVISE_MR,
	UVERBS_METHOD_QUERY_MR,
+	UVERBS_METHOD_REG_DMABUF_MR,
 };

 enum uverbs_attrs_mr_destroy_ids {
@@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
	UVERBS_ATTR_QUERY_MR_RESP_IOVA,
 };

+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+	UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+	UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+	UVERBS_ATTR_REG_DMABUF_MR_FD,
+	UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
 enum uverbs_attrs_create_counters_cmd_attr_ids {
	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 };
From patchwork Fri Oct 23 16:40:01 2020
X-Patchwork-Id: 11853935
From: Jianxin Xiong <jianxin.xiong@intel.com>
To: linux-rdma@vger.kernel.org, dri-devel@lists.freedesktop.org
Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
    Christian Koenig, Jianxin Xiong
Subject: [PATCH v6 4/4] RDMA/mlx5: Support dma-buf based userspace memory region
Date: Fri, 23 Oct 2020 09:40:01 -0700
Message-Id: <1603471201-32588-5-git-send-email-jianxin.xiong@intel.com>
In-Reply-To: <1603471201-32588-1-git-send-email-jianxin.xiong@intel.com>

Implement the new driver method 'reg_user_mr_dmabuf'. Utilize the core
functions to import dma-buf based memory regions and update the
mappings. Add code to handle dma-buf related page faults.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty
Acked-by: Michael J. Ruhl
Acked-by: Christian Koenig
Acked-by: Daniel Vetter
---
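The interaction pattern, in outline: on move_notify the driver zaps its
translation entries (MLX5_IB_UPD_XLT_ZAP) and unmaps the attachment; the
next page fault remaps and reprograms the translation under the
reservation lock. A generic sketch of the fault side, where
my_update_device_mapping() is a placeholder for the driver-specific step
(mlx5_ib_update_xlt() in this patch):

static int my_fault_in(struct ib_umem_dmabuf *umem_dmabuf)
{
	struct dma_resv *resv = umem_dmabuf->attach->dmabuf->resv;
	int err;

	dma_resv_lock(resv, NULL);
	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
	if (!err)
		err = my_update_device_mapping(umem_dmabuf); /* placeholder */
	dma_resv_unlock(resv);

	return err;
}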
 drivers/infiniband/hw/mlx5/main.c    |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  11 ++++
 drivers/infiniband/hw/mlx5/mr.c      | 114 ++++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/odp.c     |  70 +++++++++++++++++++--
 4 files changed, 191 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 89e04ca..ec4ad2f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */

 #include
@@ -4060,6 +4061,7 @@ static int mlx5_ib_enable_driver(struct ib_device *dev)
	.query_srq = mlx5_ib_query_srq,
	.query_ucontext = mlx5_ib_query_ucontext,
	.reg_user_mr = mlx5_ib_reg_user_mr,
+	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
	.req_notify_cq = mlx5_ib_arm_cq,
	.rereg_user_mr = mlx5_ib_rereg_user_mr,
	.resize_cq = mlx5_ib_resize_cq,

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b1f2b34..4b72ff9 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */

 #ifndef MLX5_IB_H
@@ -639,6 +640,12 @@ static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
	       mr->umem->is_odp;
 }

+static inline bool is_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && mr->umem &&
+	       mr->umem->is_dmabuf;
+}
+
 struct mlx5_ib_mw {
	struct ib_mw ibmw;
	struct mlx5_core_mkey mmkey;
@@ -1174,6 +1181,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
				  u64 virt_addr, int access_flags,
				  struct ib_udata *udata);
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
+					  u64 length, u64 virt_addr,
+					  int fd, int access_flags,
+					  struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
		      enum ib_uverbs_advise_mr_advice advice,
		      u32 flags,

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b261797..3bc412b 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -36,6 +37,8 @@
 #include
 #include
 #include
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 #include
 #include
 #include
@@ -1113,6 +1116,8 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
	dma_sync_single_for_cpu(ddev, dma, size, DMA_TO_DEVICE);
	if (mr->umem->is_odp) {
		mlx5_odp_populate_xlt(xlt, idx, npages, mr, flags);
+	} else if (mr->umem->is_dmabuf && (flags & MLX5_IB_UPD_XLT_ZAP)) {
+		memset(xlt, 0, size);
	} else {
		__mlx5_ib_populate_pas(dev, mr->umem, page_shift,
				       idx, npages, xlt,
@@ -1462,6 +1467,111 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
	return ERR_PTR(err);
 }

+static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+	struct mlx5_ib_mr *mr = umem_dmabuf->device_context;
+
+	mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, MLX5_IB_UPD_XLT_ZAP);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+}
+
+static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
+	.allow_peer2peer = 1,
+	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
+};
+
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_ib_mr *mr = NULL;
+	struct ib_umem *umem;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	int npages;
+	int order;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mlx5_ib_dbg(dev,
+		    "offset 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, access_flags 0x%x\n",
+		    offset, virt_addr, length, fd, access_flags);
+
+	if (!mlx5_ib_can_load_pas_with_umr(dev, length))
+		return ERR_PTR(-EINVAL);
+
+	umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, access_flags,
+				  &mlx5_ib_dmabuf_attach_ops);
+	if (IS_ERR(umem)) {
+		mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
+		return ERR_PTR(PTR_ERR(umem));
+	}
+
+	npages = ib_umem_num_pages(umem);
+	if (!npages) {
+		mlx5_ib_warn(dev, "avoid zero region\n");
+		ib_umem_release(umem);
+		return ERR_PTR(-EINVAL);
+	}
+
+	order = ilog2(roundup_pow_of_two(npages));
+
+	mlx5_ib_dbg(dev, "npages %d, ncont %d, order %d, page_shift %d\n",
+		    npages, npages, order, PAGE_SHIFT);
+
+	mr = alloc_mr_from_cache(pd, umem, virt_addr, length, npages,
+				 PAGE_SHIFT, order, access_flags);
+	if (IS_ERR(mr))
+		mr = NULL;
+
+	if (!mr) {
+		mutex_lock(&dev->slow_path_mutex);
+		mr = reg_create(NULL, pd, virt_addr, length, umem, npages,
+				PAGE_SHIFT, access_flags, false);
+		mutex_unlock(&dev->slow_path_mutex);
+	}
+
+	if (IS_ERR(mr)) {
+		err = PTR_ERR(mr);
+		goto error;
+	}
+
+	mlx5_ib_dbg(dev, "mkey 0x%x\n", mr->mmkey.key);
+
+	mr->umem = umem;
+	set_mr_fields(dev, mr, npages, length, access_flags);
+
+	umem_dmabuf = to_ib_umem_dmabuf(umem);
+	umem_dmabuf->device_context = mr;
+
+	err = mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT,
+				 MLX5_IB_UPD_XLT_ENABLE | MLX5_IB_UPD_XLT_ZAP);
+
+	if (err) {
+		dereg_mr(dev, mr);
+		return ERR_PTR(err);
+	}
+
+	init_waitqueue_head(&mr->q_deferred_work);
+	atomic_set(&mr->num_deferred_work, 0);
+	err = xa_err(xa_store(&dev->odp_mkeys,
+			      mlx5_base_mkey(mr->mmkey.key), &mr->mmkey,
+			      GFP_KERNEL));
+	if (err) {
+		dereg_mr(dev, mr);
+		return ERR_PTR(err);
+	}
+
+	return &mr->ibmr;
+error:
+	ib_umem_release(umem);
+	return ERR_PTR(err);
+}
+
 /**
  * mlx5_mr_cache_invalidate - Fence all DMA on the MR
  * @mr: The MR to fence
@@ -1536,7 +1646,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
	if (!mr->umem)
		return -EINVAL;

-	if (is_odp_mr(mr))
+	if (is_odp_mr(mr) || is_dmabuf_mr(mr))
		return -EOPNOTSUPP;

	if (flags & IB_MR_REREG_TRANS) {
@@ -1695,7 +1805,7 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
	struct ib_umem *umem = mr->umem;

	/* Stop all DMA */
-	if (is_odp_mr(mr))
+	if (is_odp_mr(mr) || is_dmabuf_mr(mr))
		mlx5_ib_fence_odp_mr(mr);
	else
		clean_mr(dev, mr);

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 5c853ec..7bc863b 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -33,6 +33,8 @@
 #include
 #include
 #include
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>

 #include "mlx5_ib.h"
 #include "cmd.h"
@@ -204,6 +206,16 @@ static void dma_fence_odp_mr(struct mlx5_ib_mr *mr)
	}
 }

+static void dma_fence_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	mlx5_mr_cache_invalidate(mr);
+
+	if (!mr->cache_ent) {
+		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
+		WARN_ON(mr->descs);
+	}
+}
+
 /*
  * This must be called after the mr has been removed from implicit_children
  * and the SRCU synchronized. NOTE: The MR does not necessarily have to be
@@ -661,6 +673,9 @@ void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr)
	wait_event(mr->q_deferred_work,
		   !atomic_read(&mr->num_deferred_work));

+	if (is_dmabuf_mr(mr))
+		return dma_fence_dmabuf_mr(mr);
+
	dma_fence_odp_mr(mr);
 }

@@ -801,6 +816,52 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr *imr,
  * Returns:
  *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages that are
  *           not accessible, or the MR is no longer valid.
+ *  -EAGAIN: The operation should be retried
+ *
+ *  >0: Number of pages mapped
+ */
+static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, struct ib_umem *umem,
+			       u64 io_virt, size_t bcnt, u32 *bytes_mapped,
+			       u32 flags)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
+	u64 user_va;
+	u64 end;
+	int npages;
+	int err;
+
+	if (unlikely(io_virt < mr->mmkey.iova))
+		return -EFAULT;
+	if (check_add_overflow(io_virt - mr->mmkey.iova,
+			       (u64)umem->address, &user_va))
+		return -EFAULT;
+
+	/* Overflow has already been checked at the umem creation time */
+	end = umem->address + umem->length;
+	if (unlikely(user_va >= end || end - user_va < bcnt))
+		return -EFAULT;
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
+	if (!err)
+		err = mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, 0);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (err)
+		return err;
+
+	if (bytes_mapped)
+		*bytes_mapped += bcnt;
+
+	npages = (ALIGN(user_va + bcnt, PAGE_SIZE) -
+		  ALIGN_DOWN(user_va, PAGE_SIZE)) >> PAGE_SHIFT;
+	return npages;
+}
+
+/*
+ * Returns:
+ *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages that are
+ *           not accessible, or the MR is no longer valid.
  *  -EAGAIN/-ENOMEM: The operation should be retried
  *
  *  -EINVAL/others: General internal malfunction
@@ -811,6 +872,10 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
 {
	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);

+	if (is_dmabuf_mr(mr))
+		return pagefault_dmabuf_mr(mr, mr->umem, io_virt, bcnt,
+					   bytes_mapped, flags);
+
	lockdep_assert_held(&mr->dev->odp_srcu);
	if (unlikely(io_virt < mr->mmkey.iova))
		return -EFAULT;
@@ -1747,7 +1812,6 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 {
	struct mlx5_ib_dev *dev = to_mdev(pd->device);
	struct mlx5_core_mkey *mmkey;
-	struct ib_umem_odp *odp;
	struct mlx5_ib_mr *mr;

	lockdep_assert_held(&dev->odp_srcu);
@@ -1761,11 +1825,9 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
	if (mr->ibmr.pd != pd)
		return NULL;

-	odp = to_ib_umem_odp(mr->umem);
-
	/* prefetch with write-access must be supported by the MR */
	if (advice == IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE &&
-	    !odp->umem.writable)
+	    !mr->umem->writable)
		return NULL;

	return mr;