From patchwork Sun Nov 10 13:46:59 2024
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13869881
From: Leon Romanovsky
To: Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon, Christoph Hellwig, Sagi Grimberg
Cc: Leon Romanovsky, Keith Busch, Bjorn Helgaas, Logan Gunthorpe, Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson, Marek Szyprowski, Jérôme Glisse, Andrew Morton, Jonathan Corbet, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, Randy Dunlap
Subject: [PATCH v3 12/17] RDMA/umem: Store ODP access mask information in PFN
Date: Sun, 10 Nov 2024 15:46:59 +0200
Message-ID: <2c8de8796a52d408e5e74785962630f9ece08b8e.1731244445.git.leon@kernel.org>

From: Leon Romanovsky

In preparation for removing dma_list, store the access mask in the PFN
entry instead of in the dma_addr_t.

Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/umem_odp.c   | 100 +++++++++++----------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   1 +
 drivers/infiniband/hw/mlx5/odp.c     |  37 +++++-----
 include/rdma/ib_umem_odp.h           |  14 +---
 4 files changed, 61 insertions(+), 91 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index e9fa22d31c23..9dba369365af 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -296,22 +296,11 @@ EXPORT_SYMBOL(ib_umem_odp_release);
 static int ib_umem_odp_map_dma_single_page(
 		struct ib_umem_odp *umem_odp,
 		unsigned int dma_index,
-		struct page *page,
-		u64 access_mask)
+		struct page *page)
 {
 	struct ib_device *dev = umem_odp->umem.ibdev;
 	dma_addr_t *dma_addr = &umem_odp->dma_list[dma_index];
 
-	if (*dma_addr) {
-		/*
-		 * If the page is already dma mapped it means it went through
-		 * a non-invalidating trasition, like read-only to writable.
-		 * Resync the flags.
-		 */
-		*dma_addr = (*dma_addr & ODP_DMA_ADDR_MASK) | access_mask;
-		return 0;
-	}
-
 	*dma_addr = ib_dma_map_page(dev, page, 0, 1 << umem_odp->page_shift,
 				    DMA_BIDIRECTIONAL);
 	if (ib_dma_mapping_error(dev, *dma_addr)) {
@@ -319,7 +308,6 @@ static int ib_umem_odp_map_dma_single_page(
 		return -EFAULT;
 	}
 	umem_odp->npages++;
-	*dma_addr |= access_mask;
 	return 0;
 }
 
@@ -355,9 +343,6 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 	struct hmm_range range = {};
 	unsigned long timeout;
 
-	if (access_mask == 0)
-		return -EINVAL;
-
 	if (user_virt < ib_umem_start(umem_odp) ||
 	    user_virt + bcnt > ib_umem_end(umem_odp))
 		return -EFAULT;
@@ -383,7 +368,7 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 	if (fault) {
 		range.default_flags = HMM_PFN_REQ_FAULT;
 
-		if (access_mask & ODP_WRITE_ALLOWED_BIT)
+		if (access_mask & HMM_PFN_WRITE)
 			range.default_flags |= HMM_PFN_REQ_WRITE;
 	}
 
@@ -415,22 +400,17 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 	for (pfn_index = 0; pfn_index < num_pfns;
 		pfn_index += 1 << (page_shift - PAGE_SHIFT), dma_index++) {
 
-		if (fault) {
-			/*
-			 * Since we asked for hmm_range_fault() to populate
-			 * pages it shouldn't return an error entry on success.
-			 */
-			WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
-			WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
-		} else {
-			if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) {
-				WARN_ON(umem_odp->dma_list[dma_index]);
-				continue;
-			}
-			access_mask = ODP_READ_ALLOWED_BIT;
-			if (range.hmm_pfns[pfn_index] & HMM_PFN_WRITE)
-				access_mask |= ODP_WRITE_ALLOWED_BIT;
-		}
+		/*
+		 * Since we asked for hmm_range_fault() to populate
+		 * pages it shouldn't return an error entry on success.
+		 */
+		WARN_ON(fault && range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
+		WARN_ON(fault && !(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
+		if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID))
+			continue;
+
+		if (range.hmm_pfns[pfn_index] & HMM_PFN_DMA_MAPPED)
+			continue;
 
 		hmm_order = hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]);
 		/* If a hugepage was detected and ODP wasn't set for, the umem
@@ -445,13 +425,13 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
 		}
 
 		ret = ib_umem_odp_map_dma_single_page(
-				umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index]),
-				access_mask);
+				umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index]));
 		if (ret < 0) {
 			ibdev_dbg(umem_odp->umem.ibdev,
 				  "ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
 			break;
 		}
+		range.hmm_pfns[pfn_index] |= HMM_PFN_DMA_MAPPED;
 	}
 	/* upon success lock should stay on hold for the callee */
 	if (!ret)
@@ -471,7 +451,6 @@ EXPORT_SYMBOL(ib_umem_odp_map_dma_and_lock);
 void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt,
 				 u64 bound)
 {
-	dma_addr_t dma_addr;
 	dma_addr_t dma;
 	int idx;
 	u64 addr;
@@ -482,34 +461,35 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt,
 	virt = max_t(u64, virt, ib_umem_start(umem_odp));
 	bound = min_t(u64, bound, ib_umem_end(umem_odp));
 	for (addr = virt; addr < bound; addr += BIT(umem_odp->page_shift)) {
+		unsigned long pfn_idx = (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT;
+		struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]);
+
 		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
 		dma = umem_odp->dma_list[idx];
 
-		/* The access flags guaranteed a valid DMA address in case was NULL */
-		if (dma) {
-			unsigned long pfn_idx = (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT;
-			struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]);
-
-			dma_addr = dma & ODP_DMA_ADDR_MASK;
-			ib_dma_unmap_page(dev, dma_addr,
-					  BIT(umem_odp->page_shift),
-					  DMA_BIDIRECTIONAL);
-			if (dma & ODP_WRITE_ALLOWED_BIT) {
-				struct page *head_page = compound_head(page);
-				/*
-				 * set_page_dirty prefers being called with
-				 * the page lock. However, MMU notifiers are
-				 * called sometimes with and sometimes without
-				 * the lock. We rely on the umem_mutex instead
-				 * to prevent other mmu notifiers from
-				 * continuing and allowing the page mapping to
-				 * be removed.
-				 */
-				set_page_dirty(head_page);
-			}
-			umem_odp->dma_list[idx] = 0;
-			umem_odp->npages--;
+		if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_VALID))
+			goto clear;
+		if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_DMA_MAPPED))
+			goto clear;
+
+		ib_dma_unmap_page(dev, dma, BIT(umem_odp->page_shift),
+				  DMA_BIDIRECTIONAL);
+		if (umem_odp->pfn_list[pfn_idx] & HMM_PFN_WRITE) {
+			struct page *head_page = compound_head(page);
+			/*
+			 * set_page_dirty prefers being called with
+			 * the page lock. However, MMU notifiers are
+			 * called sometimes with and sometimes without
+			 * the lock. We rely on the umem_mutex instead
+			 * to prevent other mmu notifiers from
+			 * continuing and allowing the page mapping to
+			 * be removed.
+			 */
+			set_page_dirty(head_page);
 		}
+		umem_odp->npages--;
+clear:
+		umem_odp->pfn_list[pfn_idx] &= ~HMM_PFN_FLAGS;
 	}
 }
 EXPORT_SYMBOL(ib_umem_odp_unmap_dma_pages);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 23fd72f7f63d..3e4aaa6319db 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -336,6 +336,7 @@ struct mlx5_ib_flow_db {
 #define MLX5_IB_UPD_XLT_PD		BIT(4)
 #define MLX5_IB_UPD_XLT_ACCESS		BIT(5)
 #define MLX5_IB_UPD_XLT_INDIRECT	BIT(6)
+#define MLX5_IB_UPD_XLT_DOWNGRADE	BIT(7)
 
 /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags.
  *
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 4b37446758fd..78887500ce15 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 
 #include "mlx5_ib.h"
 #include "cmd.h"
@@ -158,22 +159,12 @@ static void populate_klm(struct mlx5_klm *pklm, size_t idx, size_t nentries,
 	}
 }
 
-static u64 umem_dma_to_mtt(dma_addr_t umem_dma)
-{
-	u64 mtt_entry = umem_dma & ODP_DMA_ADDR_MASK;
-
-	if (umem_dma & ODP_READ_ALLOWED_BIT)
-		mtt_entry |= MLX5_IB_MTT_READ;
-	if (umem_dma & ODP_WRITE_ALLOWED_BIT)
-		mtt_entry |= MLX5_IB_MTT_WRITE;
-
-	return mtt_entry;
-}
-
 static void populate_mtt(__be64 *pas, size_t idx, size_t nentries,
 			 struct mlx5_ib_mr *mr, int flags)
 {
 	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
+	bool downgrade = flags & MLX5_IB_UPD_XLT_DOWNGRADE;
+	unsigned long pfn;
 	dma_addr_t pa;
 	size_t i;
 
@@ -181,8 +172,17 @@ static void populate_mtt(__be64 *pas, size_t idx, size_t nentries,
 		return;
 
 	for (i = 0; i < nentries; i++) {
+		pfn = odp->pfn_list[idx + i];
+		if (!(pfn & HMM_PFN_VALID))
+			/* ODP initialization */
+			continue;
+
 		pa = odp->dma_list[idx + i];
-		pas[i] = cpu_to_be64(umem_dma_to_mtt(pa));
+		pa |= MLX5_IB_MTT_READ;
+		if ((pfn & HMM_PFN_WRITE) && !downgrade)
+			pa |= MLX5_IB_MTT_WRITE;
+
+		pas[i] = cpu_to_be64(pa);
 	}
 }
 
@@ -286,8 +286,7 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni,
 		 * estimate the cost of another UMR vs. the cost of bigger
 		 * UMR.
 		 */
-		if (umem_odp->dma_list[idx] &
-		    (ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) {
+		if (umem_odp->pfn_list[idx] & HMM_PFN_VALID) {
 			if (!in_block) {
 				blk_start_idx = idx;
 				in_block = 1;
@@ -668,7 +667,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 {
 	int page_shift, ret, np;
 	bool downgrade = flags & MLX5_PF_FLAGS_DOWNGRADE;
-	u64 access_mask;
+	u64 access_mask = 0;
 	u64 start_idx;
 	bool fault = !(flags & MLX5_PF_FLAGS_SNAPSHOT);
 	u32 xlt_flags = MLX5_IB_UPD_XLT_ATOMIC;
@@ -676,12 +675,14 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 	if (flags & MLX5_PF_FLAGS_ENABLE)
 		xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
 
+	if (flags & MLX5_PF_FLAGS_DOWNGRADE)
+		xlt_flags |= MLX5_IB_UPD_XLT_DOWNGRADE;
+
 	page_shift = odp->page_shift;
 	start_idx = (user_va - ib_umem_start(odp)) >> page_shift;
-	access_mask = ODP_READ_ALLOWED_BIT;
 
 	if (odp->umem.writable && !downgrade)
-		access_mask |= ODP_WRITE_ALLOWED_BIT;
+		access_mask |= HMM_PFN_WRITE;
 
 	np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, fault);
 	if (np < 0)
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 0844c1d05ac6..a345c26a745d 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -8,6 +8,7 @@
 
 #include
 #include
+#include
 
 struct ib_umem_odp {
 	struct ib_umem umem;
@@ -67,19 +68,6 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp)
 	       umem_odp->page_shift;
 }
 
-/*
- * The lower 2 bits of the DMA address signal the R/W permissions for
- * the entry. To upgrade the permissions, provide the appropriate
- * bitmask to the map_dma_pages function.
- *
- * Be aware that upgrading a mapped address might result in change of
- * the DMA address for the page.
- */
-#define ODP_READ_ALLOWED_BIT (1<<0ULL)
-#define ODP_WRITE_ALLOWED_BIT (1<<1ULL)
-
-#define ODP_DMA_ADDR_MASK (~(ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT))
-
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 
 struct ib_umem_odp *
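The commit message states only where the access information moves. As an illustration, here is a userspace sketch contrasting the two encodings. The ODP_* macros are copied from the definitions this patch deletes; the SK_PFN_* flags, SK_ULONG_BITS and every function name are hypothetical stand-ins, and their bit positions are assumptions rather than the kernel's HMM_PFN_* values.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Old scheme: copied from the lines removed from include/rdma/ib_umem_odp.h.
 * The permission bits live in the two low bits of a page-aligned DMA address. */
#define ODP_READ_ALLOWED_BIT (1<<0ULL)
#define ODP_WRITE_ALLOWED_BIT (1<<1ULL)
#define ODP_DMA_ADDR_MASK (~(ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT))

/* New scheme (hypothetical stand-ins): state flags sit in the top bits of
 * each pfn_list entry. Positions are assumptions, not HMM_PFN_* values. */
#define SK_ULONG_BITS     (sizeof(unsigned long) * CHAR_BIT)
#define SK_PFN_VALID      (1UL << (SK_ULONG_BITS - 1))
#define SK_PFN_WRITE      (1UL << (SK_ULONG_BITS - 2))
#define SK_PFN_DMA_MAPPED (1UL << (SK_ULONG_BITS - 4))
#define SK_PFN_FLAGS      (SK_PFN_VALID | SK_PFN_WRITE | SK_PFN_DMA_MAPPED)

/* Old: fold the permissions into the DMA address itself. */
static uint64_t old_encode(uint64_t dma, bool writable)
{
	/* Only works because the address leaves its two low bits clear. */
	assert((dma & (ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) == 0);
	dma |= ODP_READ_ALLOWED_BIT;
	if (writable)
		dma |= ODP_WRITE_ALLOWED_BIT;
	return dma;
}

/* Old: every user had to strip the bits before using the address. */
static uint64_t old_addr(uint64_t entry)
{
	return entry & ODP_DMA_ADDR_MASK;
}

/* New: dma_list keeps a plain address; after a successful map the pfn
 * entry is tagged so that a later fault can skip it. */
static unsigned long new_mark_mapped(unsigned long pfn_entry)
{
	return pfn_entry | SK_PFN_DMA_MAPPED;
}

/* New: only an entry that is both valid and DMA-mapped needs unmapping. */
static bool new_needs_unmap(unsigned long pfn_entry)
{
	return (pfn_entry & SK_PFN_VALID) && (pfn_entry & SK_PFN_DMA_MAPPED);
}

/* New: write permission is read straight from the pfn entry. */
static bool new_is_writable(unsigned long pfn_entry)
{
	return (pfn_entry & SK_PFN_WRITE) != 0;
}

/* New: invalidation drops the flag bits but keeps the pfn bits. */
static unsigned long new_clear(unsigned long pfn_entry)
{
	return pfn_entry & ~SK_PFN_FLAGS;
}
```

In the patch itself, HMM_PFN_VALID, HMM_PFN_WRITE, HMM_PFN_DMA_MAPPED and HMM_PFN_FLAGS play the roles of these stand-ins, which is why the unmap path can now pass the dma_list entry to ib_dma_unmap_page() without masking it with ODP_DMA_ADDR_MASK first.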
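In the mlx5 driver, populate_mtt() now builds each MTT entry from the PFN flags instead of decoding permission bits from the DMA address in the removed umem_dma_to_mtt(), and the new MLX5_IB_UPD_XLT_DOWNGRADE flag withholds write permission. Below is a minimal userspace model of that loop, assuming MTT read = bit 0 and write = bit 1 and illustrative PFN flag positions. The SK_* names and sketch_populate_mtt() are hypothetical, and the cpu_to_be64() conversion is omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: all bit positions are assumptions for this sketch. */
#define SK_ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SK_PFN_VALID  (1UL << (SK_ULONG_BITS - 1))
#define SK_PFN_WRITE  (1UL << (SK_ULONG_BITS - 2))
#define SK_MTT_READ   ((uint64_t)1 << 0)
#define SK_MTT_WRITE  ((uint64_t)1 << 1)

/*
 * Models the loop added to populate_mtt(): entries without a valid pfn are
 * left untouched; valid entries always get read permission, and get write
 * permission only when the pfn is writable and this is not a downgrade.
 */
static void sketch_populate_mtt(uint64_t *pas, const unsigned long *pfn_list,
				const uint64_t *dma_list, size_t nentries,
				bool downgrade)
{
	for (size_t i = 0; i < nentries; i++) {
		unsigned long pfn = pfn_list[i];
		uint64_t pa;

		if (!(pfn & SK_PFN_VALID))
			continue;

		/* Permission bits are ORed into the address, so it must leave them clear. */
		assert((dma_list[i] & (SK_MTT_READ | SK_MTT_WRITE)) == 0);
		pa = dma_list[i] | SK_MTT_READ;
		if ((pfn & SK_PFN_WRITE) && !downgrade)
			pa |= SK_MTT_WRITE;
		pas[i] = pa;
	}
}
```

Leaving invalid entries untouched, rather than zeroing them, follows the patch's `continue` on the /* ODP initialization */ path.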