From patchwork Tue Jul 2 09:09:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719223 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A69E915098E; Tue, 2 Jul 2024 09:10:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911404; cv=none; b=IU944rl1mktqT9IjI1CrlHpz5q3HIvDViV3knKIHfWhwLGZOW/W/XuwsxJyiKHT1/RQQeHQhTYaHzRfzQXRRf0rn1MdhdY41eQMuiDHJGKbzyn0ZHrphKHJSHJ5tk2OpJId79i+6d2J57jqWWD6/wreLZBBQqFPlaoEmjuZuJxI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911404; c=relaxed/simple; bh=PsDcRtwRrZy3xcTFmq5V+v2aFvC1FX6fMcSIsPNIgJ4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jSM1700hKeyJbnlOYgEpF9OY0uo/XyUgWdxflTcMhDicoq9VJz+VkmsGwXdgzrwDVE7uJisaR5eM5MJRzRoUzLwaFwSTmJh9IfwL6O1JSWmgmm1JLeE1nGj5GJWDI1ebVxUwmQCf2IMJAiMVEg4nbcnG1kiY9UbqMT8uWykh1GE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WdB+vVAS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WdB+vVAS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 97D44C116B1; Tue, 2 Jul 2024 09:10:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911404; bh=PsDcRtwRrZy3xcTFmq5V+v2aFvC1FX6fMcSIsPNIgJ4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WdB+vVAS8uHPdPme4myJO22f4FbGRcG1dHDJHwEW7t+ki8CzsXtGbIJKrwNZzL+oN HZhggaNWlUq2LDjYroZErlO93jF4y+zjRYe2QMVqmzz1MDxk1Rl8xeJex/5Q1jOGA8 kNM7rlYFNxKQypQjyLEv1W8ppPwmxff79s/KS7zpU97+5kGBVsrGR36Z/Ui8WyxfjN +2WhW3GmeKGr7A5xpE0coC6WO/48R8aI2ccSdCQQ38AntHiLnCHjCF0ItXO1X8/fim N1g17gLjq4qvdcY6NksGgFPtVrvn0kKJnJryFF33jW+ntCtDHOmI6RkHi12DWkTu0B MTNXNHg+0HJIQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 01/18] dma-mapping: query DMA memory type Date: Tue, 2 Jul 2024 12:09:31 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Provide an option to query and set DMA memory type so callers who supply range of pages can perform it only once as the whole range is supposed to have same memory type. 
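A minimal sketch of the intended calling pattern, for illustration only: the driver_*() wrapper, the pinned "pages" array and the policy in the switch are hypothetical, and only dma_get_memory_type(), struct dma_memory_type and the DMA_MEMORY_TYPE_* values come from this patch (requires <linux/dma-mapping.h>):

static bool driver_range_can_use_fast_path(struct page **pages,
					   unsigned long npages)
{
	struct dma_memory_type type;

	/*
	 * All npages entries are expected to share one memory type, so a
	 * single query of the first page covers the whole range.
	 */
	dma_get_memory_type(pages[0], &type);

	switch (type.type) {
	case DMA_MEMORY_TYPE_ENCRYPTED:
		/* e.g. TDX-encrypted memory: fall back to a bounce path */
		return false;
	case DMA_MEMORY_TYPE_P2P:
		/* PCI peer-to-peer pages; type.p2p_pgmap names the provider */
		fallthrough;
	case DMA_MEMORY_TYPE_NORMAL:
	default:
		return true;
	}
}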
Signed-off-by: Leon Romanovsky
---
 include/linux/dma-mapping.h | 20 ++++++++++++++++++++
 kernel/dma/mapping.c        | 30 ++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f693aafe221f..49b99c6e7ec5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -76,6 +76,20 @@
 #define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
 
+enum dma_memory_types {
+	/* Normal memory without any extra properties like P2P, etc. */
+	DMA_MEMORY_TYPE_NORMAL,
+	/* Memory which is p2p capable */
+	DMA_MEMORY_TYPE_P2P,
+	/* Encrypted memory (TDX) */
+	DMA_MEMORY_TYPE_ENCRYPTED,
+};
+
+struct dma_memory_type {
+	enum dma_memory_types type;
+	struct dev_pagemap *p2p_pgmap;
+};
+
 #ifdef CONFIG_DMA_API_DEBUG
 void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 void debug_dma_map_single(struct device *dev, const void *addr,
@@ -149,6 +163,8 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t size,
 void dma_vunmap_noncontiguous(struct device *dev, void *vaddr);
 int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
 		size_t size, struct sg_table *sgt);
+
+void dma_get_memory_type(struct page *page, struct dma_memory_type *type);
 #else /* CONFIG_HAS_DMA */
 static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 		struct page *page, size_t offset, size_t size,
@@ -279,6 +295,10 @@ static inline int dma_mmap_noncontiguous(struct device *dev,
 {
 	return -EINVAL;
 }
+static inline void dma_get_memory_type(struct page *page,
+				       struct dma_memory_type *type)
+{
+}
 #endif /* CONFIG_HAS_DMA */
 
 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 81de84318ccc..877e43b39c06 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2006 Tejun Heo
 */
 #include /* for max_pfn */
+#include
 #include
 #include
 #include
@@ -14,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include "debug.h"
 #include "direct.h"
@@ -894,3 +896,31 @@ unsigned long dma_get_merge_boundary(struct device *dev)
 	return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+/**
+ * dma_get_memory_type - get the DMA memory type of the page supplied
+ * @page: page to check
+ * @type: memory type of that page
+ *
+ * Return the DMA memory type for the struct page. Pages with the same
+ * memory type can be combined into the same IOVA mapping. Users of the
+ * dma_iova family of functions must separate the memory they want to map
+ * into same-memory-type ranges.
+ */ +void dma_get_memory_type(struct page *page, struct dma_memory_type *type) +{ + /* TODO: Rewrite this check to rely on specific struct page flags */ + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) { + type->type = DMA_MEMORY_TYPE_ENCRYPTED; + return; + } + + if (is_pci_p2pdma_page(page)) { + type->type = DMA_MEMORY_TYPE_P2P; + type->p2p_pgmap = page->pgmap; + return; + } + + type->type = DMA_MEMORY_TYPE_NORMAL; +} +EXPORT_SYMBOL_GPL(dma_get_memory_type); From patchwork Tue Jul 2 09:09:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719225 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9DBA15216E; Tue, 2 Jul 2024 09:10:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911412; cv=none; b=NEb8NAZ2OrkYI9Ne+2FQa8KFJIiW5uzZ0UpvTaC+M1aKj82ElzjfGkEJuTBzIsaXc1LPVRp1B1BTj1zkgXtRkVOxz+/Wnt49XULSH/9t5DDQRTbYc3vORZUfmsMPUbYfE/HBOQxpcav/OslmwbYy4T02eA28ONH9bOc42FFdnDE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911412; c=relaxed/simple; bh=GdqxFreVifTUH7yLunaT0Kp4VH1OcNBKz7jklxCA7IE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YohiMKedg5IqO9IeGx9efMcKnGkKrRHB2C7qHB5EBxb+/WTecbP/aSNLWsbBiLm+WSNN3+Cn1S/fTS0ZZG0XhvGFXmHX6xQlKuRfOP8APMiroqhinSYwT9VMGp7Y3oAfEy24kcNeZoDkbgNxMSUs9r+JrMUKTNnGRXRM5zlgHXw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jZsLMwcx; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jZsLMwcx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB762C116B1; Tue, 2 Jul 2024 09:10:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911412; bh=GdqxFreVifTUH7yLunaT0Kp4VH1OcNBKz7jklxCA7IE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jZsLMwcxfIhbNdXAH7Erajavnd9zt8wZwqJVAhbFF8wuHrgj9ZFLnS0M824YeY1WI lsBBR7Huv75yN3CNdjq3wGOj6d6XRdgf43l4OFc9s6OcECQ4HjS13MEjvrsvp/qDtC V9bMyC4eCtQlzz7bAocuoWalQBZeKMuoUk0+t2JOvKUiwHjZ7NOHkhxRQ2bfGbsofJ 9EiPqR5DPCSASk+2Ji+dPRehAt5lU8Xk41chq5PYN3dkl0zeRpGP7rNQIh7zxcQ/Ux TOXwMCzKrQG/bRPwo4SyChwOnWE10AWLJZqnWPQAFqWJhu2cqfYHV+DULsrE24NbAI pEN9z2lU0w7oA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 02/18] dma-mapping: provide an interface to allocate IOVA Date: Tue, 2 Jul 2024 12:09:32 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Existing .map_page() callback provides two things at the same time: allocates IOVA and links DMA pages. That combination works great for most of the callers who use it in control paths, but less effective in fast paths. These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving range linkage operation to be in fast path. Provide an interface to allocate/deallocate IOVA and next patch link/unlink DMA ranges to that specific IOVA. Signed-off-by: Leon Romanovsky --- include/linux/dma-map-ops.h | 3 +++ include/linux/dma-mapping.h | 20 +++++++++++++++++ kernel/dma/mapping.c | 44 +++++++++++++++++++++++++++++++++++++ 3 files changed, 67 insertions(+) diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 02a1c825896b..23e5e2f63a1c 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -86,6 +86,9 @@ struct dma_map_ops { size_t (*max_mapping_size)(struct device *dev); size_t (*opt_mapping_size)(void); unsigned long (*get_merge_boundary)(struct device *dev); + + dma_addr_t (*alloc_iova)(struct device *dev, size_t size); + void (*free_iova)(struct device *dev, dma_addr_t dma_addr, size_t size); }; #ifdef CONFIG_DMA_OPS diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 49b99c6e7ec5..673ddcf140ff 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -90,6 +90,16 @@ struct dma_memory_type { struct dev_pagemap *p2p_pgmap; }; +struct dma_iova_attrs { + /* OUT field */ + dma_addr_t addr; + /* IN fields */ + struct device *dev; + size_t size; + enum dma_data_direction dir; + unsigned long attrs; +}; + #ifdef CONFIG_DMA_API_DEBUG void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); void debug_dma_map_single(struct device *dev, const void *addr, @@ -115,6 +125,9 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) return 0; } +int dma_alloc_iova(struct dma_iova_attrs *iova); +void dma_free_iova(struct dma_iova_attrs *iova); + dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs); @@ -166,6 +179,13 @@ int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma, void dma_get_memory_type(struct page *page, struct dma_memory_type *type); #else /* CONFIG_HAS_DMA */ +static inline int dma_alloc_iova(struct dma_iova_attrs *iova) +{ + return -EOPNOTSUPP; +} +static inline void dma_free_iova(struct dma_iova_attrs *iova) +{ +} static inline dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 877e43b39c06..0c8f51010d08 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -924,3 +924,47 @@ void dma_get_memory_type(struct page *page, struct dma_memory_type *type) type->type = DMA_MEMORY_TYPE_NORMAL; } EXPORT_SYMBOL_GPL(dma_get_memory_type); + +/** + * dma_alloc_iova - Allocate an IOVA space + * @iova: IOVA attributes + * + * Allocate an IOVA space for the given IOVA attributes. The IOVA space + * is allocated to the worst case when whole range is going to be used. + */ +int dma_alloc_iova(struct dma_iova_attrs *iova) +{ + struct device *dev = iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (dma_map_direct(dev, ops) || !ops->alloc_iova) { + /* dma_map_direct(..) 
check is for HMM range fault callers */ + iova->addr = 0; + return 0; + } + + iova->addr = ops->alloc_iova(dev, iova->size); + if (dma_mapping_error(dev, iova->addr)) + return -ENOMEM; + + return 0; +} +EXPORT_SYMBOL_GPL(dma_alloc_iova); + +/** + * dma_free_iova - Free an IOVA space + * @iova: IOVA attributes + * + * Free an IOVA space for the given IOVA attributes. + */ +void dma_free_iova(struct dma_iova_attrs *iova) +{ + struct device *dev = iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (dma_map_direct(dev, ops) || !ops->free_iova || !iova->addr) + return; + + ops->free_iova(dev, iova->addr, iova->size); +} +EXPORT_SYMBOL_GPL(dma_free_iova); From patchwork Tue Jul 2 09:09:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719224 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEB0115098E; Tue, 2 Jul 2024 09:10:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911409; cv=none; b=P5SnoL6WtKPb8VbRWd/5Evjo52Rh1+xvkOFiCcTz5hLpkLgEZwbLezkVr2CGHCOUBM+4E6B5wuu4CPViU6Am4+9z3IvKwV6SrZTXzyn6cTrbxCnGEHJ+a0di8k0g7x3BQFEscI0cFNE/I7goago/FEK4eglg5eU6H63ocGhR264= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911409; c=relaxed/simple; bh=I7mD3dENF4HZBNBumoLyPjANfOtz0e/DAq7U9sRqLk0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=F6HrbetSOL8cQw0MHFDrfLfzwtQj9Au19yHL3h/aDOL0fkTaOWVS7ckEjjUnmq0VIK0bMdz0rrGYjVawBx7+cG96agAHTTp3KILgVXy7iW8YSHQNC+NVgyxVoHO+eZibIMAjM0Anr9QM1jh5IPVZeeN8tu0bceqMZSpw/x06B9E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PqzjMLOj; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PqzjMLOj" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9DCD7C4AF0C; Tue, 2 Jul 2024 09:10:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911408; bh=I7mD3dENF4HZBNBumoLyPjANfOtz0e/DAq7U9sRqLk0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PqzjMLOjXeye9zq/mHzTj6Cl4RxSudqszFKsPx54AmE/tJ+OHgWu5ivCnN1LbBRUN 1Yv5vxJ4cXclv2ETPV3ZWWpK913fTcXr7yVC/nJo7AtJ6rnu3SJBTnF9+JY89SOEvd TYkrX534Vg7HQIcEHujxARKgvk/nUOly8dQEFeqHVLZriAax5EaH09H4nUGZ/y8Loi vDIjzYbeVRyYivHIMQkPD/nstZnUBJhmQWQx2JPLSSZymUewhzPZooZ7yJC7ufM6es ONXggsrnv88nbqTGWDYThjoAZxFxhMpavbb+IIZpeIVu2VNBfqxKFRLDEd3u7X+sYd zohrHgWstJ+9w== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanvosky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 03/18] dma-mapping: check if IOVA 
can be used Date: Tue, 2 Jul 2024 12:09:33 +0300 Message-ID: <4c479ac482c3bd123a5f999fdff46454a7faa905.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanvosky Provide a way to the callers to see if IOVA can be used for specific DMA memory type. Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 13 ------------- drivers/pci/p2pdma.c | 4 ++-- include/linux/dma-map-ops.h | 21 +++++++++++++++++++++ include/linux/dma-mapping.h | 10 ++++++++++ kernel/dma/mapping.c | 32 ++++++++++++++++++++++++++++++++ 5 files changed, 65 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 43520e7275cc..89e34503e0bb 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -597,19 +597,6 @@ static int iova_reserve_iommu_regions(struct device *dev, return ret; } -static bool dev_is_untrusted(struct device *dev) -{ - return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; -} - -static bool dev_use_swiotlb(struct device *dev, size_t size, - enum dma_data_direction dir) -{ - return IS_ENABLED(CONFIG_SWIOTLB) && - (dev_is_untrusted(dev) || - dma_kmalloc_needs_bounce(dev, size, dir)); -} - static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4f47a13cb500..6ceea32bb041 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -964,8 +964,8 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, - struct device *dev) +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, + struct device *dev) { enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED; struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider; diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 23e5e2f63a1c..b52e9c8db241 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -9,6 +9,7 @@ #include #include #include +#include struct cma; struct iommu_ops; @@ -348,6 +349,19 @@ static inline bool dma_kmalloc_needs_bounce(struct device *dev, size_t size, return !dma_kmalloc_safe(dev, dir) && !dma_kmalloc_size_aligned(size); } +static inline bool dev_is_untrusted(struct device *dev) +{ + return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; +} + +static inline bool dev_use_swiotlb(struct device *dev, size_t size, + enum dma_data_direction dir) +{ + return IS_ENABLED(CONFIG_SWIOTLB) && + (dev_is_untrusted(dev) || + dma_kmalloc_needs_bounce(dev, size, dir)); +} + void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, @@ -514,6 +528,8 @@ struct pci_p2pdma_map_state { enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, struct scatterlist *sg); +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, + struct device *dev); #else /* CONFIG_PCI_P2PDMA */ static inline enum pci_p2pdma_map_type pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, @@ -521,6 +537,11 @@ pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, { return PCI_P2PDMA_MAP_NOT_SUPPORTED; 
 }
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
+{
+	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
 #endif /* CONFIG_PCI_P2PDMA */
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 673ddcf140ff..9d1e020869a6 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -100,6 +100,11 @@ struct dma_iova_attrs {
 	unsigned long attrs;
 };
 
+struct dma_iova_state {
+	struct dma_iova_attrs *iova;
+	struct dma_memory_type *type;
+};
+
 #ifdef CONFIG_DMA_API_DEBUG
 void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 void debug_dma_map_single(struct device *dev, const void *addr,
@@ -178,6 +183,7 @@ int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
 		size_t size, struct sg_table *sgt);
 
 void dma_get_memory_type(struct page *page, struct dma_memory_type *type);
+bool dma_can_use_iova(struct dma_iova_state *state, size_t size);
 #else /* CONFIG_HAS_DMA */
 static inline int dma_alloc_iova(struct dma_iova_attrs *iova)
 {
@@ -319,6 +325,10 @@ static inline void dma_get_memory_type(struct page *page,
 					struct dma_memory_type *type)
 {
 }
+static inline bool dma_can_use_iova(struct dma_iova_state *state, size_t size)
+{
+	return false;
+}
 #endif /* CONFIG_HAS_DMA */
 
 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0c8f51010d08..9044ee525fdb 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -968,3 +968,35 @@ void dma_free_iova(struct dma_iova_attrs *iova)
 	ops->free_iova(dev, iova->addr, iova->size);
 }
 EXPORT_SYMBOL_GPL(dma_free_iova);
+
+/**
+ * dma_can_use_iova - check if the IOVA fast path can be used for this
+ *	memory type and the SWIOTLB path won't be taken
+ * @state: IOVA state
+ * @size: size of the buffer
+ *
+ * Return %true if the buffer can be mapped through the preallocated IOVA,
+ * i.e. the device will not bounce it through swiotlb; otherwise %false.
+ */ +bool dma_can_use_iova(struct dma_iova_state *state, size_t size) +{ + struct device *dev = state->iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + struct dma_memory_type *type = state->type; + enum pci_p2pdma_map_type map; + + if (is_swiotlb_force_bounce(dev) || + dev_use_swiotlb(dev, size, state->iova->dir)) + return false; + + if (dma_map_direct(dev, ops) || !ops->alloc_iova) + return false; + + if (type->type == DMA_MEMORY_TYPE_P2P) { + map = pci_p2pdma_map_type(type->p2p_pgmap, dev); + return map == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE; + } + + return type->type == DMA_MEMORY_TYPE_NORMAL; +} +EXPORT_SYMBOL_GPL(dma_can_use_iova); From patchwork Tue Jul 2 09:09:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719230 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D0D5615445D; Tue, 2 Jul 2024 09:10:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911434; cv=none; b=p9Qj2HlUMhyxn1jUIkAf1hHAX5qOMYxSV6wbEaNv9wH2EgFaovaVMX4PspIKpSqQKLlXpvncArhJrsS6vU4Q3us30a4Xp4RPJzFV/6wtx0Hpu5sbQQulCXvLQ1AExb5WCgN4gy8GlfxQ6gGfVYEG8dTBgHegBYajR28cYSw7rnU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911434; c=relaxed/simple; bh=0ck5rd5jPLMOP1nHX+50lNZNnmmRrxabk/SQIbsa6U4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tSvHCDVfWf6h+pqyAmdGraPAF8MRBkUAcKDmUy1D+dUt8zo3OHtrJDZM7F+H5gIWlTjT7UrcLxBYfLBbKva5zilmoRDrfs2UmuqbCirVZ10puoSqg5/UNJAxY/WOIPcx9DbbMz7SIeZaiRS+XZ7SSgDpUClVd/IKA7Y8WT7YuGs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YYKS1Ot/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YYKS1Ot/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AD899C4AF0A; Tue, 2 Jul 2024 09:10:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911433; bh=0ck5rd5jPLMOP1nHX+50lNZNnmmRrxabk/SQIbsa6U4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YYKS1Ot/toIO1bChF3hjNIXQnLRGAFZW60cYZGmLLcwlJGabkFTKyyB9S4UjzMRw4 OWdB+gcPZfiFupAaORoIq2eLPt1FH+wTYrENPy5qAeVHARSxgi1nIntNE+I24Dz9VP HGwRhb2VAiNoV6XRMQ24UVg2ee/gd+HohgenhbH1Ffqm3bb0/puHq59LU7IGpEL/vk sVPeLGHK/A9Eqt9VnWOf9HrTGNvjmQUyWmuLi3tGXthIBjVPGtIFwFF4zX749XQYhs NGGEjZMQlBISbiHromVsoVz6nukh5hle9SkHC3Rj4ksdesgD1SzycXc0OVeUTNAhcG ADXWSXiEd0N1A== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 04/18] dma-mapping: implement link range API Date: 
Tue, 2 Jul 2024 12:09:34 +0300 Message-ID: <8944a1211b243fed1234a56bc8004a11dbf85a87.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps: dma_alloc_iova() if (dma_can_use_iova(...)) dma_start_range(...) for (page in range) dma_link_range(...) dma_end_range(...) else /* Fallback to legacy map pages */ dma_map_page(...) Signed-off-by: Leon Romanovsky --- include/linux/dma-map-ops.h | 6 +++ include/linux/dma-mapping.h | 22 +++++++++++ kernel/dma/mapping.c | 78 ++++++++++++++++++++++++++++++++++++- 3 files changed, 105 insertions(+), 1 deletion(-) diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index b52e9c8db241..4868586b015e 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -90,6 +90,12 @@ struct dma_map_ops { dma_addr_t (*alloc_iova)(struct device *dev, size_t size); void (*free_iova)(struct device *dev, dma_addr_t dma_addr, size_t size); + int (*link_range)(struct dma_iova_state *state, phys_addr_t phys, + dma_addr_t addr, size_t size); + void (*unlink_range)(struct dma_iova_state *state, + dma_addr_t dma_handle, size_t size); + int (*start_range)(struct dma_iova_state *state); + void (*end_range)(struct dma_iova_state *state); }; #ifdef CONFIG_DMA_OPS diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 9d1e020869a6..c530095ff232 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -11,6 +11,7 @@ #include #include #include +#include /** * List of possible attributes associated with a DMA mapping. 
The semantics @@ -103,6 +104,8 @@ struct dma_iova_attrs { struct dma_iova_state { struct dma_iova_attrs *iova; struct dma_memory_type *type; + struct iommu_domain *domain; + size_t range_size; }; #ifdef CONFIG_DMA_API_DEBUG @@ -184,6 +187,10 @@ int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma, void dma_get_memory_type(struct page *page, struct dma_memory_type *type); bool dma_can_use_iova(struct dma_iova_state *state, size_t size); +int dma_start_range(struct dma_iova_state *state); +void dma_end_range(struct dma_iova_state *state); +int dma_link_range(struct dma_iova_state *state, phys_addr_t phys, size_t size); +void dma_unlink_range(struct dma_iova_state *state); #else /* CONFIG_HAS_DMA */ static inline int dma_alloc_iova(struct dma_iova_attrs *iova) { @@ -329,6 +336,21 @@ static inline bool dma_can_use_iova(struct dma_iova_state *state, size_t size) { return false; } +static inline int dma_start_range(struct dma_iova_state *state) +{ + return -EOPNOTSUPP; +} +static inline void dma_end_range(struct dma_iova_state *state) +{ +} +static inline int dma_link_range(struct dma_iova_state *state, phys_addr_t phys, + size_t size) +{ + return -EOPNOTSUPP; +} +static inline void dma_unlink_range(struct dma_iova_state *state) +{ +} #endif /* CONFIG_HAS_DMA */ #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 9044ee525fdb..089b4a977bab 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -989,7 +989,8 @@ bool dma_can_use_iova(struct dma_iova_state *state, size_t size) dev_use_swiotlb(dev, size, state->iova->dir)) return false; - if (dma_map_direct(dev, ops) || !ops->alloc_iova) + if (dma_map_direct(dev, ops) || !ops->alloc_iova || !ops->link_range || + !ops->start_range) return false; if (type->type == DMA_MEMORY_TYPE_P2P) { @@ -1000,3 +1001,78 @@ bool dma_can_use_iova(struct dma_iova_state *state, size_t size) return type->type == DMA_MEMORY_TYPE_NORMAL; } EXPORT_SYMBOL_GPL(dma_can_use_iova); + +/** + * dma_start_range - Start a range of IOVA space + * @state: IOVA state + * + * Start a range of IOVA space for the given IOVA state. + */ +int dma_start_range(struct dma_iova_state *state) +{ + struct device *dev = state->iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (!ops->start_range) + return 0; + + return ops->start_range(state); +} +EXPORT_SYMBOL_GPL(dma_start_range); + +/** + * dma_end_range - End a range of IOVA space + * @state: IOVA state + * + * End a range of IOVA space for the given IOVA state. + */ +void dma_end_range(struct dma_iova_state *state) +{ + struct device *dev = state->iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (!ops->end_range) + return; + + ops->end_range(state); +} +EXPORT_SYMBOL_GPL(dma_end_range); + +/** + * dma_link_range - Link a range of IOVA space + * @state: IOVA state + * @phys: physical address to link + * @size: size of the buffer + * + * Link a range of IOVA space for the given IOVA state. 
+ */ +int dma_link_range(struct dma_iova_state *state, phys_addr_t phys, size_t size) +{ + struct device *dev = state->iova->dev; + dma_addr_t addr = state->iova->addr + state->range_size; + const struct dma_map_ops *ops = get_dma_ops(dev); + int ret; + + ret = ops->link_range(state, phys, addr, size); + if (ret) + return ret; + + state->range_size += size; + return 0; +} +EXPORT_SYMBOL_GPL(dma_link_range); + +/** + * dma_unlink_range - Unlink a range of IOVA space + * @state: IOVA state + * + * Unlink a range of IOVA space for the given IOVA state. + */ +void dma_unlink_range(struct dma_iova_state *state) +{ + struct device *dev = state->iova->dev; + const struct dma_map_ops *ops = get_dma_ops(dev); + + ops->unlink_range(state, state->iova->addr, state->range_size); +} +EXPORT_SYMBOL_GPL(dma_unlink_range); From patchwork Tue Jul 2 09:09:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719226 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B81415250C; Tue, 2 Jul 2024 09:10:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911417; cv=none; b=tFEDaRbXU/0QeO8bstGEptPJNr/y/auX+ucm4cGjC0trxCctIXGaRxMelRSVg5Zmwk0ICIc3//uMnt4jQWNBeLdFz3hZfGA72rVnibCSKuLUkg9IFJZrXHLWpATrYFp5L9VCCL5hhyfvNfmFEIDBYouDB17m6iTJ8jCgT8gRrXM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911417; c=relaxed/simple; bh=r8LbBOjlbH4F69KVmFsNaHjJ/rODIFYYIWRpcAsJOVI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ts0E8LPx3qQz8m8dY+KfJlFDZrmGjISCethTVYlQtEKe4WCWJ9fC1l58SLOo1U6iEWUPRnHE/MwB5asfpUbRAnWKC+k9b5OieM/8f8EFAcKMh5L9KZOBwsITaymu4hGvFPkr7jOg5H/Ny8XkWfmDVcZlrsZu8ZwJb0XYEEt70sw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DERWykst; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DERWykst" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E8E09C4AF0E; Tue, 2 Jul 2024 09:10:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911416; bh=r8LbBOjlbH4F69KVmFsNaHjJ/rODIFYYIWRpcAsJOVI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DERWykstv7XxGRlK5d+9zV0EQM0MK52hTvvioFtFjlLgY6y7WByKUa78ayxSb4kv4 I+xUVII03CJetjfBvTMv9CMMhQJfzVzLz89kaGypwXqqSFHhbw4Gb2EbFJGDnfX17l 0aC4Byhh+gWLSP7xqntoErbx/0yPnHiKuF5CB/Xs9/iSM9GN7ebAkO6YOTDfK+Sr8y dJtkIzyXzi0gzX+IoPLVGBHI3F2b9pwZ+KO31UXjhbL9I3Iuq75SeGIsrKaVl1FT3q sHBmNAh3u7+dpLJB9+kZkDVxWvxbuHf4tfrlHP2fR8bQCt2MPqKJJkSIF05sdFYODC r+/wFzkIKdacg== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, 
linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 05/18] mm/hmm: let users to tag specific PFN with DMA mapped bit Date: Tue, 2 Jul 2024 12:09:35 +0300 Message-ID: <769b6266d5e8638322f25550dd01a85515bf9d08.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Introduce new sticky flag (HMM_PFN_DMA_MAPPED), which isn't overwritten by HMM range fault. Such flag allows users to tag specific PFNs with information if this specific PFN was already DMA mapped. Signed-off-by: Leon Romanovsky --- include/linux/hmm.h | 4 ++++ mm/hmm.c | 34 +++++++++++++++++++++------------- 2 files changed, 25 insertions(+), 13 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 126a36571667..2999697db83a 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -23,6 +23,8 @@ struct mmu_interval_notifier; * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_VALID) * HMM_PFN_ERROR - accessing the pfn is impossible and the device should * fail. ie poisoned memory, special pages, no vma, etc + * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation + * to mark that page is already DMA mapped * * On input: * 0 - Return the current state of the page, do not fault it. @@ -36,6 +38,8 @@ enum hmm_pfn_flags { HMM_PFN_VALID = 1UL << (BITS_PER_LONG - 1), HMM_PFN_WRITE = 1UL << (BITS_PER_LONG - 2), HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3), + /* Sticky lag, carried from Input to Output */ + HMM_PFN_DMA_MAPPED = 1UL << (BITS_PER_LONG - 7), HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8), /* Input flags */ diff --git a/mm/hmm.c b/mm/hmm.c index 93aebd9cc130..03aeb9929d9e 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -44,8 +44,10 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end, { unsigned long i = (addr - range->start) >> PAGE_SHIFT; - for (; addr < end; addr += PAGE_SIZE, i++) - range->hmm_pfns[i] = cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++) { + range->hmm_pfns[i] &= HMM_PFN_DMA_MAPPED; + range->hmm_pfns[i] |= cpu_flags; + } return 0; } @@ -202,8 +204,10 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, return hmm_vma_fault(addr, end, required_fault, walk); pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { + hmm_pfns[i] &= HMM_PFN_DMA_MAPPED; + hmm_pfns[i] |= pfn | cpu_flags; + } return 0; } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -236,7 +240,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (required_fault) goto fault; - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_DMA_MAPPED; return 0; } @@ -253,14 +257,14 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, cpu_flags = HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) cpu_flags |= HMM_PFN_WRITE; - *hmm_pfn = swp_offset_pfn(entry) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_DMA_MAPPED) | swp_offset_pfn(entry) | cpu_flags; return 0; } required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (!required_fault) { - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_DMA_MAPPED; return 0; } @@ -304,11 +308,11 @@ static int hmm_vma_handle_pte(struct 
mm_walk *walk, unsigned long addr, pte_unmap(ptep); return -EFAULT; } - *hmm_pfn = HMM_PFN_ERROR; + *hmm_pfn = (*hmm_pfn & HMM_PFN_DMA_MAPPED) | HMM_PFN_ERROR; return 0; } - *hmm_pfn = pte_pfn(pte) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_DMA_MAPPED) | pte_pfn(pte) | cpu_flags; return 0; fault: @@ -448,8 +452,10 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, } pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - for (i = 0; i < npages; ++i, ++pfn) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; i < npages; ++i, ++pfn) { + hmm_pfns[i] &= HMM_PFN_DMA_MAPPED; + hmm_pfns[i] |= pfn | cpu_flags; + } goto out_unlock; } @@ -507,8 +513,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, } pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); - for (; addr < end; addr += PAGE_SIZE, i++, pfn++) - range->hmm_pfns[i] = pfn | cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++, pfn++) { + range->hmm_pfns[i] &= HMM_PFN_DMA_MAPPED; + range->hmm_pfns[i] |= pfn | cpu_flags; + } spin_unlock(ptl); return 0; From patchwork Tue Jul 2 09:09:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719227 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61D38152DF1; Tue, 2 Jul 2024 09:10:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911421; cv=none; b=MZYtaARbHtyOVI5amA2tT1YQzIp/CTwztMSev+Ytrj/0Kx3osE4NadCBBzrrmEAOkag2iqcwbRud25a4no3qQCXVCgGSMjsfEVauxtYjdmFahCSntqB/v88KXYDmq8nfwCVKHKP+AmdZy3W1FnWCtjlgU3iK8xL5R6Qs01xfbbM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911421; c=relaxed/simple; bh=SqRJkswAXpWAsD53KsFAbDwitxXp1HHK8jZehp+jNVQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=slqbCwC5qULopmY9+Jd6zuLTPZSxwA3s9BE0W+PLBGBS9h2B28nEUR7kXDv2F0s6UwoCyge40OnkO3yupoCkluMyJkAbksJI8yEh5W8rG/HovQRoLNkBDk8/zpY7/V1xnUdBQ8EpYXcN8BTGJ0ndVWbpz+EtGzP8WaustzYJTRA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=B0P6xSie; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="B0P6xSie" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26C0AC4AF0D; Tue, 2 Jul 2024 09:10:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911420; bh=SqRJkswAXpWAsD53KsFAbDwitxXp1HHK8jZehp+jNVQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=B0P6xSien3K1c31vVmNKC6qS649bWpjI5mYxXWH8emcmDGg2eigH8FA2DdzWikUeV OTGCsKzhgCxVn0yFExmtnvDeYvSdw03wWMH3YCkE0Is5H/+FrihVahL3vShxIXpt5H ekIE/hI3wHawgVD7zRlm3769EJWxwN6cjJq9LjWDY8CwX07zGC8TIt4ox3kCloi44h 8tByePaihSYoAZbPDlvArG+KbAuvnYoS6wLwWd3uqYtGr/qXHBBtoAFke7v4CagF12 +Ba7rdX8nQKBDA4nkUOmts95lMSEr2/fvOG5MxU0r8Xs/bA45+3GA81F9bXChbQUkP XrFGs/s9NsLMw== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe 
, Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 06/18] dma-mapping: provide callbacks to link/unlink HMM PFNs to specific IOVA Date: Tue, 2 Jul 2024 12:09:36 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Introduce new DMA link/unlink API to provide a way for HMM users to link pages to already preallocated IOVA. Signed-off-by: Leon Romanovsky --- include/linux/dma-mapping.h | 15 +++++ kernel/dma/mapping.c | 108 ++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index c530095ff232..2578b6615a2f 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -135,6 +135,10 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) int dma_alloc_iova(struct dma_iova_attrs *iova); void dma_free_iova(struct dma_iova_attrs *iova); +dma_addr_t dma_hmm_link_page(unsigned long *pfn, struct dma_iova_attrs *iova, + dma_addr_t dma_offset); +void dma_hmm_unlink_page(unsigned long *pfn, struct dma_iova_attrs *iova, + dma_addr_t dma_offset); dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, @@ -199,6 +203,17 @@ static inline int dma_alloc_iova(struct dma_iova_attrs *iova) static inline void dma_free_iova(struct dma_iova_attrs *iova) { } +static inline dma_addr_t dma_hmm_link_page(unsigned long *pfn, + struct dma_iova_attrs *iova, + dma_addr_t dma_offset) +{ + return DMA_MAPPING_ERROR; +} +static inline void dma_hmm_unlink_page(unsigned long *pfn, + struct dma_iova_attrs *iova, + dma_addr_t dma_offset) +{ +} static inline dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 089b4a977bab..69c431bd89e6 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "debug.h" #include "direct.h" @@ -1076,3 +1077,110 @@ void dma_unlink_range(struct dma_iova_state *state) ops->unlink_range(state, state->iova->addr, state->range_size); } EXPORT_SYMBOL_GPL(dma_unlink_range); + +/** + * dma_hmm_link_page - Link a physical HMM page to DMA address + * @pfn: HMM PFN + * @iova: Preallocated IOVA space + * @dma_offset: DMA offset form which this page needs to be linked + * + * dma_alloc_iova() allocates IOVA based on the size specified by their use in + * iova->size. Call this function after IOVA allocation to link whole @page + * to get the DMA address. Note that very first call to this function + * will have @dma_offset set to 0 in the IOVA space allocated from + * dma_alloc_iova(). For subsequent calls to this function on same @iova, + * @dma_offset needs to be advanced by the caller with the size of previous + * page that was linked + DMA address returned for the previous page that was + * linked by this function. 
+ */ +dma_addr_t dma_hmm_link_page(unsigned long *pfn, struct dma_iova_attrs *iova, + dma_addr_t dma_offset) +{ + struct device *dev = iova->dev; + struct page *page = hmm_pfn_to_page(*pfn); + phys_addr_t phys = page_to_phys(page); + bool coherent = dev_is_dma_coherent(dev); + struct dma_memory_type type = {}; + struct dma_iova_state state = {}; + dma_addr_t addr; + int ret; + + if (*pfn & HMM_PFN_DMA_MAPPED) + /* + * We are in this flow when there is a need to resync flags, + * for example when page was already linked in prefetch call + * with READ flag and now we need to add WRITE flag + * + * This page was already programmed to HW and we don't want/need + * to unlink and link it again just to resync flags. + * + * The DMA address calculation below is based on the fact that + * HMM doesn't work with swiotlb. + */ + return (iova->addr) ? iova->addr + dma_offset : + phys_to_dma(dev, phys); + + dma_get_memory_type(page, &type); + + state.iova = iova; + state.type = &type; + state.range_size = dma_offset; + + if (!dma_can_use_iova(&state, PAGE_SIZE)) { + if (!coherent && !(iova->attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_device(phys, PAGE_SIZE, iova->dir); + + addr = phys_to_dma(dev, phys); + goto done; + } + + ret = dma_start_range(&state); + if (ret) + return DMA_MAPPING_ERROR; + + ret = dma_link_range(&state, page_to_phys(page), PAGE_SIZE); + dma_end_range(&state); + if (ret) + return DMA_MAPPING_ERROR; + + addr = iova->addr + dma_offset; +done: + kmsan_handle_dma(page, 0, PAGE_SIZE, iova->dir); + *pfn |= HMM_PFN_DMA_MAPPED; + return addr; +} +EXPORT_SYMBOL_GPL(dma_hmm_link_page); + +/** + * dma_hmm_unlink_page - Unlink a physical HMM page from DMA address + * @pfn: HMM PFN + * @iova: Preallocated IOVA space + * @dma_offset: DMA offset form which this page needs to be unlinked + * from the IOVA space + */ +void dma_hmm_unlink_page(unsigned long *pfn, struct dma_iova_attrs *iova, + dma_addr_t dma_offset) +{ + struct device *dev = iova->dev; + struct page *page = hmm_pfn_to_page(*pfn); + struct dma_memory_type type = {}; + struct dma_iova_state state = {}; + const struct dma_map_ops *ops = get_dma_ops(dev); + + dma_get_memory_type(page, &type); + + state.iova = iova; + state.type = &type; + + *pfn &= ~HMM_PFN_DMA_MAPPED; + + if (!dma_can_use_iova(&state, PAGE_SIZE)) { + if (!(iova->attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_direct_sync_single_for_cpu(dev, dma_offset, + PAGE_SIZE, iova->dir); + return; + } + + ops->unlink_range(&state, state.iova->addr + dma_offset, PAGE_SIZE); +} +EXPORT_SYMBOL_GPL(dma_hmm_unlink_page); From patchwork Tue Jul 2 09:09:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719228 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 239C915A86B; Tue, 2 Jul 2024 09:10:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911425; cv=none; b=WYvBTiaueRLOUD9Ck8pUi9LYv0Wtd4WkiaSM9T0bwqo/065LOt6obQeLVudN5hQ722jnKUkpmjPkA5SUIRNQ9e3uSn+tqo/xc85Kti2wlgHKEWj+xoUTkX3HcFarI0jPE7ifgPvqlRoIvzS552HtZVq7h3t1Y9yx7Eag+heaNCw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911425; c=relaxed/simple; 
bh=3LQD0/3Cm7R9Az4oi5cuZkbR6iiFWyUtiH8FYV5qU6g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tAaXXdzxLkD+hs8DdUXJ/yJMmnis8lKrLvniIq4tPDnqjYajeSwpNqsCHOWJdxj9Kh6U8EWCCgNeeBk1E6yB56AYZ//lsLfw1KnDHjZ4wZoatZuZqHDM8NwdeGBSqfMzevcMKdSvB/H8sQLvKXjdL4ueRp7NFphBHEA5YUpfkU0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jzgab3Uf; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jzgab3Uf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3695BC116B1; Tue, 2 Jul 2024 09:10:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911425; bh=3LQD0/3Cm7R9Az4oi5cuZkbR6iiFWyUtiH8FYV5qU6g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jzgab3UfsAsewcAa24eobcDOjpYZ/7PPMau2CiYGMxN+FfxGb00wDLL3BI2ELXmGU moRg+6YKgbtiDf2JVUe5eQJT8XKaWX5rrUt19Vkg6yDcQ3EOq1SxCxg+IKYC8PD3l0 HS940fsiVLIaK9q8PMwxocD9VU27iN2gNzMGrNBw8q8UcVws7pN76iWryjdgzqisHm VXjMg0SkcQEtPyECwWKViptiiD9d+jmj3OBTAB4yD01wnJBA03aMuR0LwlA7S+3cjd 2MBCI6RclneZK65+IxUismvWUQZUYYw63kyUErE4LhTZye947A31418PnSxjVrmxDt Tsc8PYJOJc7iA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 07/18] iommu/dma: Provide an interface to allow preallocate IOVA Date: Tue, 2 Jul 2024 12:09:37 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Separate IOVA allocation to dedicated callback so it will allow cache of IOVA and reuse it in fast paths for devices which support ODP (on-demand-paging) mechanism. Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 50 +++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 89e34503e0bb..0b5ca6961940 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -357,7 +357,7 @@ int iommu_dma_init_fq(struct iommu_domain *domain) atomic_set(&cookie->fq_timer_on, 0); /* * Prevent incomplete fq state being observable. 
Pairs with path from - * __iommu_dma_unmap() through iommu_dma_free_iova() to queue_iova() + * __iommu_dma_unmap() through __iommu_dma_free_iova() to queue_iova() */ smp_wmb(); WRITE_ONCE(cookie->fq_domain, domain); @@ -745,7 +745,7 @@ static int dma_info_to_prot(enum dma_data_direction dir, bool coherent, } } -static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, +static dma_addr_t __iommu_dma_alloc_iova(struct iommu_domain *domain, size_t size, u64 dma_limit, struct device *dev) { struct iommu_dma_cookie *cookie = domain->iova_cookie; @@ -791,7 +791,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, return (dma_addr_t)iova << shift; } -static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie, +static void __iommu_dma_free_iova(struct iommu_dma_cookie *cookie, dma_addr_t iova, size_t size, struct iommu_iotlb_gather *gather) { struct iova_domain *iovad = &cookie->iovad; @@ -828,7 +828,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr, if (!iotlb_gather.queued) iommu_iotlb_sync(domain, &iotlb_gather); - iommu_dma_free_iova(cookie, dma_addr, size, &iotlb_gather); + __iommu_dma_free_iova(cookie, dma_addr, size, &iotlb_gather); } static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, @@ -851,12 +851,12 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size = iova_align(iovad, size + iova_off); - iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev); + iova = __iommu_dma_alloc_iova(domain, size, dma_mask, dev); if (!iova) return DMA_MAPPING_ERROR; if (iommu_map(domain, iova, phys - iova_off, size, prot, GFP_ATOMIC)) { - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); return DMA_MAPPING_ERROR; } return iova + iova_off; @@ -960,7 +960,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, return NULL; size = iova_align(iovad, size); - iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev); + iova = __iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev); if (!iova) goto out_free_pages; @@ -994,7 +994,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, out_free_sg: sg_free_table(sgt); out_free_iova: - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); out_free_pages: __iommu_dma_free_pages(pages, count); return NULL; @@ -1429,7 +1429,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, if (!iova_len) return __finalise_sg(dev, sg, nents, 0); - iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev); + iova = __iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev); if (!iova) { ret = -ENOMEM; goto out_restore_sg; @@ -1446,7 +1446,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, return __finalise_sg(dev, sg, nents, iova); out_free_iova: - iommu_dma_free_iova(cookie, iova, iova_len, NULL); + __iommu_dma_free_iova(cookie, iova, iova_len, NULL); out_restore_sg: __invalidate_sg(sg, nents); out: @@ -1707,6 +1707,30 @@ static size_t iommu_dma_max_mapping_size(struct device *dev) return SIZE_MAX; } +static dma_addr_t iommu_dma_alloc_iova(struct device *dev, size_t size) +{ + struct iommu_domain *domain = iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iova_domain *iovad = &cookie->iovad; + dma_addr_t dma_mask = dma_get_mask(dev); + + size = iova_align(iovad, size); + return __iommu_dma_alloc_iova(domain, size, 
dma_mask, dev); +} + +static void iommu_dma_free_iova(struct device *dev, dma_addr_t iova, + size_t size) +{ + struct iommu_domain *domain = iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iova_domain *iovad = &cookie->iovad; + struct iommu_iotlb_gather iotlb_gather; + + size = iova_align(iovad, size); + iommu_iotlb_gather_init(&iotlb_gather); + __iommu_dma_free_iova(cookie, iova, size, &iotlb_gather); +} + static const struct dma_map_ops iommu_dma_ops = { .flags = DMA_F_PCI_P2PDMA_SUPPORTED | DMA_F_CAN_SKIP_SYNC, @@ -1731,6 +1755,8 @@ static const struct dma_map_ops iommu_dma_ops = { .get_merge_boundary = iommu_dma_get_merge_boundary, .opt_mapping_size = iommu_dma_opt_mapping_size, .max_mapping_size = iommu_dma_max_mapping_size, + .alloc_iova = iommu_dma_alloc_iova, + .free_iova = iommu_dma_free_iova, }; void iommu_setup_dma_ops(struct device *dev) @@ -1773,7 +1799,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, if (!msi_page) return NULL; - iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); + iova = __iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); if (!iova) goto out_free_page; @@ -1787,7 +1813,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return msi_page; out_free_iova: - iommu_dma_free_iova(cookie, iova, size, NULL); + __iommu_dma_free_iova(cookie, iova, size, NULL); out_free_page: kfree(msi_page); return NULL; From patchwork Tue Jul 2 09:09:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719229 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7AEB315A86B; Tue, 2 Jul 2024 09:10:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911429; cv=none; b=i9Oh51P+SZfBEooka5H8I1Fli6tDplniRj8CGB4xY28INvHkvgNCNJwRUegmvu8XZIqhIwuF2/OeMKXlOosMdSTT2R1NhLWj6zHNovnhZjwFmRlCp1Tydh1VEGamtBWjE2uSRA9iTUukthcPI+o+oha5g1TUF+8kUUzaKMg0XxU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911429; c=relaxed/simple; bh=suT8AWgnm6GBnp/JyhIgcl0q4xx7OPR0g7KMmtGZr9Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=goEs5ukKJpXkTMxJWI7p3Hl3iMGoMmm/XslY/IK4SMmkoZrBH882Nwt0UKcUtO5ebDUfU8sTjgUex07BlDGL8aWahlOf3/q5+RQuv48STM6RS7eEjVNrVPvWIYu7Cs8u2LRo7nqms5lzpx8TSGDoohere1XtDrALVXZGfEm7Ws4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rq4OARQm; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rq4OARQm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3ADDBC116B1; Tue, 2 Jul 2024 09:10:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911429; bh=suT8AWgnm6GBnp/JyhIgcl0q4xx7OPR0g7KMmtGZr9Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rq4OARQmYhfiLIiz6G9wms51urucovfps8babGp4qOGPRB4+5JYgUHLAf6DLfrqPT hJXfo5keL4ouKe9Y4EGQdEL9StrtH1y0FoD0WIEG6+W6CO3MX++fPmfQii7hxJRJux 
fusUTN23YcTMGW6P7yhGGqD1y1aMsDmtGmRN8YuzUAp5T8XEoteErwwnKbH0sieTGn NDceqME/1xe7SEg4mP6v20tLeGjQsiNWxolnzJ+VOCoSlueXx8IZMPwdetZjLTcW8K X/5eBD/D3Zv08vRmpy+naCb5TKMeQuI9m5F7mEmHjHwRQltCAQDN/qS030U5hOEtlB dKCPiM7uI6ZuQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 08/18] iommu/dma: Implement link/unlink ranges callbacks Date: Tue, 2 Jul 2024 12:09:38 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Add an implementation of link/unlink interface to perform in map/unmap pages in fast patch for pre-allocated IOVA. Signed-off-by: Leon Romanovsky --- drivers/iommu/dma-iommu.c | 79 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 0b5ca6961940..7425d155a14e 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1731,6 +1731,82 @@ static void iommu_dma_free_iova(struct device *dev, dma_addr_t iova, __iommu_dma_free_iova(cookie, iova, size, &iotlb_gather); } +static int iommu_dma_start_range(struct dma_iova_state *state) +{ + struct device *dev = state->iova->dev; + + state->domain = iommu_get_dma_domain(dev); + + if (static_branch_unlikely(&iommu_deferred_attach_enabled)) + return iommu_deferred_attach(dev, state->domain); + + return 0; +} + +static int iommu_dma_link_range(struct dma_iova_state *state, phys_addr_t phys, + dma_addr_t addr, size_t size) +{ + struct iommu_domain *domain = state->domain; + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iova_domain *iovad = &cookie->iovad; + struct device *dev = state->iova->dev; + enum dma_data_direction dir = state->iova->dir; + bool coherent = dev_is_dma_coherent(dev); + unsigned long attrs = state->iova->attrs; + int prot = dma_info_to_prot(dir, coherent, attrs); + + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_device(phys, size, dir); + + size = iova_align(iovad, size); + return iommu_map(domain, addr, phys, size, prot, GFP_ATOMIC); +} + +static void iommu_sync_dma_for_cpu(struct iommu_domain *domain, + dma_addr_t start, size_t size, + enum dma_data_direction dir) +{ + size_t sync_size, unmapped = 0; + phys_addr_t phys; + + do { + phys = iommu_iova_to_phys(domain, start + unmapped); + if (WARN_ON(!phys)) + continue; + + sync_size = (unmapped + PAGE_SIZE > size) ? 
size % PAGE_SIZE : + PAGE_SIZE; + arch_sync_dma_for_cpu(phys, sync_size, dir); + unmapped += sync_size; + } while (unmapped < size); +} + +static void iommu_dma_unlink_range(struct dma_iova_state *state, + dma_addr_t start, size_t size) +{ + struct device *dev = state->iova->dev; + struct iommu_domain *domain = iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iova_domain *iovad = &cookie->iovad; + struct iommu_iotlb_gather iotlb_gather; + bool coherent = dev_is_dma_coherent(dev); + unsigned long attrs = state->iova->attrs; + size_t unmapped; + + iommu_iotlb_gather_init(&iotlb_gather); + iotlb_gather.queued = READ_ONCE(cookie->fq_domain); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !coherent) + iommu_sync_dma_for_cpu(domain, start, size, state->iova->dir); + + size = iova_align(iovad, size); + unmapped = iommu_unmap_fast(domain, start, size, &iotlb_gather); + WARN_ON(unmapped != size); + + if (!iotlb_gather.queued) + iommu_iotlb_sync(domain, &iotlb_gather); +} + static const struct dma_map_ops iommu_dma_ops = { .flags = DMA_F_PCI_P2PDMA_SUPPORTED | DMA_F_CAN_SKIP_SYNC, @@ -1757,6 +1833,9 @@ static const struct dma_map_ops iommu_dma_ops = { .max_mapping_size = iommu_dma_max_mapping_size, .alloc_iova = iommu_dma_alloc_iova, .free_iova = iommu_dma_free_iova, + .link_range = iommu_dma_link_range, + .unlink_range = iommu_dma_unlink_range, + .start_range = iommu_dma_start_range, }; void iommu_setup_dma_ops(struct device *dev) From patchwork Tue Jul 2 09:09:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719236 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F79C1741DC; Tue, 2 Jul 2024 09:10:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911458; cv=none; b=mjcwnqWGsHcv0mYlyawy6JSIcoW6VKRorZIf8S0PhsFq0124p0toCRr72IgQ81LeIvp2U/+tAK1OvL4TYWpSdwKYS1qMmSTOIWdRLgzA1MoqqXTILEWELOXrKfYTi3JTD9XW8o3dhPfTUPeUDWvYQoc0OuwejwLkn3zq9V4PKzU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911458; c=relaxed/simple; bh=SkqFHwOM6KJzTXdmnNBw1x88eUJ1/2PlxqGXP9zV2ZU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TmAdzPoEQ8S6eVi2RFGNVfB/6oW80wjlq8lspQyXAl+CZBVcUPJ6XNgYdgpNiT2XklP+tB8svyvAwMH9Hs3Pwh2URsTQerYoA9cxVBSMmBCbz0vxwy5UfyIzTKiCszPhERqM+pmXTSqsgtRdnpXt0wWfGveUhfMpvvQkkUpzW6s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=A9QLYC0x; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="A9QLYC0x" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0CADCC116B1; Tue, 2 Jul 2024 09:10:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911457; bh=SkqFHwOM6KJzTXdmnNBw1x88eUJ1/2PlxqGXP9zV2ZU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=A9QLYC0xA24DxduqY3heCuvEMv2g+F4kcqj46l7twDf7eoAjLAfDXUqwWewJv15rI V4ngl5OKU+cYCEmcoA1OWEcg+g4LDqDL7IRuO5NiDytGijovb3mdlf1czP84Qh/8w1 
gojGNLjQCBZqSCRgaAAiURw+YEKUsuwnCMNl6nCr2hlpPPb4vGZE4NXCFAJUYCIcpD dAJi7vB/mecxRkG5SOV39gduWkn/DXvyFUwGe+XPrw4P8XlIk54V56l6s3+MBenBuR aHhZJAvI2iVoMTtjqvR2irRwgfTxB8wUnp7xjwsi9Cd0akPOCz7fVXYus3N3dY3QeP 1txEpEBqSvlag== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 09/18] RDMA/umem: Preallocate and cache IOVA for UMEM ODP Date: Tue, 2 Jul 2024 12:09:39 +0300 Message-ID: <2d04e220fea52a41f2005c3a3e2123c3967af88f.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky As a preparation to provide two step interface to map pages, preallocate IOVA when UMEM is initialized. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 14 +++++++++++++- include/rdma/ib_umem_odp.h | 1 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index e9fa22d31c23..955bf338b1bf 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -50,6 +50,7 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, const struct mmu_interval_notifier_ops *ops) { + struct ib_device *dev = umem_odp->umem.ibdev; int ret; umem_odp->umem.is_odp = 1; @@ -87,15 +88,25 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, goto out_pfn_list; } + umem_odp->iova.dev = dev->dma_device; + umem_odp->iova.size = end - start; + umem_odp->iova.dir = DMA_BIDIRECTIONAL; + ret = dma_alloc_iova(&umem_odp->iova); + if (ret) + goto out_dma_list; + + ret = mmu_interval_notifier_insert(&umem_odp->notifier, umem_odp->umem.owning_mm, start, end - start, ops); if (ret) - goto out_dma_list; + goto out_free_iova; } return 0; +out_free_iova: + dma_free_iova(&umem_odp->iova); out_dma_list: kvfree(umem_odp->dma_list); out_pfn_list: @@ -274,6 +285,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) ib_umem_end(umem_odp)); mutex_unlock(&umem_odp->umem_mutex); mmu_interval_notifier_remove(&umem_odp->notifier); + dma_free_iova(&umem_odp->iova); kvfree(umem_odp->dma_list); kvfree(umem_odp->pfn_list); } diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index 0844c1d05ac6..bb2d7f2a5b04 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -23,6 +23,7 @@ struct ib_umem_odp { * See ODP_READ_ALLOWED_BIT and ODP_WRITE_ALLOWED_BIT. */ dma_addr_t *dma_list; + struct dma_iova_attrs iova; /* * The umem_mutex protects the page_list and dma_list fields of an ODP * umem, allowing only a single thread to map/unmap pages. 
The mutex From patchwork Tue Jul 2 09:09:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719231 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ADC97155325; Tue, 2 Jul 2024 09:10:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911437; cv=none; b=uxnUJdmUE5ZD9FEJxRpKFOg2h3rMmTaevwvE/O3M0Z0PjaLQXQHkLoQIPQDJpS0hMpt7xY2TUYX3cmdbLRKd6sd3br002XDYJWzdZ4Bi3kJx7pN7Vwz0lxhIPdPIDMR1sg6xIW95Himv15319ufn98dPyom6Ezq8zm5Ly2eCNec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911437; c=relaxed/simple; bh=UccJQ9nS+AS0vpHIOQiIlzi89+lcmxz/okxY3l4iYQ8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VTyTfrmw/5sTExgyFr9Llzn8Box+ynOwUGBy4H43PfdlniD01ZtKV9BbGMgDjZ/43XKderclY7+f0gL/LJyqO8oErt8lOR8xCdMslTtqNZbnIAlCdoO+MgwCunGPeH9/idXkrTnh/8jLWjztjkGHuA5s9FFVLtaNnXJjl/XDjcU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iNhBksVu; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iNhBksVu" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3CCCC116B1; Tue, 2 Jul 2024 09:10:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911437; bh=UccJQ9nS+AS0vpHIOQiIlzi89+lcmxz/okxY3l4iYQ8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iNhBksVuMjdd9OgE12zpQ/G/0gXROuc2JiQtdkq/YPji7n6elUfWNHPpjzLSkzwpD rYUQOCs17v6ZkaTLonG9Qm2hIKwxebLAoNnvhf5uX0T5kMW55FAtOwHN1U58yzjg5H KmCEktqyhOcnCdt2e+XYEsVRIapgAdwwTd3WfQSE5X050+CWU82gphnSKNpW3gg9Yj ZT0bXPA9TCvLuXGiFdBiGVLaWvdhZ6EExdZ0nFexFeg8QlaSTJEauVpUoRW/POrhmm lFbNK4ciqFxYeGoW/fyYyyOismRIotZnY2twO4F1++hPCQz9/9/ERwuYQmgecOSPzd 6W0gz8DC0fy+A== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 10/18] RDMA/umem: Store ODP access mask information in PFN Date: Tue, 2 Jul 2024 12:09:40 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky As a preparation to remove of dma_list, store access mask in PFN pointer and not in dma_addr_t. 
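In other words, instead of encoding permissions in the low bits of each dma_list entry, the relevant state is read from the HMM flags of the matching pfn_list entry. Illustrative sketch only, not a literal hunk from this patch:

   Before:  dma_list[idx] = dma_addr | ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT;

   After:   pfn_list[idx] & HMM_PFN_VALID       /* page was faulted in */
            pfn_list[idx] & HMM_PFN_WRITE       /* writable mapping */
            pfn_list[idx] & HMM_PFN_DMA_MAPPED  /* entry already has a DMA mapping */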
Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 98 +++++++++++----------------- drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + drivers/infiniband/hw/mlx5/odp.c | 37 ++++++----- include/rdma/ib_umem_odp.h | 14 +--- 4 files changed, 59 insertions(+), 91 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 955bf338b1bf..c628a98c41b7 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -308,22 +308,11 @@ EXPORT_SYMBOL(ib_umem_odp_release); static int ib_umem_odp_map_dma_single_page( struct ib_umem_odp *umem_odp, unsigned int dma_index, - struct page *page, - u64 access_mask) + struct page *page) { struct ib_device *dev = umem_odp->umem.ibdev; dma_addr_t *dma_addr = &umem_odp->dma_list[dma_index]; - if (*dma_addr) { - /* - * If the page is already dma mapped it means it went through - * a non-invalidating trasition, like read-only to writable. - * Resync the flags. - */ - *dma_addr = (*dma_addr & ODP_DMA_ADDR_MASK) | access_mask; - return 0; - } - *dma_addr = ib_dma_map_page(dev, page, 0, 1 << umem_odp->page_shift, DMA_BIDIRECTIONAL); if (ib_dma_mapping_error(dev, *dma_addr)) { @@ -331,7 +320,6 @@ static int ib_umem_odp_map_dma_single_page( return -EFAULT; } umem_odp->npages++; - *dma_addr |= access_mask; return 0; } @@ -367,9 +355,6 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, struct hmm_range range = {}; unsigned long timeout; - if (access_mask == 0) - return -EINVAL; - if (user_virt < ib_umem_start(umem_odp) || user_virt + bcnt > ib_umem_end(umem_odp)) return -EFAULT; @@ -395,7 +380,7 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, if (fault) { range.default_flags = HMM_PFN_REQ_FAULT; - if (access_mask & ODP_WRITE_ALLOWED_BIT) + if (access_mask & HMM_PFN_WRITE) range.default_flags |= HMM_PFN_REQ_WRITE; } @@ -427,22 +412,17 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, for (pfn_index = 0; pfn_index < num_pfns; pfn_index += 1 << (page_shift - PAGE_SHIFT), dma_index++) { - if (fault) { - /* - * Since we asked for hmm_range_fault() to populate - * pages it shouldn't return an error entry on success. - */ - WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR); - WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)); - } else { - if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) { - WARN_ON(umem_odp->dma_list[dma_index]); - continue; - } - access_mask = ODP_READ_ALLOWED_BIT; - if (range.hmm_pfns[pfn_index] & HMM_PFN_WRITE) - access_mask |= ODP_WRITE_ALLOWED_BIT; - } + /* + * Since we asked for hmm_range_fault() to populate + * pages it shouldn't return an error entry on success. 
+ */ + WARN_ON(fault && range.hmm_pfns[pfn_index] & HMM_PFN_ERROR); + WARN_ON(fault && !(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)); + if (!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID)) + continue; + + if (range.hmm_pfns[pfn_index] & HMM_PFN_DMA_MAPPED) + continue; hmm_order = hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]); /* If a hugepage was detected and ODP wasn't set for, the umem @@ -457,13 +437,13 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, } ret = ib_umem_odp_map_dma_single_page( - umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index]), - access_mask); + umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index])); if (ret < 0) { ibdev_dbg(umem_odp->umem.ibdev, "ib_umem_odp_map_dma_single_page failed with error %d\n", ret); break; } + range.hmm_pfns[pfn_index] |= HMM_PFN_DMA_MAPPED; } /* upon success lock should stay on hold for the callee */ if (!ret) @@ -483,7 +463,6 @@ EXPORT_SYMBOL(ib_umem_odp_map_dma_and_lock); void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, u64 bound) { - dma_addr_t dma_addr; dma_addr_t dma; int idx; u64 addr; @@ -494,34 +473,33 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, virt = max_t(u64, virt, ib_umem_start(umem_odp)); bound = min_t(u64, bound, ib_umem_end(umem_odp)); for (addr = virt; addr < bound; addr += BIT(umem_odp->page_shift)) { + unsigned long pfn_idx = (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT; + struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); + idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift; dma = umem_odp->dma_list[idx]; - /* The access flags guaranteed a valid DMA address in case was NULL */ - if (dma) { - unsigned long pfn_idx = (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT; - struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); - - dma_addr = dma & ODP_DMA_ADDR_MASK; - ib_dma_unmap_page(dev, dma_addr, - BIT(umem_odp->page_shift), - DMA_BIDIRECTIONAL); - if (dma & ODP_WRITE_ALLOWED_BIT) { - struct page *head_page = compound_head(page); - /* - * set_page_dirty prefers being called with - * the page lock. However, MMU notifiers are - * called sometimes with and sometimes without - * the lock. We rely on the umem_mutex instead - * to prevent other mmu notifiers from - * continuing and allowing the page mapping to - * be removed. - */ - set_page_dirty(head_page); - } - umem_odp->dma_list[idx] = 0; - umem_odp->npages--; + if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_VALID)) + continue; + if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_DMA_MAPPED)) + continue; + + ib_dma_unmap_page(dev, dma, BIT(umem_odp->page_shift), + DMA_BIDIRECTIONAL); + if (umem_odp->pfn_list[pfn_idx] & HMM_PFN_WRITE) { + struct page *head_page = compound_head(page); + /* + * set_page_dirty prefers being called with + * the page lock. However, MMU notifiers are + * called sometimes with and sometimes without + * the lock. We rely on the umem_mutex instead + * to prevent other mmu notifiers from + * continuing and allowing the page mapping to + * be removed. 
+ */ + set_page_dirty(head_page); } + umem_odp->npages--; } } EXPORT_SYMBOL(ib_umem_odp_unmap_dma_pages); diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index f255a12e26a0..e8494a803a58 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -334,6 +334,7 @@ struct mlx5_ib_flow_db { #define MLX5_IB_UPD_XLT_PD BIT(4) #define MLX5_IB_UPD_XLT_ACCESS BIT(5) #define MLX5_IB_UPD_XLT_INDIRECT BIT(6) +#define MLX5_IB_UPD_XLT_DOWNGRADE BIT(7) /* Private QP creation flags to be passed in ib_qp_init_attr.create_flags. * diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c index 4a04cbc5b78a..5713fe25f4de 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "mlx5_ib.h" #include "cmd.h" @@ -143,22 +144,12 @@ static void populate_klm(struct mlx5_klm *pklm, size_t idx, size_t nentries, } } -static u64 umem_dma_to_mtt(dma_addr_t umem_dma) -{ - u64 mtt_entry = umem_dma & ODP_DMA_ADDR_MASK; - - if (umem_dma & ODP_READ_ALLOWED_BIT) - mtt_entry |= MLX5_IB_MTT_READ; - if (umem_dma & ODP_WRITE_ALLOWED_BIT) - mtt_entry |= MLX5_IB_MTT_WRITE; - - return mtt_entry; -} - static void populate_mtt(__be64 *pas, size_t idx, size_t nentries, struct mlx5_ib_mr *mr, int flags) { struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem); + bool downgrade = flags & MLX5_IB_UPD_XLT_DOWNGRADE; + unsigned long pfn; dma_addr_t pa; size_t i; @@ -166,8 +157,17 @@ static void populate_mtt(__be64 *pas, size_t idx, size_t nentries, return; for (i = 0; i < nentries; i++) { + pfn = odp->pfn_list[idx + i]; + if (!(pfn & HMM_PFN_VALID)) + /* Initial ODP init */ + continue; + pa = odp->dma_list[idx + i]; - pas[i] = cpu_to_be64(umem_dma_to_mtt(pa)); + pa |= MLX5_IB_MTT_READ; + if ((pfn & HMM_PFN_WRITE) && !downgrade) + pa |= MLX5_IB_MTT_WRITE; + + pas[i] = cpu_to_be64(pa); } } @@ -268,8 +268,7 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni, * estimate the cost of another UMR vs. the cost of bigger * UMR. 
*/ - if (umem_odp->dma_list[idx] & - (ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) { + if (umem_odp->pfn_list[idx] & HMM_PFN_VALID) { if (!in_block) { blk_start_idx = idx; in_block = 1; @@ -555,7 +554,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp, { int page_shift, ret, np; bool downgrade = flags & MLX5_PF_FLAGS_DOWNGRADE; - u64 access_mask; + u64 access_mask = 0; u64 start_idx; bool fault = !(flags & MLX5_PF_FLAGS_SNAPSHOT); u32 xlt_flags = MLX5_IB_UPD_XLT_ATOMIC; @@ -563,12 +562,14 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp, if (flags & MLX5_PF_FLAGS_ENABLE) xlt_flags |= MLX5_IB_UPD_XLT_ENABLE; + if (flags & MLX5_PF_FLAGS_DOWNGRADE) + xlt_flags |= MLX5_IB_UPD_XLT_DOWNGRADE; + page_shift = odp->page_shift; start_idx = (user_va - ib_umem_start(odp)) >> page_shift; - access_mask = ODP_READ_ALLOWED_BIT; if (odp->umem.writable && !downgrade) - access_mask |= ODP_WRITE_ALLOWED_BIT; + access_mask |= HMM_PFN_WRITE; np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, fault); if (np < 0) diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index bb2d7f2a5b04..a3f4a5c03bf8 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -8,6 +8,7 @@ #include #include +#include struct ib_umem_odp { struct ib_umem umem; @@ -68,19 +69,6 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp) umem_odp->page_shift; } -/* - * The lower 2 bits of the DMA address signal the R/W permissions for - * the entry. To upgrade the permissions, provide the appropriate - * bitmask to the map_dma_pages function. - * - * Be aware that upgrading a mapped address might result in change of - * the DMA address for the page. - */ -#define ODP_READ_ALLOWED_BIT (1<<0ULL) -#define ODP_WRITE_ALLOWED_BIT (1<<1ULL) - -#define ODP_DMA_ADDR_MASK (~(ODP_READ_ALLOWED_BIT | ODP_WRITE_ALLOWED_BIT)) - #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING struct ib_umem_odp * From patchwork Tue Jul 2 09:09:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719232 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4D1716B384; Tue, 2 Jul 2024 09:10:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911441; cv=none; b=W5sw2OTyQzD/nLrr1gLrhBDhJTNNto2zsJt6yT3EpSucTi+w0+POkaSG7scXpa0EfykdwwqWwpTucAlhf3hvtvqdd6D66FmKEeotDM7C8w2X5cP2i5Aq2mmd4OPU9CxXqxxFT+aFO5xsX3hy9TtvH1e3trasVg8ko4VODgKf9pM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911441; c=relaxed/simple; bh=iroMohzT1Q8AWRK2+Thbq89BnX39U8SKlcdoPnAxJYY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WHu6vY3rkmppcXLPO7ErFMHyyN17Owyz48x2z7GvS5IND0Xfae6vzZT7bT+qAA04cmmjnBMdTvDytOInlSZLAetU4d8/egoJn5Lkya8WQ+M1pkKtJWrZtPl1Wxy1jfgfgJtZO5ahNrN4TQMJUMLc0y5FCFtVtRPM+TDGb0G4ESY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VczNJJsV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b="VczNJJsV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD16CC116B1; Tue, 2 Jul 2024 09:10:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911441; bh=iroMohzT1Q8AWRK2+Thbq89BnX39U8SKlcdoPnAxJYY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VczNJJsVneBG8U1/XVDb6geZHvZySIGG4DY2l4QAi9gXw0LZ4UZx7I111sTW/pzG+ 4gFYx4HiuOTIS7ymyC1ZaLERA/X33A5BIU5kAJM/SZyFgB5GRFBxOi0L/j9m8A/bcs lgRBq3rNRiPVGh9h5iFW+4ma1ZmCVnY4jodvAce1csOdqMDIlLFM1gpYm0fdzdf3dc DXGkry+mEYxh70j6i+/ZAGXhXHz4SK5EzMJ4XXxg16NwaKkpXS/s2cAIDA+bNQUsB3 T6fOcXLArV3UrnEVTuSxL9/A8so/gdTs+kjzN4UkmqpH6Ew92iDJu+Z7ECGV8U0SDk gTnjyBi/EKyDQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 11/18] RDMA/core: Separate DMA mapping to caching IOVA and page linkage Date: Tue, 2 Jul 2024 12:09:41 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Reuse newly added DMA API to cache IOVA and only link/unlink pages in fast path. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 61 +++--------------------------- drivers/infiniband/hw/mlx5/odp.c | 7 +++- include/rdma/ib_umem_odp.h | 8 +--- 3 files changed, 12 insertions(+), 64 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index c628a98c41b7..6e170cb5110c 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -81,20 +81,13 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, if (!umem_odp->pfn_list) return -ENOMEM; - umem_odp->dma_list = kvcalloc( - ndmas, sizeof(*umem_odp->dma_list), GFP_KERNEL); - if (!umem_odp->dma_list) { - ret = -ENOMEM; - goto out_pfn_list; - } umem_odp->iova.dev = dev->dma_device; umem_odp->iova.size = end - start; umem_odp->iova.dir = DMA_BIDIRECTIONAL; ret = dma_alloc_iova(&umem_odp->iova); if (ret) - goto out_dma_list; - + goto out_pfn_list; ret = mmu_interval_notifier_insert(&umem_odp->notifier, umem_odp->umem.owning_mm, @@ -107,8 +100,6 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, out_free_iova: dma_free_iova(&umem_odp->iova); -out_dma_list: - kvfree(umem_odp->dma_list); out_pfn_list: kvfree(umem_odp->pfn_list); return ret; @@ -286,7 +277,6 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) mutex_unlock(&umem_odp->umem_mutex); mmu_interval_notifier_remove(&umem_odp->notifier); dma_free_iova(&umem_odp->iova); - kvfree(umem_odp->dma_list); kvfree(umem_odp->pfn_list); } put_pid(umem_odp->tgid); @@ -294,40 +284,10 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) } EXPORT_SYMBOL(ib_umem_odp_release); -/* - * Map for DMA and insert a single page into the on-demand paging page tables. - * - * @umem: the umem to insert the page to. - * @dma_index: index in the umem to add the dma to. - * @page: the page struct to map and add. 
- * @access_mask: access permissions needed for this page. - * - * The function returns -EFAULT if the DMA mapping operation fails. - * - */ -static int ib_umem_odp_map_dma_single_page( - struct ib_umem_odp *umem_odp, - unsigned int dma_index, - struct page *page) -{ - struct ib_device *dev = umem_odp->umem.ibdev; - dma_addr_t *dma_addr = &umem_odp->dma_list[dma_index]; - - *dma_addr = ib_dma_map_page(dev, page, 0, 1 << umem_odp->page_shift, - DMA_BIDIRECTIONAL); - if (ib_dma_mapping_error(dev, *dma_addr)) { - *dma_addr = 0; - return -EFAULT; - } - umem_odp->npages++; - return 0; -} - /** * ib_umem_odp_map_dma_and_lock - DMA map userspace memory in an ODP MR and lock it. * * Maps the range passed in the argument to DMA addresses. - * The DMA addresses of the mapped pages is updated in umem_odp->dma_list. * Upon success the ODP MR will be locked to let caller complete its device * page table update. * @@ -435,15 +395,6 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt, __func__, hmm_order, page_shift); break; } - - ret = ib_umem_odp_map_dma_single_page( - umem_odp, dma_index, hmm_pfn_to_page(range.hmm_pfns[pfn_index])); - if (ret < 0) { - ibdev_dbg(umem_odp->umem.ibdev, - "ib_umem_odp_map_dma_single_page failed with error %d\n", ret); - break; - } - range.hmm_pfns[pfn_index] |= HMM_PFN_DMA_MAPPED; } /* upon success lock should stay on hold for the callee */ if (!ret) @@ -463,10 +414,8 @@ EXPORT_SYMBOL(ib_umem_odp_map_dma_and_lock); void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, u64 bound) { - dma_addr_t dma; int idx; u64 addr; - struct ib_device *dev = umem_odp->umem.ibdev; lockdep_assert_held(&umem_odp->umem_mutex); @@ -474,19 +423,19 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt, bound = min_t(u64, bound, ib_umem_end(umem_odp)); for (addr = virt; addr < bound; addr += BIT(umem_odp->page_shift)) { unsigned long pfn_idx = (addr - ib_umem_start(umem_odp)) >> PAGE_SHIFT; - struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift; - dma = umem_odp->dma_list[idx]; if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_VALID)) continue; if (!(umem_odp->pfn_list[pfn_idx] & HMM_PFN_DMA_MAPPED)) continue; - ib_dma_unmap_page(dev, dma, BIT(umem_odp->page_shift), - DMA_BIDIRECTIONAL); + dma_hmm_unlink_page(&umem_odp->pfn_list[pfn_idx], + &umem_odp->iova, + idx * (1 << umem_odp->page_shift)); if (umem_odp->pfn_list[pfn_idx] & HMM_PFN_WRITE) { + struct page *page = hmm_pfn_to_page(umem_odp->pfn_list[pfn_idx]); struct page *head_page = compound_head(page); /* * set_page_dirty prefers being called with diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c index 5713fe25f4de..b2aeaef9d0e1 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -149,6 +149,7 @@ static void populate_mtt(__be64 *pas, size_t idx, size_t nentries, { struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem); bool downgrade = flags & MLX5_IB_UPD_XLT_DOWNGRADE; + struct ib_device *dev = odp->umem.ibdev; unsigned long pfn; dma_addr_t pa; size_t i; @@ -162,12 +163,16 @@ static void populate_mtt(__be64 *pas, size_t idx, size_t nentries, /* Initial ODP init */ continue; - pa = odp->dma_list[idx + i]; + pa = dma_hmm_link_page(&odp->pfn_list[idx + i], &odp->iova, + (idx + i) * (1 << odp->page_shift)); + WARN_ON_ONCE(ib_dma_mapping_error(dev, pa)); + pa |= MLX5_IB_MTT_READ; if ((pfn & HMM_PFN_WRITE) && !downgrade) pa |= MLX5_IB_MTT_WRITE; 
pas[i] = cpu_to_be64(pa); + odp->npages++; } } diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h index a3f4a5c03bf8..653fc076b6ee 100644 --- a/include/rdma/ib_umem_odp.h +++ b/include/rdma/ib_umem_odp.h @@ -18,15 +18,9 @@ struct ib_umem_odp { /* An array of the pfns included in the on-demand paging umem. */ unsigned long *pfn_list; - /* - * An array with DMA addresses mapped for pfns in pfn_list. - * The lower two bits designate access permissions. - * See ODP_READ_ALLOWED_BIT and ODP_WRITE_ALLOWED_BIT. - */ - dma_addr_t *dma_list; struct dma_iova_attrs iova; /* - * The umem_mutex protects the page_list and dma_list fields of an ODP + * The umem_mutex protects the page_list field of an ODP * umem, allowing only a single thread to map/unmap pages. The mutex * also protects access to the mmu notifier counters. */ From patchwork Tue Jul 2 09:09:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719233 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2189B15574C; Tue, 2 Jul 2024 09:10:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911446; cv=none; b=FOyAjPHCyMdFvv2SOrQPrF9am/Pj6AQ6Nc9sil4iahw1JIWQx6KLQtkSIIxaQlauGN8DeQRNWcFETD9Yxad3yygP3pHXBnaUzMAs3fROxCJg7b0JD6kBSd89IcDKX/ZXR2Q+tBpvQXRA9ShrOfkW8jD+3CVTH3H+k64f1lU2zrQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911446; c=relaxed/simple; bh=pdK7V9xHdNBFoHCz59AGR5WOTx5xl2Vj8twOHvengYk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TTdreByZ+XQlCsYIvdOuU+FizXCcoaCsSUsuEtqqZHDOTuTa8EyymRCTxooNxE7L/qu04NSbzJ97d2oBk9p17iE9VXJdDwUs9OzipIU7Kg8nSfTTxrJjmXEkxoDHg4XveIAuuciFsd1gpSs1bPeoIff3DBjiO3mlJmjN14kklGE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KTz8NtUc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KTz8NtUc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DB9F3C4AF0A; Tue, 2 Jul 2024 09:10:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911445; bh=pdK7V9xHdNBFoHCz59AGR5WOTx5xl2Vj8twOHvengYk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=KTz8NtUcFIGsDCeQDH+Db9PamT6/nlViuegQcsznIc+DF9srj7x240NEr4il//Xc/ 8i6luwPgO0yiJWdOnRcYYR5NL7yfawxlEKbv7drJYa8B4a339djv3OvkkqYT8c8H3F 2fawWfN+3TdW5300DYyGBMMir31FVKME3Mb0rFDkEafNNmyz4K9325deM1XSwCqvmZ j68Pmkr7YplRS9KcdrYMe4vUV3x2d5CnPCIdbVmN1wj2l59g8sgHeEOk1PKw3gUGEO sLuttA9bzZDdVviGMe/UOzCvZZ/jZMpHBqkeBTmM9HISPNn+o452NCrh1hoAqBSCYz dRKD1bcV6Pa7w== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, 
iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 12/18] RDMA/umem: Prevent UMEM ODP creation with SWIOTLB Date: Tue, 2 Jul 2024 12:09:42 +0300 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky RDMA UMEM never supported DMA addresses returned from SWIOTLB, as these addresses should be programmed to the hardware which is not aware that it is bounce buffers and not real ones. Instead of silently leave broken system for the users who didn't know it, let's be explicit and return an error to them. Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 81 +++++++++++++++--------------- 1 file changed, 41 insertions(+), 40 deletions(-) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 6e170cb5110c..12186717a892 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -42,7 +42,8 @@ #include #include #include - +#include +#include #include #include "uverbs.h" @@ -51,50 +52,50 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp, const struct mmu_interval_notifier_ops *ops) { struct ib_device *dev = umem_odp->umem.ibdev; + size_t page_size = 1UL << umem_odp->page_shift; + unsigned long start, end; + size_t ndmas, npfns; int ret; umem_odp->umem.is_odp = 1; mutex_init(&umem_odp->umem_mutex); + if (umem_odp->is_implicit_odp) + return 0; + + if (dev_use_swiotlb(dev->dma_device, page_size, DMA_BIDIRECTIONAL) || + is_swiotlb_force_bounce(dev->dma_device)) + return -EOPNOTSUPP; + + start = ALIGN_DOWN(umem_odp->umem.address, page_size); + if (check_add_overflow(umem_odp->umem.address, + (unsigned long)umem_odp->umem.length, &end)) + return -EOVERFLOW; + end = ALIGN(end, page_size); + if (unlikely(end < page_size)) + return -EOVERFLOW; + + ndmas = (end - start) >> umem_odp->page_shift; + if (!ndmas) + return -EINVAL; + + npfns = (end - start) >> PAGE_SHIFT; + umem_odp->pfn_list = + kvcalloc(npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); + if (!umem_odp->pfn_list) + return -ENOMEM; + + umem_odp->iova.dev = dev->dma_device; + umem_odp->iova.size = end - start; + umem_odp->iova.dir = DMA_BIDIRECTIONAL; + ret = dma_alloc_iova(&umem_odp->iova); + if (ret) + goto out_pfn_list; - if (!umem_odp->is_implicit_odp) { - size_t page_size = 1UL << umem_odp->page_shift; - unsigned long start; - unsigned long end; - size_t ndmas, npfns; - - start = ALIGN_DOWN(umem_odp->umem.address, page_size); - if (check_add_overflow(umem_odp->umem.address, - (unsigned long)umem_odp->umem.length, - &end)) - return -EOVERFLOW; - end = ALIGN(end, page_size); - if (unlikely(end < page_size)) - return -EOVERFLOW; - - ndmas = (end - start) >> umem_odp->page_shift; - if (!ndmas) - return -EINVAL; - - npfns = (end - start) >> PAGE_SHIFT; - umem_odp->pfn_list = kvcalloc( - npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); - if (!umem_odp->pfn_list) - return -ENOMEM; - - - umem_odp->iova.dev = dev->dma_device; - umem_odp->iova.size = end - start; - umem_odp->iova.dir = DMA_BIDIRECTIONAL; - ret = dma_alloc_iova(&umem_odp->iova); - if (ret) - goto out_pfn_list; - - ret = mmu_interval_notifier_insert(&umem_odp->notifier, - umem_odp->umem.owning_mm, - start, end - start, ops); - if (ret) - goto out_free_iova; - } + ret = mmu_interval_notifier_insert(&umem_odp->notifier, + 
umem_odp->umem.owning_mm, start, + end - start, ops); + if (ret) + goto out_free_iova; return 0; From patchwork Tue Jul 2 09:09:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719234 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A70E15574C; Tue, 2 Jul 2024 09:10:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911450; cv=none; b=jPqYkcBMl1JZFMnhbPqGzmrHkQsIO1aQbQmMBc6f3KDv0HgyTcjTtfVaEz9J6uK8n1vbvwMdwpCa8uIJFOO/5i9ImNgnyF7AXfqV5ijNoa8DUJAoHCdXQyhCsf5SKIUehk/VfRKcRGNd8s8sqPILaqfKnvJvuKhxR0jMwsyoZ1Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911450; c=relaxed/simple; bh=UBtvWw7Gy5bkZxQWGiITiVmdZjOyVzq6Q6pe5h5nMZE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YPsh1gBnKVhiR9UmvJUJ5UBJ1163YLug22a1RY9uMr8ShgraqgdNZJcRus+CxqmMo2sZeZvxiPuduRlrT6foGQsOdDh4s2SWkhnmuk9aerkP+2ASToIuPK7hWfNKdUVzOfJANeThaq2ZNgIN+a+aazUP7xxZUnG2aaXkK9Kykh8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OiiAr9+n; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OiiAr9+n" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFE14C32781; Tue, 2 Jul 2024 09:10:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911449; bh=UBtvWw7Gy5bkZxQWGiITiVmdZjOyVzq6Q6pe5h5nMZE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OiiAr9+nYxAigqsEDGipeVXi0TP3nXWi7mK/LlWrRaIz0IcR+fZxFtgFqfr7E/wnG 7ZazMCjRFgPgPtVEEwp8YXy2SRazjjt2+Uuh41m/x79CqLIGsWzLXbd0r9lNK9Or5x l0Dh89IJNVkfH11huzLO5HsmiCGKKTkWGUjZRb6B+EJuLxL80CuIGX0+K0APVMQ33N //In69xpEVXNX92YXJyAHM1hzpqFPmSrIFIfZU9FgI5S1zYNwGXUeWmBpHGSexRleS 7FyjezEJi+u8u8fSnqRA9/cx/u52Z9MzzW9NFEmTl8EU9VPoOXKOuFBhcBVID0QXA/ iAgDsM4unOl2Q== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 13/18] vfio/mlx5: Explicitly use number of pages instead of allocated length Date: Tue, 2 Jul 2024 12:09:43 +0300 Message-ID: <8feabd70634bc8d5c4bda4afe3f5083e56044006.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky allocated_length is a multiple of page size and number of pages, so let's change the functions to accept number of pages. 
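The conversion at the call sites follows the usual convention (shown here for reference; the exact expressions are visible in the hunks below):

   npages = DIV_ROUND_UP(length, PAGE_SIZE);   /* bytes -> pages, rounded up  */
   size   = buf->npages * PAGE_SIZE;           /* pages -> bytes for the device */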
It opens us a venue to combine receive and send paths together with code readability improvement. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 32 +++++++++++----------- drivers/vfio/pci/mlx5/cmd.h | 10 +++---- drivers/vfio/pci/mlx5/main.c | 53 +++++++++++++++++++++++------------- 3 files changed, 55 insertions(+), 40 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 41a4b0cf4297..fdc3e515741f 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -318,8 +318,7 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, struct mlx5_vhca_recv_buf *recv_buf, u32 *mkey) { - size_t npages = buf ? DIV_ROUND_UP(buf->allocated_length, PAGE_SIZE) : - recv_buf->npages; + size_t npages = buf ? buf->npages : recv_buf->npages; int err = 0, inlen; __be64 *mtt; void *mkc; @@ -375,7 +374,7 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) if (mvdev->mdev_detach) return -ENOTCONN; - if (buf->dmaed || !buf->allocated_length) + if (buf->dmaed || !buf->npages) return -EINVAL; ret = dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); @@ -444,7 +443,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, if (ret) goto err; - buf->allocated_length += filled * PAGE_SIZE; + buf->npages += filled; /* clean input for another bulk allocation */ memset(page_list, 0, filled * sizeof(*page_list)); to_fill = min_t(unsigned int, to_alloc, @@ -460,8 +459,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, } struct mlx5_vhca_data_buffer * -mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, enum dma_data_direction dma_dir) { struct mlx5_vhca_data_buffer *buf; @@ -473,9 +471,8 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, buf->dma_dir = dma_dir; buf->migf = migf; - if (length) { - ret = mlx5vf_add_migration_pages(buf, - DIV_ROUND_UP_ULL(length, PAGE_SIZE)); + if (npages) { + ret = mlx5vf_add_migration_pages(buf, npages); if (ret) goto end; @@ -501,8 +498,8 @@ void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf) } struct mlx5_vhca_data_buffer * -mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir) +mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir) { struct mlx5_vhca_data_buffer *buf, *temp_buf; struct list_head free_list; @@ -517,7 +514,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) { if (buf->dma_dir == dma_dir) { list_del_init(&buf->buf_elm); - if (buf->allocated_length >= length) { + if (buf->npages >= npages) { spin_unlock_irq(&migf->list_lock); goto found; } @@ -531,7 +528,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, } } spin_unlock_irq(&migf->list_lock); - buf = mlx5vf_alloc_data_buffer(migf, length, dma_dir); + buf = mlx5vf_alloc_data_buffer(migf, npages, dma_dir); found: while ((temp_buf = list_first_entry_or_null(&free_list, @@ -712,7 +709,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, MLX5_SET(save_vhca_state_in, in, op_mod, 0); MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id); MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey); - MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length); + MLX5_SET(save_vhca_state_in, in, size, buf->npages * PAGE_SIZE); 
MLX5_SET(save_vhca_state_in, in, incremental, inc); MLX5_SET(save_vhca_state_in, in, set_track, track); @@ -734,8 +731,11 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, } if (!header_buf) { - header_buf = mlx5vf_get_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + header_buf = mlx5vf_get_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(header_buf)) { err = PTR_ERR(header_buf); goto err_free; diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index df421dc6de04..7d4a833b6900 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -56,7 +56,7 @@ struct mlx5_vhca_data_buffer { struct sg_append_table table; loff_t start_pos; u64 length; - u64 allocated_length; + u32 npages; u32 mkey; enum dma_data_direction dma_dir; u8 dmaed:1; @@ -217,12 +217,12 @@ int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf); void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf); void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf); struct mlx5_vhca_data_buffer * -mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir); +mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir); void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf); struct mlx5_vhca_data_buffer * -mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, - size_t length, enum dma_data_direction dma_dir); +mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, + enum dma_data_direction dma_dir); void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf); struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, unsigned long offset); diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 61d9b0f9146d..0925cd7d2f17 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -308,6 +308,7 @@ static struct mlx5_vhca_data_buffer * mlx5vf_mig_file_get_stop_copy_buf(struct mlx5_vf_migration_file *migf, u8 index, size_t required_length) { + u32 npages = DIV_ROUND_UP(required_length, PAGE_SIZE); struct mlx5_vhca_data_buffer *buf = migf->buf[index]; u8 chunk_num; @@ -315,12 +316,11 @@ mlx5vf_mig_file_get_stop_copy_buf(struct mlx5_vf_migration_file *migf, chunk_num = buf->stop_copy_chunk_num; buf->migf->buf[index] = NULL; /* Checking whether the pre-allocated buffer can fit */ - if (buf->allocated_length >= required_length) + if (buf->npages >= npages) return buf; mlx5vf_put_data_buffer(buf); - buf = mlx5vf_get_data_buffer(buf->migf, required_length, - DMA_FROM_DEVICE); + buf = mlx5vf_get_data_buffer(buf->migf, npages, DMA_FROM_DEVICE); if (IS_ERR(buf)) return buf; @@ -373,7 +373,8 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf, u8 *to_buff; int ret; - header_buf = mlx5vf_get_data_buffer(migf, size, DMA_NONE); + header_buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(size, PAGE_SIZE), + DMA_NONE); if (IS_ERR(header_buf)) return PTR_ERR(header_buf); @@ -388,7 +389,7 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf, to_buff = kmap_local_page(page); memcpy(to_buff, &header, sizeof(header)); header_buf->length = sizeof(header); - data.stop_copy_size = cpu_to_le64(migf->buf[0]->allocated_length); + data.stop_copy_size = cpu_to_le64(migf->buf[0]->npages * PAGE_SIZE); memcpy(to_buff + sizeof(header), &data, 
sizeof(data)); header_buf->length += sizeof(data); kunmap_local(to_buff); @@ -437,15 +438,20 @@ static int mlx5vf_prep_stop_copy(struct mlx5vf_pci_core_device *mvdev, num_chunks = mvdev->chunk_mode ? MAX_NUM_CHUNKS : 1; for (i = 0; i < num_chunks; i++) { - buf = mlx5vf_get_data_buffer(migf, inc_state_size, DMA_FROM_DEVICE); + buf = mlx5vf_get_data_buffer( + migf, DIV_ROUND_UP(inc_state_size, PAGE_SIZE), + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret = PTR_ERR(buf); goto err; } migf->buf[i] = buf; - buf = mlx5vf_get_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + buf = mlx5vf_get_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(buf)) { ret = PTR_ERR(buf); goto err; @@ -553,7 +559,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd, * We finished transferring the current state and the device has a * dirty state, save a new state to be ready for. */ - buf = mlx5vf_get_data_buffer(migf, inc_length, DMA_FROM_DEVICE); + buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE), + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret = PTR_ERR(buf); mlx5vf_mark_err(migf); @@ -674,8 +681,8 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track) if (track) { /* leave the allocated buffer ready for the stop-copy phase */ - buf = mlx5vf_alloc_data_buffer(migf, - migf->buf[0]->allocated_length, DMA_FROM_DEVICE); + buf = mlx5vf_alloc_data_buffer(migf, migf->buf[0]->npages, + DMA_FROM_DEVICE); if (IS_ERR(buf)) { ret = PTR_ERR(buf); goto out_pd; @@ -918,11 +925,14 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, goto out_unlock; break; case MLX5_VF_LOAD_STATE_PREP_HEADER_DATA: - if (vhca_buf_header->allocated_length < migf->record_size) { + { + u32 npages = DIV_ROUND_UP(migf->record_size, PAGE_SIZE); + + if (vhca_buf_header->npages < npages) { mlx5vf_free_data_buffer(vhca_buf_header); - migf->buf_header[0] = mlx5vf_alloc_data_buffer(migf, - migf->record_size, DMA_NONE); + migf->buf_header[0] = mlx5vf_alloc_data_buffer( + migf, npages, DMA_NONE); if (IS_ERR(migf->buf_header[0])) { ret = PTR_ERR(migf->buf_header[0]); migf->buf_header[0] = NULL; @@ -935,6 +945,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, vhca_buf_header->start_pos = migf->max_pos; migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER_DATA; break; + } case MLX5_VF_LOAD_STATE_READ_HEADER_DATA: ret = mlx5vf_resume_read_header_data(migf, vhca_buf_header, &buf, &len, pos, &done); @@ -945,12 +956,13 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, { u64 size = max(migf->record_size, migf->stop_copy_prep_size); + u32 npages = DIV_ROUND_UP(size, PAGE_SIZE); - if (vhca_buf->allocated_length < size) { + if (vhca_buf->npages < npages) { mlx5vf_free_data_buffer(vhca_buf); - migf->buf[0] = mlx5vf_alloc_data_buffer(migf, - size, DMA_TO_DEVICE); + migf->buf[0] = mlx5vf_alloc_data_buffer( + migf, npages, DMA_TO_DEVICE); if (IS_ERR(migf->buf[0])) { ret = PTR_ERR(migf->buf[0]); migf->buf[0] = NULL; @@ -1033,8 +1045,11 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) } migf->buf[0] = buf; - buf = mlx5vf_alloc_data_buffer(migf, - sizeof(struct mlx5_vf_migration_header), DMA_NONE); + buf = mlx5vf_alloc_data_buffer( + migf, + DIV_ROUND_UP(sizeof(struct mlx5_vf_migration_header), + PAGE_SIZE), + DMA_NONE); if (IS_ERR(buf)) { ret = PTR_ERR(buf); goto out_buf; From patchwork Tue Jul 2 09:09:44 2024 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719235 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4A7217279B; Tue, 2 Jul 2024 09:10:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911454; cv=none; b=OaHPHHkFGJIzjZ/xWejaBNbA9MlPVfLcndyGTjVV7GYkcmEWAwNyMU5u2vnBMdEPKz0Nj8wrBHoSxida02GTd6GvkIX2nBxhRqlfCb5U0w1P7fXU+fiqaKEYzBtox+q7kFCvXcBasdKxjQpceFCaaFdOWVupYj9h9Opo6mBF/QQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911454; c=relaxed/simple; bh=9UOnDPAidcjKmf/OJX9UmmceN7wfX+lWZhw4XRQHsBI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fYbCfeP5dBVdH8288bsVMmZ/SLb1Jg0nULFFuS2st2VJksuBHe829P3LXvv8m1PopRX0eVawltHlypXyS7Uo9xxMjUKhpFHWS6u6wU6q25BO144IK9004K6DjEYjH0fB1y4dstzXpcZxJLnaUahSzng1hJhHL8/w+RgCnqviBUE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=IH4G738G; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="IH4G738G" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 07E58C116B1; Tue, 2 Jul 2024 09:10:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911453; bh=9UOnDPAidcjKmf/OJX9UmmceN7wfX+lWZhw4XRQHsBI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=IH4G738GSwErqAnN9PIIpPbA4v1TGUQrymBnSy8qOyXwgifbXPO3u0KTdl/89ps+h p9f/IyRPlypt/XGrwR9R/A17oUdbh2wo7NHA51sch1NhyLLlQ3xFLkNVK+2gYK3L/p RiNnu7NpFOyPRV50Li1NyNWm2X77rSAW5sS6b//Ylakd8AaLb8hx4XC3S6rNJe2Z0h E4oVem7p4gi38QGhSck1S5LwRFj2SF8x9KKB6Dg0XuKwF2o2WEiiOWOFq0+Ztr+Um9 BaNIQR5Ix4Ytpr7fe4YRJVgcFLvt4s+kf6lOEEyBFj9lMldCW7PCA3KoX846v1mE+2 Oqtm9QZbIvssg== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 14/18] vfio/mlx5: Rewrite create mkey flow to allow better code reuse Date: Tue, 2 Jul 2024 12:09:44 +0300 Message-ID: <1818fe62955e127f49469595706b1eb40b02d352.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Change the creation of mkey to be performed in multiple steps: data allocation, DMA setup and actual call to HW to create that mkey. In this new flow, the whole input to MKEY command is saved to eliminate the need to keep array of pointers for DMA addresses for receive list and in the future patches for send list too. 
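For the receive-buffer path this means roughly the following sequence (sketch only, error unwinding omitted; the real wiring is in the diff below):

   mkey_in = alloc_mkey_in(npages, pdn);                    /* 1. data allocation        */
   err = register_dma_pages(mdev, npages, page_list,
                            mkey_in);                       /* 2. DMA setup, MTT entries
                                                                  written straight into
                                                                  the saved mkey input    */
   err = create_mkey(mdev, npages, NULL, mkey_in, &mkey);   /* 3. HW call to create mkey */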
In addition to memory size reduce and elimination of unnecessary data movements to set MKEY input, the code is prepared for future reuse. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 154 ++++++++++++++++++++---------------- drivers/vfio/pci/mlx5/cmd.h | 4 +- 2 files changed, 88 insertions(+), 70 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index fdc3e515741f..adf57104555a 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -313,39 +313,21 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id, return ret; } -static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, - struct mlx5_vhca_data_buffer *buf, - struct mlx5_vhca_recv_buf *recv_buf, - u32 *mkey) +static u32 *alloc_mkey_in(u32 npages, u32 pdn) { - size_t npages = buf ? buf->npages : recv_buf->npages; - int err = 0, inlen; - __be64 *mtt; + int inlen; void *mkc; u32 *in; inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + - sizeof(*mtt) * round_up(npages, 2); + sizeof(__be64) * round_up(npages, 2); - in = kvzalloc(inlen, GFP_KERNEL); + in = kvzalloc(inlen, GFP_KERNEL_ACCOUNT); if (!in) - return -ENOMEM; + return NULL; MLX5_SET(create_mkey_in, in, translations_octword_actual_size, DIV_ROUND_UP(npages, 2)); - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt); - - if (buf) { - struct sg_dma_page_iter dma_iter; - - for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) - *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); - } else { - int i; - - for (i = 0; i < npages; i++) - *mtt++ = cpu_to_be64(recv_buf->dma_addrs[i]); - } mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry); MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT); @@ -359,9 +341,29 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn, MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2)); MLX5_SET64(mkc, mkc, len, npages * PAGE_SIZE); - err = mlx5_core_create_mkey(mdev, mkey, in, inlen); - kvfree(in); - return err; + + return in; +} + +static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, + struct mlx5_vhca_data_buffer *buf, u32 *mkey_in, + u32 *mkey) +{ + __be64 *mtt; + int inlen; + + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + if (buf) { + struct sg_dma_page_iter dma_iter; + + for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) + *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + } + + inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(__be64) * round_up(npages, 2); + + return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); } static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) @@ -374,20 +376,27 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) if (mvdev->mdev_detach) return -ENOTCONN; - if (buf->dmaed || !buf->npages) + if (buf->mkey_in || !buf->npages) return -EINVAL; ret = dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); if (ret) return ret; - ret = _create_mkey(mdev, buf->migf->pdn, buf, NULL, &buf->mkey); - if (ret) + buf->mkey_in = alloc_mkey_in(buf->npages, buf->migf->pdn); + if (!buf->mkey_in) { + ret = -ENOMEM; goto err; + } - buf->dmaed = true; + ret = create_mkey(mdev, buf->npages, buf, buf->mkey_in, &buf->mkey); + if (ret) + goto err_create_mkey; return 0; + +err_create_mkey: + kvfree(buf->mkey_in); err: dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); return ret; @@ -401,8 +410,9 @@ void mlx5vf_free_data_buffer(struct 
mlx5_vhca_data_buffer *buf) lockdep_assert_held(&migf->mvdev->state_mutex); WARN_ON(migf->mvdev->mdev_detach); - if (buf->dmaed) { + if (buf->mkey_in) { mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + kvfree(buf->mkey_in); dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, buf->dma_dir, 0); } @@ -779,7 +789,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev, if (mvdev->mdev_detach) return -ENOTCONN; - if (!buf->dmaed) { + if (!buf->mkey_in) { err = mlx5vf_dma_data_buffer(buf); if (err) return err; @@ -1380,56 +1390,54 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, kvfree(recv_buf->page_list); return -ENOMEM; } +static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + u32 *mkey_in) +{ + dma_addr_t addr; + __be64 *mtt; + int i; + + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); + for (i = npages - 1; i >= 0; i--) { + addr = be64_to_cpu(mtt[i]); + dma_unmap_single(mdev->device, addr, PAGE_SIZE, + DMA_FROM_DEVICE); + } +} -static int register_dma_recv_pages(struct mlx5_core_dev *mdev, - struct mlx5_vhca_recv_buf *recv_buf) +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + struct page **page_list, u32 *mkey_in) { - int i, j; + dma_addr_t addr; + __be64 *mtt; + int i; - recv_buf->dma_addrs = kvcalloc(recv_buf->npages, - sizeof(*recv_buf->dma_addrs), - GFP_KERNEL_ACCOUNT); - if (!recv_buf->dma_addrs) - return -ENOMEM; + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - for (i = 0; i < recv_buf->npages; i++) { - recv_buf->dma_addrs[i] = dma_map_page(mdev->device, - recv_buf->page_list[i], - 0, PAGE_SIZE, - DMA_FROM_DEVICE); - if (dma_mapping_error(mdev->device, recv_buf->dma_addrs[i])) + for (i = 0; i < npages; i++) { + addr = dma_map_page(mdev->device, page_list[i], 0, PAGE_SIZE, + DMA_FROM_DEVICE); + if (dma_mapping_error(mdev->device, addr)) goto error; + + *mtt++ = cpu_to_be64(addr); } + return 0; error: - for (j = 0; j < i; j++) - dma_unmap_single(mdev->device, recv_buf->dma_addrs[j], - PAGE_SIZE, DMA_FROM_DEVICE); - - kvfree(recv_buf->dma_addrs); + unregister_dma_pages(mdev, i, mkey_in); return -ENOMEM; } -static void unregister_dma_recv_pages(struct mlx5_core_dev *mdev, - struct mlx5_vhca_recv_buf *recv_buf) -{ - int i; - - for (i = 0; i < recv_buf->npages; i++) - dma_unmap_single(mdev->device, recv_buf->dma_addrs[i], - PAGE_SIZE, DMA_FROM_DEVICE); - - kvfree(recv_buf->dma_addrs); -} - static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_qp *qp) { struct mlx5_vhca_recv_buf *recv_buf = &qp->recv_buf; mlx5_core_destroy_mkey(mdev, recv_buf->mkey); - unregister_dma_recv_pages(mdev, recv_buf); + unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in); + kvfree(recv_buf->mkey_in); free_recv_pages(&qp->recv_buf); } @@ -1445,18 +1453,28 @@ static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, if (err < 0) return err; - err = register_dma_recv_pages(mdev, recv_buf); - if (err) + recv_buf->mkey_in = alloc_mkey_in(npages, pdn); + if (!recv_buf->mkey_in) { + err = -ENOMEM; goto end; + } + + err = register_dma_pages(mdev, npages, recv_buf->page_list, + recv_buf->mkey_in); + if (err) + goto err_register_dma; - err = _create_mkey(mdev, pdn, NULL, recv_buf, &recv_buf->mkey); + err = create_mkey(mdev, npages, NULL, recv_buf->mkey_in, + &recv_buf->mkey); if (err) goto err_create_mkey; return 0; err_create_mkey: - unregister_dma_recv_pages(mdev, recv_buf); + unregister_dma_pages(mdev, npages, 
recv_buf->mkey_in); +err_register_dma: + kvfree(recv_buf->mkey_in); end: free_recv_pages(recv_buf); return err; diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 7d4a833b6900..25dd6ff54591 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -58,8 +58,8 @@ struct mlx5_vhca_data_buffer { u64 length; u32 npages; u32 mkey; + u32 *mkey_in; enum dma_data_direction dma_dir; - u8 dmaed:1; u8 stop_copy_chunk_num; struct list_head buf_elm; struct mlx5_vf_migration_file *migf; @@ -133,8 +133,8 @@ struct mlx5_vhca_cq { struct mlx5_vhca_recv_buf { u32 npages; struct page **page_list; - dma_addr_t *dma_addrs; u32 next_rq_offset; + u32 *mkey_in; u32 mkey; }; From patchwork Tue Jul 2 09:09:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719240 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B51117921D; Tue, 2 Jul 2024 09:11:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911474; cv=none; b=GpsrGfUc2uRNFAIpuLTPye5+SD4J5FVI3VbR/KDnk2CR8QKLlAj1856nyqPPiVkWuDsO85w83BK8Cr4BXFy3UOm0yjobtqKInN8K/FyriouhA5LYHvyEl2NKLQk1Ydsu5qeE+mt+7NWyLYORc8A5A/pMAHJSbKMKeF2I6IUgRZU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911474; c=relaxed/simple; bh=PIW7uecyqbo8l+/QQ3xYJoQlnbA8PPRXn3HjOM7nqAY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MkcUIjIURRssCtolkeW+KDZIeygp/ExJD0LbhtbIb2IdojJPR1TP04La5+a1WG/kKZ7/AmPyoMbLVoVLCxl6i90YO8JPtODycepuMrJb+zJWp/BOYHbB8psQ02WpyZf/HDpbGmtMqQkIbe/eqtZR3v1IGwRj2pv2vpFDyoJgsS8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NhTi/KqE; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NhTi/KqE" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4CABDC116B1; Tue, 2 Jul 2024 09:11:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911474; bh=PIW7uecyqbo8l+/QQ3xYJoQlnbA8PPRXn3HjOM7nqAY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NhTi/KqELO0ntEOXqd97jqF285urhVtp0ovXZbMP4lcmZaDwcgPg/5W9aGAWMevzM GDilwAbddqGsrw6Zr4IKwgEv/nWRXIl2ieuB+HDrivIcEW+HlwLJRNlmEiKZLN2rHp 0muXrcmfjl0zBQ2vUDmhRG8u6kxF++q1H1AnwZx2MUDaHup1Eyt+Cx95IXnEJ7sI4X am+hv9pcwPTPSt+yJK71cxH2vUFES69TrTtxJVdEaMHZrLleT2dLw+rgqcyKXSgXqu Hh1xdmh+LLCL2umU4Gf1ef/Cnv3MoUC9ltluYS0I8qeptgpVu09+wfyU+0sTdQu4iU nF6+LSv5nDujA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 15/18] 
vfio/mlx5: Explicitly store page list Date: Tue, 2 Jul 2024 12:09:45 +0300 Message-ID: <2691374c551bc276ec135ea58207a440e34f7be4.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky As a preparation to removal scatter-gather table and unifying receive and send list, explicitly store page list. Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 33 ++++++++++++++++----------------- drivers/vfio/pci/mlx5/cmd.h | 1 + 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index adf57104555a..cb23f03d58f4 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -421,6 +421,7 @@ void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf) for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) __free_page(sg_page_iter_page(&sg_iter)); sg_free_append_table(&buf->table); + kvfree(buf->page_list); kfree(buf); } @@ -428,44 +429,42 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, unsigned int npages) { unsigned int to_alloc = npages; + size_t old_size, new_size; struct page **page_list; unsigned long filled; unsigned int to_fill; int ret; - to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list)); - page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT); + to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*buf->page_list)); + old_size = buf->npages * sizeof(*buf->page_list); + new_size = old_size + to_alloc * sizeof(*buf->page_list); + page_list = kvrealloc(buf->page_list, old_size, new_size, + GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!page_list) return -ENOMEM; + buf->page_list = page_list; + do { filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill, - page_list); - if (!filled) { - ret = -ENOMEM; - goto err; - } + buf->page_list + buf->npages); + if (!filled) + return -ENOMEM; + to_alloc -= filled; ret = sg_alloc_append_table_from_pages( - &buf->table, page_list, filled, 0, + &buf->table, buf->page_list + buf->npages, filled, 0, filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, GFP_KERNEL_ACCOUNT); if (ret) - goto err; + return ret; buf->npages += filled; - /* clean input for another bulk allocation */ - memset(page_list, 0, filled * sizeof(*page_list)); to_fill = min_t(unsigned int, to_alloc, - PAGE_SIZE / sizeof(*page_list)); + PAGE_SIZE / sizeof(*buf->page_list)); } while (to_alloc > 0); - kvfree(page_list); return 0; - -err: - kvfree(page_list); - return ret; } struct mlx5_vhca_data_buffer * diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 25dd6ff54591..5b764199db53 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -53,6 +53,7 @@ struct mlx5_vf_migration_header { }; struct mlx5_vhca_data_buffer { + struct page **page_list; struct sg_append_table table; loff_t start_pos; u64 length; From patchwork Tue Jul 2 09:09:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719237 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61B13156228; Tue, 2 Jul 2024 09:11:02 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911462; cv=none; b=sxx5fF/QPpgJCWldFH7kipyIcF91zFHM4QATt1lQtIV14GupENIeBeBXrb/KAhhxAXUI+5VRYv7YrxhXac/o5DmnhEWpvR7KvbP/nm9SsEz8+WZIO5yOt8FqbBIDsrud5azJ8CJraBSvqSGoXxLaRbOmtluOvTFaU3HK1NMfR0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911462; c=relaxed/simple; bh=fa03j06JQsZdSB5w8kukQGr1qEK3gqfe4kVoo/wFmOU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KitS9G1e0Auxfhre0iio6cQCtsRRGnZyXOq8TuV2WWfmTek27wK/Ha1Ok5eES3dt+c6uyVbP5p4rgxtaKXRiPUpbDUL3SrVgGlGfA13+IFVJX/Z8Ii1NPkKHAsv+pym9HKDuRSuYLNPX8lb317f5qK4wFrt6HTcnEFzSMHc02JQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UxFYX5T8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UxFYX5T8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3FBF7C4AF0C; Tue, 2 Jul 2024 09:11:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911462; bh=fa03j06JQsZdSB5w8kukQGr1qEK3gqfe4kVoo/wFmOU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UxFYX5T8UxyxQNqh0dNbCpGazWGtmPierV/QvwE9BGOpvSWKPtM6wsts3yNmS5dqL tH5CgKntyE6eqP9YqyyoyG6O7j1aw3evZjy6f1qzlgNBPBlk1iiY4Ykr5qqF3To1RQ KUEah33CnUzMnFc49cw5jU5avrg/pQowUAbFlRIsygLY92LOPvWfpYjXI6LM3NfuF9 NQD+XRyxBALu/EiXi7jhLvldrcxv78UCmLUaPa2f1JvSQIDvVX3+xit5nWY6qlb5Ei XqArm6GygNZMsWo8UtofSP2LhTuU2Nvje1MspK6zjEvpFFN/4kMz/wRXyU5akAE+yK bIlJRc6et2iPQ== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Leon Romanovsky , Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 16/18] vfio/mlx5: Convert vfio to use DMA link API Date: Tue, 2 Jul 2024 12:09:46 +0300 Message-ID: <34e6da6903d31e26dbc08138eb37d1ccae3b2d3d.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Leon Romanovsky Remove intermediate scatter-gather table as it is not needed if DMA link API is used. This conversion reduces drastically the memory used to manage that table. 
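As a non-authoritative sketch, the per-buffer mapping flow after this conversion looks roughly like the following. It is built only from the dma_iova/dma_link calls visible in the diff below; these helpers are the ones proposed earlier in this RFC series, not an existing upstream API, and error unwinding is omitted:

	struct dma_iova_attrs iova = {
		.dev  = mdev->device,
		.dir  = DMA_FROM_DEVICE,
		.size = npages * PAGE_SIZE,
	};
	struct dma_memory_type type;
	struct dma_iova_state state = { .iova = &iova, .type = &type };
	int i;

	err = dma_alloc_iova(&iova);			/* reserve one contiguous IOVA range */
	dma_get_memory_type(page_list[0], &type);	/* all pages are assumed to share one memory type */

	if (dma_can_use_iova(&state, PAGE_SIZE)) {
		dma_start_range(&state);
		for (i = 0; i < npages; i++)		/* link every page into the single IOVA range */
			dma_link_range(&state, page_to_phys(page_list[i]), PAGE_SIZE);
		dma_end_range(&state);			/* one MTT entry then covers the whole buffer */
	} else {
		for (i = 0; i < npages; i++)		/* fall back to per-page mappings */
			mtt[i] = cpu_to_be64(dma_map_page_attrs(iova.dev, page_list[i], 0,
								PAGE_SIZE, iova.dir, iova.attrs));
	}
	/* teardown mirrors this: dma_unlink_range() or dma_unmap_page_attrs(), then dma_free_iova() */
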
Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/mlx5/cmd.c | 241 ++++++++++++++++++++--------------- drivers/vfio/pci/mlx5/cmd.h | 10 +- drivers/vfio/pci/mlx5/main.c | 33 +---- 3 files changed, 143 insertions(+), 141 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index cb23f03d58f4..4520eaf78767 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -345,25 +345,106 @@ static u32 *alloc_mkey_in(u32 npages, u32 pdn) return in; } -static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, - struct mlx5_vhca_data_buffer *buf, u32 *mkey_in, +static int create_mkey(struct mlx5_core_dev *mdev, u32 npages, u32 *mkey_in, u32 *mkey) { + int inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(__be64) * round_up(npages, 2); + + return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); +} + +static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + u32 *mkey_in, struct dma_iova_attrs *iova, + struct dma_memory_type *type) +{ + struct dma_iova_state state = {}; + dma_addr_t addr; __be64 *mtt; - int inlen; + int i; + + WARN_ON_ONCE(iova->dir == DMA_NONE); + + state.iova = iova; + state.type = type; + state.range_size = PAGE_SIZE * npages; + + if (dma_can_use_iova(&state, PAGE_SIZE)) { + dma_unlink_range(&state); + } else { + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, + klm_pas_mtt); + for (i = npages - 1; i >= 0; i--) { + addr = be64_to_cpu(mtt[i]); + dma_unmap_page_attrs(iova->dev, addr, PAGE_SIZE, + iova->dir, iova->attrs); + } + } + dma_free_iova(iova); +} + +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, + struct page **page_list, u32 *mkey_in, + struct dma_iova_attrs *iova, + struct dma_memory_type *type) +{ + struct dma_iova_state state = {}; + dma_addr_t addr; + bool use_iova; + __be64 *mtt; + int i, err; + + WARN_ON_ONCE(iova->dir == DMA_NONE); + + iova->dev = mdev->device; + iova->size = npages * PAGE_SIZE; + err = dma_alloc_iova(iova); + if (err) + return err; + + /* + * All VFIO pages are of the same type, and it is enough + * to check one page only + */ + dma_get_memory_type(page_list[0], type); + state.iova = iova; + state.type = type; + + use_iova = dma_can_use_iova(&state, PAGE_SIZE); mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - if (buf) { - struct sg_dma_page_iter dma_iter; + if (use_iova) + err = dma_start_range(&state); + if (err) { + dma_free_iova(iova); + return err; + } + for (i = 0; i < npages; i++) { + if (use_iova) { + err = dma_link_range(&state, page_to_phys(page_list[i]), + PAGE_SIZE); + addr = iova->addr; + } else { + addr = dma_map_page_attrs(iova->dev, page_list[i], 0, + PAGE_SIZE, iova->dir, + iova->attrs); + err = dma_mapping_error(mdev->device, addr); + } + if (err) + goto error; - for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0) - *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + /* In IOVA case, we can use one MTT entry for whole buffer */ + if (i == 0 || !use_iova) + *mtt++ = cpu_to_be64(addr); } + if (use_iova) + dma_end_range(&state); - inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + - sizeof(__be64) * round_up(npages, 2); + return 0; - return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen); +error: + unregister_dma_pages(mdev, i, mkey_in, iova, type); + return err; } static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) @@ -379,49 +460,56 @@ static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf) if (buf->mkey_in || !buf->npages) return -EINVAL; - ret = 
dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); - if (ret) - return ret; - buf->mkey_in = alloc_mkey_in(buf->npages, buf->migf->pdn); - if (!buf->mkey_in) { - ret = -ENOMEM; - goto err; - } + if (!buf->mkey_in) + return -ENOMEM; - ret = create_mkey(mdev, buf->npages, buf, buf->mkey_in, &buf->mkey); + ret = register_dma_pages(mdev, buf->npages, buf->page_list, + buf->mkey_in, &buf->iova, &buf->type); + if (ret) + goto err_register_dma; + + ret = create_mkey(mdev, buf->npages, buf->mkey_in, &buf->mkey); if (ret) goto err_create_mkey; return 0; err_create_mkey: + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, &buf->iova, + &buf->type); +err_register_dma: kvfree(buf->mkey_in); -err: - dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0); return ret; } +static void free_page_list(u32 npages, struct page **page_list) +{ + int i; + + /* Undo alloc_pages_bulk_array() */ + for (i = npages - 1; i >= 0; i--) + __free_page(page_list[i]); + + kvfree(page_list); +} + void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf) { - struct mlx5_vf_migration_file *migf = buf->migf; - struct sg_page_iter sg_iter; + struct mlx5vf_pci_core_device *mvdev = buf->migf->mvdev; + struct mlx5_core_dev *mdev = mvdev->mdev; - lockdep_assert_held(&migf->mvdev->state_mutex); - WARN_ON(migf->mvdev->mdev_detach); + lockdep_assert_held(&mvdev->state_mutex); + WARN_ON(mvdev->mdev_detach); if (buf->mkey_in) { - mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey); + mlx5_core_destroy_mkey(mdev, buf->mkey); + unregister_dma_pages(mdev, buf->npages, buf->mkey_in, + &buf->iova, &buf->type); kvfree(buf->mkey_in); - dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt, - buf->dma_dir, 0); } - /* Undo alloc_pages_bulk_array() */ - for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0) - __free_page(sg_page_iter_page(&sg_iter)); - sg_free_append_table(&buf->table); - kvfree(buf->page_list); + free_page_list(buf->npages, buf->page_list); kfree(buf); } @@ -432,10 +520,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, size_t old_size, new_size; struct page **page_list; unsigned long filled; - unsigned int to_fill; - int ret; - to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*buf->page_list)); old_size = buf->npages * sizeof(*buf->page_list); new_size = old_size + to_alloc * sizeof(*buf->page_list); page_list = kvrealloc(buf->page_list, old_size, new_size, @@ -446,22 +531,13 @@ static int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, buf->page_list = page_list; do { - filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill, - buf->page_list + buf->npages); + filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_alloc, + buf->page_list + buf->npages); if (!filled) return -ENOMEM; to_alloc -= filled; - ret = sg_alloc_append_table_from_pages( - &buf->table, buf->page_list + buf->npages, filled, 0, - filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, - GFP_KERNEL_ACCOUNT); - - if (ret) - return ret; buf->npages += filled; - to_fill = min_t(unsigned int, to_alloc, - PAGE_SIZE / sizeof(*buf->page_list)); } while (to_alloc > 0); return 0; @@ -478,7 +554,7 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, if (!buf) return ERR_PTR(-ENOMEM); - buf->dma_dir = dma_dir; + buf->iova.dir = dma_dir; buf->migf = migf; if (npages) { ret = mlx5vf_add_migration_pages(buf, npages); @@ -521,7 +597,7 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf, u32 npages, spin_lock_irq(&migf->list_lock); 
list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) { - if (buf->dma_dir == dma_dir) { + if (buf->iova.dir == dma_dir) { list_del_init(&buf->buf_elm); if (buf->npages >= npages) { spin_unlock_irq(&migf->list_lock); @@ -1343,17 +1419,6 @@ static void mlx5vf_destroy_qp(struct mlx5_core_dev *mdev, kfree(qp); } -static void free_recv_pages(struct mlx5_vhca_recv_buf *recv_buf) -{ - int i; - - /* Undo alloc_pages_bulk_array() */ - for (i = 0; i < recv_buf->npages; i++) - __free_page(recv_buf->page_list[i]); - - kvfree(recv_buf->page_list); -} - static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, unsigned int npages) { @@ -1389,45 +1454,6 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf, kvfree(recv_buf->page_list); return -ENOMEM; } -static void unregister_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - for (i = npages - 1; i >= 0; i--) { - addr = be64_to_cpu(mtt[i]); - dma_unmap_single(mdev->device, addr, PAGE_SIZE, - DMA_FROM_DEVICE); - } -} - -static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages, - struct page **page_list, u32 *mkey_in) -{ - dma_addr_t addr; - __be64 *mtt; - int i; - - mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt); - - for (i = 0; i < npages; i++) { - addr = dma_map_page(mdev->device, page_list[i], 0, PAGE_SIZE, - DMA_FROM_DEVICE); - if (dma_mapping_error(mdev->device, addr)) - goto error; - - *mtt++ = cpu_to_be64(addr); - } - - return 0; - -error: - unregister_dma_pages(mdev, i, mkey_in); - return -ENOMEM; -} static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_qp *qp) @@ -1435,9 +1461,10 @@ static void mlx5vf_free_qp_recv_resources(struct mlx5_core_dev *mdev, struct mlx5_vhca_recv_buf *recv_buf = &qp->recv_buf; mlx5_core_destroy_mkey(mdev, recv_buf->mkey); - unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, recv_buf->npages, recv_buf->mkey_in, + &recv_buf->iova, &recv_buf->type); kvfree(recv_buf->mkey_in); - free_recv_pages(&qp->recv_buf); + free_page_list(recv_buf->npages, recv_buf->page_list); } static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, @@ -1458,24 +1485,26 @@ static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev, goto end; } + recv_buf->iova.dir = DMA_FROM_DEVICE; err = register_dma_pages(mdev, npages, recv_buf->page_list, - recv_buf->mkey_in); + recv_buf->mkey_in, &recv_buf->iova, + &recv_buf->type); if (err) goto err_register_dma; - err = create_mkey(mdev, npages, NULL, recv_buf->mkey_in, - &recv_buf->mkey); + err = create_mkey(mdev, npages, recv_buf->mkey_in, &recv_buf->mkey); if (err) goto err_create_mkey; return 0; err_create_mkey: - unregister_dma_pages(mdev, npages, recv_buf->mkey_in); + unregister_dma_pages(mdev, npages, recv_buf->mkey_in, &recv_buf->iova, + &recv_buf->type); err_register_dma: kvfree(recv_buf->mkey_in); end: - free_recv_pages(recv_buf); + free_page_list(npages, recv_buf->page_list); return err; } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 5b764199db53..1b2552c238d8 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -53,21 +53,17 @@ struct mlx5_vf_migration_header { }; struct mlx5_vhca_data_buffer { + struct dma_iova_attrs iova; struct page **page_list; - struct sg_append_table table; + struct dma_memory_type type; loff_t start_pos; u64 length; u32 
npages; u32 mkey; u32 *mkey_in; - enum dma_data_direction dma_dir; u8 stop_copy_chunk_num; struct list_head buf_elm; struct mlx5_vf_migration_file *migf; - /* Optimize mlx5vf_get_migration_page() for sequential access */ - struct scatterlist *last_offset_sg; - unsigned int sg_last_entry; - unsigned long last_offset; }; struct mlx5vf_async_data { @@ -132,8 +128,10 @@ struct mlx5_vhca_cq { }; struct mlx5_vhca_recv_buf { + struct dma_iova_attrs iova; u32 npages; struct page **page_list; + struct dma_memory_type type; u32 next_rq_offset; u32 *mkey_in; u32 mkey; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 0925cd7d2f17..ddadf8ccae87 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -34,35 +34,10 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev) core_device); } -struct page * -mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, - unsigned long offset) +struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf, + unsigned long offset) { - unsigned long cur_offset = 0; - struct scatterlist *sg; - unsigned int i; - - /* All accesses are sequential */ - if (offset < buf->last_offset || !buf->last_offset_sg) { - buf->last_offset = 0; - buf->last_offset_sg = buf->table.sgt.sgl; - buf->sg_last_entry = 0; - } - - cur_offset = buf->last_offset; - - for_each_sg(buf->last_offset_sg, sg, - buf->table.sgt.orig_nents - buf->sg_last_entry, i) { - if (offset < sg->length + cur_offset) { - buf->last_offset_sg = sg; - buf->sg_last_entry += i; - buf->last_offset = cur_offset; - return nth_page(sg_page(sg), - (offset - cur_offset) / PAGE_SIZE); - } - cur_offset += sg->length; - } - return NULL; + return buf->page_list[offset / PAGE_SIZE]; } static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) @@ -121,7 +96,7 @@ static void mlx5vf_buf_read_done(struct mlx5_vhca_data_buffer *vhca_buf) struct mlx5_vf_migration_file *migf = vhca_buf->migf; if (vhca_buf->stop_copy_chunk_num) { - bool is_header = vhca_buf->dma_dir == DMA_NONE; + bool is_header = vhca_buf->iova.dir == DMA_NONE; u8 chunk_num = vhca_buf->stop_copy_chunk_num; size_t next_required_umem_size = 0; From patchwork Tue Jul 2 09:09:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719238 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B5C8176251; Tue, 2 Jul 2024 09:11:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911466; cv=none; b=ZoBaY6UI1AcyWDd0zAbOFYN+pYHu/15iU2a60ePFqiDo+G7orM3qDr4D9w62AjrqCnSEfn9xH2bYiRIRsLJi8rbHFcVBR4vzrnFTVSzpPM+htFefaY1akLmdYLYdPMuQmOJBQLKaC5tKjg3CoQHh3xw6t1P1vq4ID1oxsz9qr7g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911466; c=relaxed/simple; bh=awqNz5OGdWfjQsJN8aSzpTwPuTxHlPEDhpz1xfEBgC0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iIF9GWMGywGFgrSfnW8pYPNq1p2rw5mvoMYa09ZPply7n748W1EJM+2OpbT2GzwW7G2tTeLvNoXTyffOcVRgQvBtYsRm6ZwtPoBylL//qBfHUpiP+KHOgwklax8mGMY2APqOvOnAQO5Zm1h+NHl0luxnnIuc7nR+ILzg9kjT7Tc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b=tmMV07HW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tmMV07HW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 38628C116B1; Tue, 2 Jul 2024 09:11:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911466; bh=awqNz5OGdWfjQsJN8aSzpTwPuTxHlPEDhpz1xfEBgC0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=tmMV07HWOakUhLEx0CG5Sw8D1u/vJiFeT/1LOA4/MYdwbWSy8GvydRXDag+XzNQuF 8Mrby6JUZOtPEv8fs7zqbYmgnJ3vQ0E8UKqg2AiGbf2B4rjUPAo8OMiBBUtmXlUtED I7lOY1LwtkVhj5RlocDgysGvTm0aU3G+jzWGhM9DlHlbRQ+NQwCweNkMi/QFwfNt5f Zt/8qIdBUHskSlNSHY+lwUUXkwkAM2VZSfuGEizQOnbvlLYGPXRFfuO7ZnqRmHvzP3 sHQW3H4gZFNAN04PzGFcNOHwkG6/bvqhEaQ9QjvKDRliH9gVdc0oevJLqLUVJ+y2YD 4NmY108G1I3Ow== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 17/18] block: export helper to get segment max size Date: Tue, 2 Jul 2024 12:09:47 +0300 Message-ID: <3649c1dc673ea0a49a90f3e01b76ef91fb90f076.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chaitanya Kulkarni Export the get_max_segment_size() so driver can do use that to create DMA mapping when it receives the request. Signed-off-by: Chaitanya Kulkarni Signed-off-by: Leon Romanovsky --- block/blk-merge.c | 3 ++- include/linux/blk-mq.h | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index 8534c35e0497..0561e728ef95 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -190,7 +190,7 @@ static inline unsigned get_max_io_size(struct bio *bio, * * Returns the maximum number of bytes that can be added as a single segment. 
*/ -static inline unsigned get_max_segment_size(const struct queue_limits *lim, +inline unsigned get_max_segment_size(const struct queue_limits *lim, struct page *start_page, unsigned long offset) { unsigned long mask = lim->seg_boundary_mask; @@ -203,6 +203,7 @@ static inline unsigned get_max_segment_size(const struct queue_limits *lim, */ return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1; } +EXPORT_SYMBOL_GPL(get_max_segment_size); /** * bvec_split_segs - verify whether or not a bvec should be split in the middle diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 89ba6b16fe8b..008c77c9b518 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -1150,4 +1150,7 @@ static inline int blk_rq_map_sg(struct request_queue *q, struct request *rq, } void blk_dump_rq_flags(struct request *, char *); +unsigned get_max_segment_size(const struct queue_limits *lim, + struct page *start_page, unsigned long offset); + #endif /* BLK_MQ_H */ From patchwork Tue Jul 2 09:09:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 13719239 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A8C1156C40; Tue, 2 Jul 2024 09:11:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911471; cv=none; b=I6QDvszvIPMhiy+qerqlw/ono6Ik3lnxdaPj01J0IBNd2q55TuoXLiaZ+rtz9MlBEqzWydLkhWT977/E/A0otiZ/TZUytAnMmbtmGcRCk1Z77zpNYnihQZFvo9Mb68tEwe/1lH3dun3ZcM9AKa6OnpnSOILEF56ZfNx9Lc/+Z80= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1719911471; c=relaxed/simple; bh=rWwAHyiwmax8yvMjrwSRAusPbIB/KT92oJ4QN5II8o8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=uwuB1V+B4RBmHYgt06nUppznL3i1GWUtiAZK1O7I6jMl1foclbL8svmqvcxWqC+NzFkACaUmq4xpqjNpkn4g2OZ7DA1pbKbi97/PwgOYzaHl4cTCeDuEKvFLQZnvEyKSohBxuH3JWTEhQYeo7SIz5iIPSnZntXZsbLuznuZshK0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Y6hNK7RX; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Y6hNK7RX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 336E6C116B1; Tue, 2 Jul 2024 09:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1719911470; bh=rWwAHyiwmax8yvMjrwSRAusPbIB/KT92oJ4QN5II8o8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Y6hNK7RXDkaA/i33kcQvJ4JKz8/yPjpD5lRL7DqV5Xmj2oQvXnyGYAQ2epTmINygy 0y19irtGTGWm+OtjW5NkAItWfTaPfLe+Fvzo6XpXK70xytP//3+E0opt8r1DuIjpCG 8r0JXPXWB2Z4hDwHlXlzcZ3/Bm1gsukpVrzK5gOHjyumUYCJ6hIXV0+iG4nt9CDCbA AVvnWwKND922egkc2IkGqZ7kYK3GZ63NIoDvDrhXUVUjhoQqnTcoiujq6cCinieq+O VSwcXGDkFUwWonCqIjy2AUNt+QmmYZvJX5fJ8Y5c5HTLjsPFdIJ2jHoXuGkDlZv1PP 3XgJzr2OB4ILA== From: Leon Romanovsky To: Jens Axboe , Jason Gunthorpe , Robin Murphy , Joerg Roedel , Will Deacon , Keith Busch , Christoph Hellwig , "Zeng, Oak" , Chaitanya Kulkarni Cc: Sagi Grimberg , Bjorn Helgaas , Logan Gunthorpe , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , Marek Szyprowski , 
=?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Andrew Morton , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v1 18/18] nvme-pci: use new dma API Date: Tue, 2 Jul 2024 12:09:48 +0300 Message-ID: <47eb0510b0a6aa52d9f5665d75fa7093dd6af53f.1719909395.git.leon@kernel.org> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Chaitanya Kulkarni Introduce a new structure, iod_dma_map, to hold the DMA mapping for each I/O. This includes the iova state and mapped addresses from dma_link_range() or dma_map_page_attrs(). Replace the existing sg_table in nvme_iod with struct dma_map. The size difference between :- struct nvme_iod with struct sg_table :- 184 struct nvme_iod with struct dma_map :- 176 In nvme_map_data(), allocate dma_map from mempool and iova using dma_alloc_iova(). Obtain the memory type from the first bvec of the first bio of the request and use that to decide whether we want to use iova or not. In the newly added function nvme_rq_dma_map(), perform DMA mapping for the bvec pages using nvme_dma_link_page(). Additionally, if NVMe SGL is provided, build SGL entry inline while creating this mapping to avoid extra traversal. Call nvme_rq_dma_map() from nvme_pci_setup_prps() and nvme_pci_setup_sgls(). For NVME SGL case, nvme_rq_dma_map() will handle building SGL inline. To build PRPs, use iod->dma_map->dma_link_address in nvme_pci_setup_prps() and increment the counter appropriately to retrieve the next set of DMA addresses. This demonstrates how the new DMA API can fit into the NVMe driver and replace the old DMA APIs. As this is an RFC, I expect more robust error handling, optimizations, and in-depth testing for the final version once we agree on DMA API architecture. Following is the performance comparision for existing DMA API case with sg_table and with dma_map, once we have agreement on the new DMA API design I intend to get similar profiling numbers for new DMA API. 
sgl (sg_table + old dma API ) vs no_sgl (iod_dma_map + new DMA API) :- block size IOPS (k) Average of 3 4K -------------------------------------------------------------- sg-list-fio-perf.bs-4k-1.fio: 68.6 sg-list-fio-perf.bs-4k-2.fio: 68 68.36 sg-list-fio-perf.bs-4k-3.fio: 68.5 no-sg-list-fio-perf.bs-4k-1.fio: 68.7 no-sg-list-fio-perf.bs-4k-2.fio: 68.5 68.43 no-sg-list-fio-perf.bs-4k-3.fio: 68.1 % Change default vs new DMA API = +0.0975% 8K -------------------------------------------------------------- sg-list-fio-perf.bs-8k-1.fio: 67 sg-list-fio-perf.bs-8k-2.fio: 67.1 67.03 sg-list-fio-perf.bs-8k-3.fio: 67 no-sg-list-fio-perf.bs-8k-1.fio: 66.7 no-sg-list-fio-perf.bs-8k-2.fio: 66.7 66.7 no-sg-list-fio-perf.bs-8k-3.fio: 66.7 % Change default vs new DMA API = +0.4993% 16K -------------------------------------------------------------- sg-list-fio-perf.bs-16k-1.fio: 63.8 sg-list-fio-perf.bs-16k-2.fio: 63.4 63.5 sg-list-fio-perf.bs-16k-3.fio: 63.3 no-sg-list-fio-perf.bs-16k-1.fio: 63.5 no-sg-list-fio-perf.bs-16k-2.fio: 63.4 63.33 no-sg-list-fio-perf.bs-16k-3.fio: 63.1 % Change default vs new DMA API = -0.2632% 32K -------------------------------------------------------------- sg-list-fio-perf.bs-32k-1.fio: 59.3 sg-list-fio-perf.bs-32k-2.fio: 59.3 59.36 sg-list-fio-perf.bs-32k-3.fio: 59.5 no-sg-list-fio-perf.bs-32k-1.fio: 59.5 no-sg-list-fio-perf.bs-32k-2.fio: 59.6 59.43 no-sg-list-fio-perf.bs-32k-3.fio: 59.2 % Change default vs new DMA API = +0.1122% 64K -------------------------------------------------------------- sg-list-fio-perf.bs-64k-1.fio: 53.7 sg-list-fio-perf.bs-64k-2.fio: 53.4 53.56 sg-list-fio-perf.bs-64k-3.fio: 53.6 no-sg-list-fio-perf.bs-64k-1.fio: 53.5 no-sg-list-fio-perf.bs-64k-2.fio: 53.8 53.63 no-sg-list-fio-perf.bs-64k-3.fio: 53.6 % Change default vs new DMA API = +0.1246% 128K -------------------------------------------------------------- sg-list-fio-perf/bs-128k-1.fio: 48 sg-list-fio-perf/bs-128k-2.fio: 46.4 47.13 sg-list-fio-perf/bs-128k-3.fio: 47 no-sg-list-fio-perf/bs-128k-1.fio: 46.6 no-sg-list-fio-perf/bs-128k-2.fio: 47 46.9 no-sg-list-fio-perf/bs-128k-3.fio: 47.1 % Change default vs new DMA API = −0.495% 256K -------------------------------------------------------------- sg-list-fio-perf/bs-256k-1.fio: 37 sg-list-fio-perf/bs-256k-2.fio: 41 39.93 sg-list-fio-perf/bs-256k-3.fio: 41.8 no-sg-list-fio-perf/bs-256k-1.fio: 37.5 no-sg-list-fio-perf/bs-256k-2.fio: 41.4 40.5 no-sg-list-fio-perf/bs-256k-3.fio: 42.6 % Change default vs new DMA API = +1.42% 512K -------------------------------------------------------------- sg-list-fio-perf/bs-512k-1.fio: 28.5 sg-list-fio-perf/bs-512k-2.fio: 28.2 28.4 sg-list-fio-perf/bs-512k-3.fio: 28.5 no-sg-list-fio-perf/bs-512k-1.fio: 28.7 no-sg-list-fio-perf/bs-512k-2.fio: 28.6 28.7 no-sg-list-fio-perf/bs-512k-3.fio: 28.8 % Change default vs new DMA API = +1.06% Signed-off-by: Chaitanya Kulkarni Signed-off-by: Leon Romanovsky --- drivers/nvme/host/pci.c | 283 ++++++++++++++++++++++++++++++---------- 1 file changed, 213 insertions(+), 70 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 102a9fb0c65f..53a71b03c794 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -221,6 +221,16 @@ union nvme_descriptor { __le64 *prp_list; }; +struct iod_dma_map { + bool use_iova; + struct dma_iova_state state; + struct dma_memory_type type; + struct dma_iova_attrs iova; + dma_addr_t dma_link_address[NVME_MAX_SEGS]; + u32 len[NVME_MAX_SEGS]; + u16 nr_dma_link_address; +}; + /* * The nvme_iod describes the data 
in an I/O. * @@ -236,7 +246,7 @@ struct nvme_iod { unsigned int dma_len; /* length of single DMA segment mapping */ dma_addr_t first_dma; dma_addr_t meta_dma; - struct sg_table sgt; + struct iod_dma_map *dma_map; union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS]; }; @@ -521,6 +531,26 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req, return true; } +static inline void nvme_dma_unlink_range(struct nvme_iod *iod) +{ + struct dma_iova_attrs *iova = &iod->dma_map->iova; + dma_addr_t addr; + u16 len; + u32 i; + + if (iod->dma_map->use_iova) { + dma_unlink_range(&iod->dma_map->state); + return; + } + + for (i = 0; i < iod->dma_map->nr_dma_link_address; i++) { + addr = iod->dma_map->dma_link_address[i]; + len = iod->dma_map->len[i]; + dma_unmap_page_attrs(iova->dev, addr, len, + iova->dir, iova->attrs); + } +} + static void nvme_free_prps(struct nvme_dev *dev, struct request *req) { const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1; @@ -547,9 +577,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) return; } - WARN_ON_ONCE(!iod->sgt.nents); - - dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); + nvme_dma_unlink_range(iod); if (iod->nr_allocations == 0) dma_pool_free(dev->prp_small_pool, iod->list[0].sg_list, @@ -559,21 +587,123 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) iod->first_dma); else nvme_free_prps(dev, req); - mempool_free(iod->sgt.sgl, dev->iod_mempool); + + dma_free_iova(&iod->dma_map->iova); + mempool_free(iod->dma_map, dev->iod_mempool); } -static void nvme_print_sgl(struct scatterlist *sgl, int nents) +static inline dma_addr_t nvme_dma_link_page(struct page *page, + unsigned int poffset, + unsigned int len, + struct nvme_iod *iod) { - int i; - struct scatterlist *sg; + struct dma_iova_attrs *iova = &iod->dma_map->iova; + struct dma_iova_state *state = &iod->dma_map->state; + dma_addr_t dma_addr; + int ret; + + if (iod->dma_map->use_iova) { + phys_addr_t phys = page_to_phys(page) + poffset; + + dma_addr = state->iova->addr + state->range_size; + ret = dma_link_range(&iod->dma_map->state, phys, len); + if (ret) + return DMA_MAPPING_ERROR; + } else { + dma_addr = dma_map_page_attrs(iova->dev, page, poffset, len, + iova->dir, iova->attrs); + } + return dma_addr; +} + +static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge, + dma_addr_t dma_addr, + unsigned int dma_len); + +static int __nvme_rq_dma_map(struct request *req, struct nvme_iod *iod, + struct nvme_sgl_desc *sgl_list) +{ + struct dma_iova_attrs *iova = &iod->dma_map->iova; + struct req_iterator iter; + struct bio_vec bv; + int cnt = 0; + dma_addr_t addr; + + iod->dma_map->nr_dma_link_address = 0; + rq_for_each_bvec(bv, req, iter) { + unsigned nbytes = bv.bv_len; + unsigned total = 0; + unsigned offset, len; + + if (bv.bv_offset + bv.bv_len <= PAGE_SIZE) { + addr = nvme_dma_link_page(bv.bv_page, bv.bv_offset, + bv.bv_len, iod); + if (dma_mapping_error(iova->dev, addr)) { + pr_err("dma_mapping_error %d\n", + dma_mapping_error(iova->dev, addr)); + return -ENOMEM; + } + + iod->dma_map->dma_link_address[cnt] = addr; + iod->dma_map->len[cnt] = bv.bv_len; + iod->dma_map->nr_dma_link_address++; + + if (sgl_list) + nvme_pci_sgl_set_data(&sgl_list[cnt], addr, + bv.bv_len); + cnt++; + continue; + } + while (nbytes > 0) { + struct page *page = bv.bv_page; + + offset = bv.bv_offset + total; + len = min(get_max_segment_size(&req->q->limits, page, + offset), nbytes); + + page += (offset >> PAGE_SHIFT); + offset &= ~PAGE_MASK; + 
+ addr = nvme_dma_link_page(page, offset, len, iod); + if (dma_mapping_error(iova->dev, addr)) { + pr_err("dma_mapping_error2 %d\n", + dma_mapping_error(iova->dev, addr)); + return -ENOMEM; + } + + iod->dma_map->dma_link_address[cnt] = addr; + iod->dma_map->len[cnt] = len; + iod->dma_map->nr_dma_link_address++; - for_each_sg(sgl, sg, nents, i) { - dma_addr_t phys = sg_phys(sg); - pr_warn("sg[%d] phys_addr:%pad offset:%d length:%d " - "dma_address:%pad dma_length:%d\n", - i, &phys, sg->offset, sg->length, &sg_dma_address(sg), - sg_dma_len(sg)); + if (sgl_list) + nvme_pci_sgl_set_data(&sgl_list[cnt], addr, len); + + total += len; + nbytes -= len; + cnt++; + } + } + return cnt; +} + +static int nvme_rq_dma_map(struct request *req, struct nvme_iod *iod, + struct nvme_sgl_desc *sgl_list) +{ + int ret; + + if (iod->dma_map->use_iova) { + ret = dma_start_range(&iod->dma_map->state); + if (ret) { + pr_err("dma_start_dange_failed %d", ret); + return ret; + } + + ret = __nvme_rq_dma_map(req, iod, sgl_list); + dma_end_range(&iod->dma_map->state); + return ret; } + + return __nvme_rq_dma_map(req, iod, sgl_list); } static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, @@ -582,13 +712,23 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct dma_pool *pool; int length = blk_rq_payload_bytes(req); - struct scatterlist *sg = iod->sgt.sgl; - int dma_len = sg_dma_len(sg); - u64 dma_addr = sg_dma_address(sg); - int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1); + u16 dma_addr_cnt = 0; + int dma_len; + u64 dma_addr; + int offset; __le64 *prp_list; dma_addr_t prp_dma; int nprps, i; + int ret; + + ret = nvme_rq_dma_map(req, iod, NULL); + if (ret < 0) + return errno_to_blk_status(ret); + + dma_len = iod->dma_map->len[dma_addr_cnt]; + dma_addr = iod->dma_map->dma_link_address[dma_addr_cnt]; + offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1); + dma_addr_cnt++; length -= (NVME_CTRL_PAGE_SIZE - offset); if (length <= 0) { @@ -600,9 +740,9 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, if (dma_len) { dma_addr += (NVME_CTRL_PAGE_SIZE - offset); } else { - sg = sg_next(sg); - dma_addr = sg_dma_address(sg); - dma_len = sg_dma_len(sg); + dma_addr = iod->dma_map->dma_link_address[dma_addr_cnt]; + dma_len = iod->dma_map->len[dma_addr_cnt]; + dma_addr_cnt++; } if (length <= NVME_CTRL_PAGE_SIZE) { @@ -646,31 +786,29 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev, break; if (dma_len > 0) continue; - if (unlikely(dma_len < 0)) - goto bad_sgl; - sg = sg_next(sg); - dma_addr = sg_dma_address(sg); - dma_len = sg_dma_len(sg); + if (dma_addr_cnt >= iod->dma_map->nr_dma_link_address) + pr_err_ratelimited("dma_addr_cnt exceeded %u and %u\n", + dma_addr_cnt, + iod->dma_map->nr_dma_link_address); + dma_addr = iod->dma_map->dma_link_address[dma_addr_cnt]; + dma_len = iod->dma_map->len[dma_addr_cnt]; + dma_addr_cnt++; } done: - cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl)); + cmnd->dptr.prp1 = cpu_to_le64(iod->dma_map->dma_link_address[0]); cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma); + return BLK_STS_OK; free_prps: nvme_free_prps(dev, req); return BLK_STS_RESOURCE; -bad_sgl: - WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents), - "Invalid SGL for payload:%d nents:%d\n", - blk_rq_payload_bytes(req), iod->sgt.nents); - return BLK_STS_IOERR; } static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge, - struct scatterlist *sg) + dma_addr_t dma_addr, unsigned int dma_len) { - sge->addr = 
cpu_to_le64(sg_dma_address(sg)); - sge->length = cpu_to_le32(sg_dma_len(sg)); + sge->addr = cpu_to_le64(dma_addr); + sge->length = cpu_to_le32(dma_len); sge->type = NVME_SGL_FMT_DATA_DESC << 4; } @@ -685,22 +823,16 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge, static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev, struct request *req, struct nvme_rw_command *cmd) { + unsigned int entries = blk_rq_nr_phys_segments(req); struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - struct dma_pool *pool; struct nvme_sgl_desc *sg_list; - struct scatterlist *sg = iod->sgt.sgl; - unsigned int entries = iod->sgt.nents; + struct dma_pool *pool; dma_addr_t sgl_dma; - int i = 0; + int ret; /* setting the transfer type as SGL */ cmd->flags = NVME_CMD_SGL_METABUF; - if (entries == 1) { - nvme_pci_sgl_set_data(&cmd->dptr.sgl, sg); - return BLK_STS_OK; - } - if (entries <= (256 / sizeof(struct nvme_sgl_desc))) { pool = dev->prp_small_pool; iod->nr_allocations = 0; @@ -718,12 +850,11 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev, iod->list[0].sg_list = sg_list; iod->first_dma = sgl_dma; - nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries); - do { - nvme_pci_sgl_set_data(&sg_list[i++], sg); - sg = sg_next(sg); - } while (--entries > 0); + ret = nvme_rq_dma_map(req, iod, sg_list); + if (ret < 0) + return errno_to_blk_status(ret); + nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, ret); return BLK_STS_OK; } @@ -791,34 +922,47 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, } iod->dma_len = 0; - iod->sgt.sgl = mempool_alloc(dev->iod_mempool, GFP_ATOMIC); - if (!iod->sgt.sgl) + iod->dma_map = mempool_alloc(dev->iod_mempool, GFP_ATOMIC); + if (!iod->dma_map) return BLK_STS_RESOURCE; - sg_init_table(iod->sgt.sgl, blk_rq_nr_phys_segments(req)); - iod->sgt.orig_nents = blk_rq_map_sg(req->q, req, iod->sgt.sgl); - if (!iod->sgt.orig_nents) - goto out_free_sg; - rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), - DMA_ATTR_NO_WARN); - if (rc) { - if (rc == -EREMOTEIO) - ret = BLK_STS_TARGET; - goto out_free_sg; - } + iod->dma_map->state.range_size = 0; + iod->dma_map->iova.dev = dev->dev; + iod->dma_map->iova.dir = rq_dma_dir(req); + iod->dma_map->iova.attrs = DMA_ATTR_NO_WARN; + iod->dma_map->iova.size = blk_rq_payload_bytes(req); + if (!iod->dma_map->iova.size) + goto free_iod_map; + + rc = dma_alloc_iova(&iod->dma_map->iova); + if (rc) + goto free_iod_map; + + /* + * Following call assumes that all the biovecs belongs to this request + * are of the same type. 
+ */ + dma_get_memory_type(req->bio->bi_io_vec[0].bv_page, + &iod->dma_map->type); + iod->dma_map->state.iova = &iod->dma_map->iova; + iod->dma_map->state.type = &iod->dma_map->type; + + iod->dma_map->use_iova = + dma_can_use_iova(&iod->dma_map->state, + req->bio->bi_io_vec[0].bv_len); - if (nvme_pci_use_sgls(dev, req, iod->sgt.nents)) + if (nvme_pci_use_sgls(dev, req, blk_rq_nr_phys_segments(req))) ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw); else ret = nvme_pci_setup_prps(dev, req, &cmnd->rw); if (ret != BLK_STS_OK) - goto out_unmap_sg; + goto free_iova; return BLK_STS_OK; -out_unmap_sg: - dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); -out_free_sg: - mempool_free(iod->sgt.sgl, dev->iod_mempool); +free_iova: + dma_free_iova(&iod->dma_map->iova); +free_iod_map: + mempool_free(iod->dma_map, dev->iod_mempool); return ret; } @@ -842,7 +986,6 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) iod->aborted = false; iod->nr_allocations = -1; - iod->sgt.nents = 0; ret = nvme_setup_cmd(req->q->queuedata, req); if (ret) @@ -2670,7 +2813,7 @@ static void nvme_release_prp_pools(struct nvme_dev *dev) static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev) { - size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS; + size_t alloc_size = sizeof(struct iod_dma_map); dev->iod_mempool = mempool_create_node(1, mempool_kmalloc, mempool_kfree,