From patchwork Wed Feb 5 14:40:31 2025
X-Patchwork-Submitter: Leon Romanovsky <leon@kernel.org>
X-Patchwork-Id: 13961191
From: Leon Romanovsky <leon@kernel.org>
To: Christoph Hellwig, Jason Gunthorpe, Robin Murphy
Cc: Leon Romanovsky, Jens Axboe, Joerg Roedel, Will Deacon, Sagi Grimberg,
	Keith Busch, Bjorn Helgaas, Logan Gunthorpe, Yishai Hadas,
	Shameer Kolothum, Kevin Tian, Alex Williamson, Marek Szyprowski,
	Jérôme Glisse, Andrew Morton, Jonathan Corbet,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	Randy Dunlap
Subject: [PATCH v7 11/17] mm/hmm: provide generic DMA managing logic
Date: Wed, 5 Feb 2025 16:40:31 +0200
Message-ID:
X-Mailer: git-send-email 2.48.1
In-Reply-To:
References:
MIME-Version: 1.0

From: Leon Romanovsky

HMM callers use a PFN list to populate the range while calling
hmm_range_fault(); the conversion from PFN to DMA address is then done by
the callers with the help of a separate DMA list. However, maintaining
that second list is wasteful on any modern platform, and with the right
logic it can be avoided.

Provide generic logic to manage these lists and an interface to map/unmap
PFNs to DMA addresses, without requiring the callers to be experts in the
DMA core API.
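For illustration only (this snippet is not part of the patch): a minimal,
hedged sketch of how a caller is expected to combine the new helpers with
hmm_range_fault(). The device, the hmm_range setup, and the mmap locking and
notifier retry loop around hmm_range_fault() are assumed to exist elsewhere
and are omitted here.

#include <linux/hmm.h>
#include <linux/hmm-dma.h>
#include <linux/dma-mapping.h>
#include <linux/pci-p2pdma.h>

/* Hypothetical helper: mirror @nr_pages covered by @range onto @dev. */
static int example_mirror_range(struct device *dev, struct hmm_range *range,
				size_t nr_pages)
{
	struct pci_p2pdma_map_state p2pdma_state = {};
	struct hmm_dma_map map;
	size_t i;
	int ret;

	/* One PFN entry and one PAGE_SIZE-sized DMA entry per page. */
	ret = hmm_dma_map_alloc(dev, &map, nr_pages, PAGE_SIZE);
	if (ret)
		return ret;

	/* Let hmm_range_fault() fill the PFN list owned by the map. */
	range->hmm_pfns = map.pfn_list;
	ret = hmm_range_fault(range);
	if (ret)
		goto out_free;

	for (i = 0; i < nr_pages; i++) {
		dma_addr_t dma = hmm_dma_map_pfn(dev, &map, i, &p2pdma_state);

		if (dma == DMA_MAPPING_ERROR) {
			ret = -EFAULT;
			goto out_unmap;
		}
		/* Program 'dma' into the device page table here. */
	}
	return 0;

out_unmap:
	while (i--)
		hmm_dma_unmap_pfn(dev, &map, i);
out_free:
	hmm_dma_map_free(dev, &map);
	return ret;
}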
Signed-off-by: Leon Romanovsky
---
 include/linux/hmm-dma.h |  33 ++++++
 include/linux/hmm.h     |   4 +
 mm/hmm.c                | 215 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 251 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hmm-dma.h

diff --git a/include/linux/hmm-dma.h b/include/linux/hmm-dma.h
new file mode 100644
index 000000000000..f58b9fc71999
--- /dev/null
+++ b/include/linux/hmm-dma.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) 2024 NVIDIA Corporation & Affiliates */
+#ifndef LINUX_HMM_DMA_H
+#define LINUX_HMM_DMA_H
+
+#include <linux/dma-mapping.h>
+
+struct dma_iova_state;
+struct pci_p2pdma_map_state;
+
+/*
+ * struct hmm_dma_map - array of PFNs and DMA addresses
+ *
+ * @state: DMA IOVA state
+ * @pfn_list: array of PFNs
+ * @dma_list: array of DMA addresses
+ * @dma_entry_size: size of each DMA entry in the array
+ */
+struct hmm_dma_map {
+	struct dma_iova_state state;
+	unsigned long *pfn_list;
+	dma_addr_t *dma_list;
+	size_t dma_entry_size;
+};
+
+int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
+		      size_t nr_entries, size_t dma_entry_size);
+void hmm_dma_map_free(struct device *dev, struct hmm_dma_map *map);
+dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
+			   size_t idx,
+			   struct pci_p2pdma_map_state *p2pdma_state);
+bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx);
+#endif /* LINUX_HMM_DMA_H */
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index a1ddbedc19c0..1bc33e4c20ea 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -23,6 +23,8 @@ struct mmu_interval_notifier;
  * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_VALID)
  * HMM_PFN_ERROR - accessing the pfn is impossible and the device should
  *                 fail. ie poisoned memory, special pages, no vma, etc
+ * HMM_PFN_P2PDMA - P2P page
+ * HMM_PFN_P2PDMA_BUS - Bus mapped P2P transfer
  * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation
  *                      to mark that page is already DMA mapped
  *
@@ -43,6 +45,8 @@ enum hmm_pfn_flags {
 	 * Sticky flags, carried from input to output,
 	 * don't forget to update HMM_PFN_INOUT_FLAGS
 	 */
+	HMM_PFN_P2PDMA = 1UL << (BITS_PER_LONG - 5),
+	HMM_PFN_P2PDMA_BUS = 1UL << (BITS_PER_LONG - 6),
 	HMM_PFN_DMA_MAPPED = 1UL << (BITS_PER_LONG - 7),
 
 	HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8),
diff --git a/mm/hmm.c b/mm/hmm.c
index da5743f6d854..c9a0c396b288 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -10,6 +10,7 @@
  */
 #include <linux/pagewalk.h>
 #include <linux/hmm.h>
+#include <linux/hmm-dma.h>
 #include <linux/init.h>
 #include <linux/rmap.h>
 #include <linux/swap.h>
@@ -23,6 +24,7 @@
 #include <linux/sched/mm.h>
 #include <linux/jump_label.h>
 #include <linux/dma-mapping.h>
+#include <linux/pci-p2pdma.h>
 #include <linux/mmu_notifier.h>
 #include <linux/memory_hotplug.h>
@@ -41,7 +43,8 @@ enum {
 
 enum {
 	/* These flags are carried from input-to-output */
-	HMM_PFN_INOUT_FLAGS = HMM_PFN_DMA_MAPPED,
+	HMM_PFN_INOUT_FLAGS = HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA |
+			      HMM_PFN_P2PDMA_BUS,
 };
 
 static int hmm_pfns_fill(unsigned long addr, unsigned long end,
@@ -620,3 +623,213 @@ int hmm_range_fault(struct hmm_range *range)
 	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);
+
+/**
+ * hmm_dma_map_alloc - Allocate HMM map structure
+ * @dev: device to allocate structure for
+ * @map: HMM map to allocate
+ * @nr_entries: number of entries in the map
+ * @dma_entry_size: size of the DMA entry in the map
+ *
+ * Allocate the HMM map structure and all the lists it contains.
+ * Return 0 on success, or a negative errno on failure.
+ */
+int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
+		      size_t nr_entries, size_t dma_entry_size)
+{
+	bool dma_need_sync = false;
+	bool use_iova;
+
+	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
+		return -EINVAL;
+
+	/*
+	 * The HMM API violates our normal DMA buffer ownership rules and can't
+	 * transfer buffer ownership. The dma_addressing_limited() check is a
+	 * best approximation to ensure no swiotlb buffering happens.
+	 */
+#ifdef CONFIG_DMA_NEED_SYNC
+	dma_need_sync = !dev->dma_skip_sync;
+#endif /* CONFIG_DMA_NEED_SYNC */
+	if (dma_need_sync || dma_addressing_limited(dev))
+		return -EOPNOTSUPP;
+
+	map->dma_entry_size = dma_entry_size;
+	map->pfn_list =
+		kvcalloc(nr_entries, sizeof(*map->pfn_list), GFP_KERNEL);
+	if (!map->pfn_list)
+		return -ENOMEM;
+
+	use_iova = dma_iova_try_alloc(dev, &map->state, 0,
+				      nr_entries * PAGE_SIZE);
+	if (!use_iova && dma_need_unmap(dev)) {
+		map->dma_list = kvcalloc(nr_entries, sizeof(*map->dma_list),
+					 GFP_KERNEL);
+		if (!map->dma_list)
+			goto err_dma;
+	}
+	return 0;
+
+err_dma:
+	kvfree(map->pfn_list);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_alloc);
+
+/**
+ * hmm_dma_map_free - Free HMM map structure
+ * @dev: device to free structure from
+ * @map: HMM map containing the various lists and state
+ *
+ * Free the HMM map structure and all the lists it contains.
+ */
+void hmm_dma_map_free(struct device *dev, struct hmm_dma_map *map)
+{
+	if (dma_use_iova(&map->state))
+		dma_iova_free(dev, &map->state);
+	kvfree(map->pfn_list);
+	kvfree(map->dma_list);
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_free);
+
+/**
+ * hmm_dma_map_pfn - Map a physical HMM page to DMA address
+ * @dev: Device to map the page for
+ * @map: HMM map
+ * @idx: Index into the PFN and dma address arrays
+ * @p2pdma_state: PCI P2P state.
+ *
+ * Map the page backing the PFN at index @idx in @map->pfn_list and return
+ * its DMA address. If hmm_dma_map_alloc() managed to allocate an IOVA range,
+ * the page is linked at offset idx * dma_entry_size within that range;
+ * otherwise the page is mapped individually with dma_map_page(). P2P pages
+ * are handled according to the state tracked in @p2pdma_state.
+ *
+ * Returns the DMA address on success, or DMA_MAPPING_ERROR on failure.
+ */
+dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
+			   size_t idx,
+			   struct pci_p2pdma_map_state *p2pdma_state)
+{
+	struct dma_iova_state *state = &map->state;
+	dma_addr_t *dma_addrs = map->dma_list;
+	unsigned long *pfns = map->pfn_list;
+	struct page *page = hmm_pfn_to_page(pfns[idx]);
+	phys_addr_t paddr = hmm_pfn_to_phys(pfns[idx]);
+	size_t offset = idx * map->dma_entry_size;
+	unsigned long attrs = 0;
+	dma_addr_t dma_addr;
+	int ret;
+
+	if ((pfns[idx] & HMM_PFN_DMA_MAPPED) &&
+	    !(pfns[idx] & HMM_PFN_P2PDMA_BUS)) {
+		/*
+		 * We are in this flow when there is a need to resync flags,
+		 * for example when the page was already linked in a prefetch
+		 * call with the READ flag and now we need to add the WRITE
+		 * flag.
+		 *
+		 * This page was already programmed to HW and we don't want/need
+		 * to unlink and link it again just to resync flags.
+		 */
+		if (dma_use_iova(state))
+			return state->addr + offset;
+
+		/*
+		 * Without dma_need_unmap, the dma_addrs array is NULL, thus we
+		 * need to regenerate the address below even if there already
+		 * was a mapping. But !dma_need_unmap implies that the
+		 * mapping is stateless, so this is fine.
+		 */
+		if (dma_need_unmap(dev))
+			return dma_addrs[idx];
+
+		/* Continue to remapping */
+	}
+
+	switch (pci_p2pdma_state(p2pdma_state, dev, page)) {
+	case PCI_P2PDMA_MAP_NONE:
+		break;
+	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+		attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+		pfns[idx] |= HMM_PFN_P2PDMA;
+		break;
+	case PCI_P2PDMA_MAP_BUS_ADDR:
+		pfns[idx] |= HMM_PFN_P2PDMA_BUS | HMM_PFN_DMA_MAPPED;
+		return pci_p2pdma_bus_addr_map(p2pdma_state, paddr);
+	default:
+		return DMA_MAPPING_ERROR;
+	}
+
+	if (dma_use_iova(state)) {
+		ret = dma_iova_link(dev, state, paddr, offset,
+				    map->dma_entry_size, DMA_BIDIRECTIONAL,
+				    attrs);
+		if (ret)
+			goto error;
+
+		ret = dma_iova_sync(dev, state, offset, map->dma_entry_size);
+		if (ret) {
+			dma_iova_unlink(dev, state, offset, map->dma_entry_size,
+					DMA_BIDIRECTIONAL, attrs);
+			goto error;
+		}
+
+		dma_addr = state->addr + offset;
+	} else {
+		if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
+			goto error;
+
+		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
+					DMA_BIDIRECTIONAL);
+		if (dma_mapping_error(dev, dma_addr))
+			goto error;
+
+		if (dma_need_unmap(dev))
+			dma_addrs[idx] = dma_addr;
+	}
+	pfns[idx] |= HMM_PFN_DMA_MAPPED;
+	return dma_addr;
+
+error:
+	pfns[idx] &= ~HMM_PFN_P2PDMA;
+	return DMA_MAPPING_ERROR;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_pfn);
+
+/**
+ * hmm_dma_unmap_pfn - Unmap a physical HMM page from DMA address
+ * @dev: Device to unmap the page from
+ * @map: HMM map
+ * @idx: Index of the PFN to unmap
+ *
+ * Returns true if the PFN was mapped and has been unmapped, false otherwise.
+ */
+bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
+{
+	struct dma_iova_state *state = &map->state;
+	dma_addr_t *dma_addrs = map->dma_list;
+	unsigned long *pfns = map->pfn_list;
+	unsigned long attrs = 0;
+
+#define HMM_PFN_VALID_DMA (HMM_PFN_VALID | HMM_PFN_DMA_MAPPED)
+	if ((pfns[idx] & HMM_PFN_VALID_DMA) != HMM_PFN_VALID_DMA)
+		return false;
+#undef HMM_PFN_VALID_DMA
+
+	if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
+		; /* no need to unmap bus address P2P mappings */
+	else if (dma_use_iova(state)) {
+		if (pfns[idx] & HMM_PFN_P2PDMA)
+			attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
+				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
+	} else if (dma_need_unmap(dev))
+		dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
+			       DMA_BIDIRECTIONAL);
+
+	pfns[idx] &=
+		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
+	return true;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_unmap_pfn);
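Usage note (again, not part of the patch): teardown is symmetric and driven
by the caller, typically from its invalidation or release path. A hedged
sketch, assuming the same hypothetical setup as the mapping example above;
serialization against concurrent hmm_dma_map_pfn() callers is the driver's
responsibility and is not shown.

/* Hypothetical helper: unmap entries [first, last] of a mirrored range. */
static void example_unmap_range(struct device *dev, struct hmm_dma_map *map,
				size_t first, size_t last)
{
	size_t i;

	for (i = first; i <= last; i++) {
		/* Returns false (and does nothing) if entry @i was never mapped. */
		hmm_dma_unmap_pfn(dev, map, i);
	}
}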