From patchwork Wed Oct 30 15:12:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 13856684
From: Leon Romanovsky
To: Jens Axboe, Jason Gunthorpe, Robin Murphy, Joerg Roedel, Will Deacon,
 Christoph Hellwig, Sagi Grimberg
Cc: Leon Romanovsky, Keith Busch, Bjorn Helgaas, Logan Gunthorpe,
 Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson,
 Marek Szyprowski, Jérôme Glisse, Andrew Morton, Jonathan Corbet,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
 iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
 linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 11/17] mm/hmm: provide generic DMA managing logic
Date: Wed, 30 Oct 2024 17:12:57 +0200
Message-ID: <3cf57ecd01f42e1fe181329e65b95b4dacbb9443.1730298502.git.leon@kernel.org>
X-Mailer: git-send-email 2.46.2

From: Leon Romanovsky

HMM callers use a PFN list to populate the range when calling
hmm_range_fault(); the conversion from PFN to DMA address is then done by
the callers with the help of a second, DMA address list. However, that is
wasteful on any modern platform, and with the right logic the DMA list can
be avoided.

Provide generic logic to manage these lists and give callers an interface
to map/unmap PFNs to DMA addresses, without requiring them to be experts
in the DMA core API.

Signed-off-by: Leon Romanovsky
---
 include/linux/hmm-dma.h |  32 +++++++
 include/linux/hmm.h     |   2 +
 mm/hmm.c                | 197 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+)
 create mode 100644 include/linux/hmm-dma.h

diff --git a/include/linux/hmm-dma.h b/include/linux/hmm-dma.h
new file mode 100644
index 000000000000..f6ce2a00d74d
--- /dev/null
+++ b/include/linux/hmm-dma.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) 2024 NVIDIA Corporation & Affiliates */
+#ifndef LINUX_HMM_DMA_H
+#define LINUX_HMM_DMA_H
+
+#include
+
+struct dma_iova_state;
+struct pci_p2pdma_map_state;
+
+/*
+ * struct hmm_dma_map - array of PFNs and DMA addresses
+ *
+ * @state: DMA IOVA state
+ * @pfn_list: array of PFNs
+ * @dma_list: array of DMA addresses
+ * @dma_entry_size: size of each DMA entry in the array
+ */
+struct hmm_dma_map {
+	struct dma_iova_state state;
+	unsigned long *pfn_list;
+	dma_addr_t *dma_list;
+	size_t dma_entry_size;
+};
+
+int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
+		      size_t nr_entries, size_t dma_entry_size);
+void hmm_dma_map_free(struct device *dev, struct hmm_dma_map *map);
+dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
+			   size_t idx, struct pci_p2pdma_map_state *p2pdma_state);
+bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx);
+#endif /* LINUX_HMM_DMA_H */
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 5dd655f6766b..62980ca8f3c5 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -23,6 +23,7 @@ struct mmu_interval_notifier;
  * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_VALID)
  * HMM_PFN_ERROR - accessing the pfn is impossible and the device should
  *                 fail. ie poisoned memory, special pages, no vma, etc
+ * HMM_PFN_P2PDMA_BUS - Bus mapped P2P transfer
  * HMM_PFN_DMA_MAPPED - Flag preserved on input-to-output transformation
  *                      to mark that page is already DMA mapped
  *
@@ -40,6 +41,7 @@ enum hmm_pfn_flags {
 	HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3),
 
 	/* Sticky flag, carried from Input to Output */
+	HMM_PFN_P2PDMA_BUS = 1UL << (BITS_PER_LONG - 6),
 	HMM_PFN_DMA_MAPPED = 1UL << (BITS_PER_LONG - 7),
 
 	HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8),
diff --git a/mm/hmm.c b/mm/hmm.c
index 2a0c34d7cb2b..a852d8337c73 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -10,6 +10,7 @@
  */
 #include
 #include
+#include
 #include
 #include
 #include
@@ -23,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -615,3 +617,198 @@ int hmm_range_fault(struct hmm_range *range)
 	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);
+
+/**
+ * hmm_dma_map_alloc - Allocate HMM map structure
+ * @dev: device to allocate structure for
+ * @map: HMM map to allocate
+ * @nr_entries: number of entries in the map
+ * @dma_entry_size: size of the DMA entry in the map
+ *
+ * Allocate the HMM map structure and all the lists it contains.
+ * Return 0 on success, -EINVAL, -EOPNOTSUPP or -ENOMEM on failure.
+ */
+int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
+		      size_t nr_entries, size_t dma_entry_size)
+{
+	bool dma_need_sync = false;
+	bool use_iova;
+
+	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
+		return -EINVAL;
+
+	/*
+	 * The HMM API violates our normal DMA buffer ownership rules and can't
+	 * transfer buffer ownership.  The dma_addressing_limited() check is a
+	 * best approximation to ensure no swiotlb buffering happens.
+	 */
+	if (IS_ENABLED(CONFIG_DMA_NEED_SYNC))
+		dma_need_sync = !dev->dma_skip_sync;
+	if (dma_need_sync || dma_addressing_limited(dev))
+		return -EOPNOTSUPP;
+
+	map->dma_entry_size = dma_entry_size;
+	map->pfn_list =
+		kvcalloc(nr_entries, sizeof(*map->pfn_list), GFP_KERNEL);
+	if (!map->pfn_list)
+		return -ENOMEM;
+
+	use_iova = dma_iova_try_alloc(dev, &map->state, 0,
+			nr_entries * PAGE_SIZE);
+	if (!use_iova && dma_need_unmap(dev)) {
+		map->dma_list = kvcalloc(nr_entries, sizeof(*map->dma_list),
+					 GFP_KERNEL);
+		if (!map->dma_list)
+			goto err_dma;
+	}
+	return 0;
+
+err_dma:
+	kvfree(map->pfn_list);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_alloc);
+
+/**
+ * hmm_dma_map_free - Free HMM map structure
+ * @dev: device to free structure from
+ * @map: HMM map containing the various lists and state
+ *
+ * Free the HMM map structure and all the lists it contains.
+ */
+void hmm_dma_map_free(struct device *dev, struct hmm_dma_map *map)
+{
+	if (dma_use_iova(&map->state))
+		dma_iova_free(dev, &map->state);
+	kvfree(map->pfn_list);
+	kvfree(map->dma_list);
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_free);
+
+/**
+ * hmm_dma_map_pfn - Map a physical HMM page to DMA address
+ * @dev: Device to map the page for
+ * @map: HMM map
+ * @idx: Index into the PFN and dma address arrays
+ * @p2pdma_state: PCI P2P state.
+ *
+ * If hmm_dma_map_alloc() reserved an IOVA range, link the page at @idx into
+ * that range at offset @idx * dma_entry_size; otherwise map it with
+ * dma_map_page(). Pages that resolve to a PCI P2P bus address are translated
+ * directly and flagged HMM_PFN_P2PDMA_BUS. A PFN that is already marked
+ * HMM_PFN_DMA_MAPPED is not linked again: its existing address is returned,
+ * or regenerated when the mapping is stateless (!dma_need_unmap()).
+ *
+ * Returns the DMA address on success, DMA_MAPPING_ERROR on failure.
+ */
+dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
+			   size_t idx, struct pci_p2pdma_map_state *p2pdma_state)
+{
+	struct dma_iova_state *state = &map->state;
+	dma_addr_t *dma_addrs = map->dma_list;
+	unsigned long *pfns = map->pfn_list;
+	struct page *page = hmm_pfn_to_page(pfns[idx]);
+	phys_addr_t paddr = hmm_pfn_to_phys(pfns[idx]);
+	size_t offset = idx * map->dma_entry_size;
+	dma_addr_t dma_addr;
+	int ret;
+
+	if ((pfns[idx] & HMM_PFN_DMA_MAPPED) &&
+	    !(pfns[idx] & HMM_PFN_P2PDMA_BUS)) {
+		/*
+		 * We are in this flow when there is a need to resync flags,
+		 * for example when page was already linked in prefetch call
+		 * with READ flag and now we need to add WRITE flag.
+		 *
+		 * This page was already programmed to HW and we don't want/need
+		 * to unlink and link it again just to resync flags.
+		 */
+		if (dma_use_iova(state))
+			return state->addr + offset;
+
+		/*
+		 * Without dma_need_unmap, the dma_addrs array is NULL, thus we
+		 * need to regenerate the address below even if there already
+		 * was a mapping. But !dma_need_unmap implies that the
+		 * mapping is stateless, so this is fine.
+		 */
+		if (dma_need_unmap(dev))
+			return dma_addrs[idx];
+
+		/* Continue to remapping */
+	}
+
+	switch (pci_p2pdma_state(p2pdma_state, dev, page)) {
+	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+	case PCI_P2PDMA_MAP_NONE:
+		break;
+	case PCI_P2PDMA_MAP_BUS_ADDR:
+		dma_addr = pci_p2pdma_bus_addr_map(p2pdma_state, paddr);
+		pfns[idx] |= HMM_PFN_P2PDMA_BUS;
+		goto done;
+	default:
+		return DMA_MAPPING_ERROR;
+	}
+
+	if (dma_use_iova(state)) {
+		ret = dma_iova_link(dev, state, paddr, offset,
+				    map->dma_entry_size, DMA_BIDIRECTIONAL, 0);
+		if (ret)
+			return DMA_MAPPING_ERROR;
+
+		ret = dma_iova_sync(dev, state, offset, map->dma_entry_size);
+		if (ret)
+			return DMA_MAPPING_ERROR;
+
+		dma_addr = state->addr + offset;
+	} else {
+		if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
+			return DMA_MAPPING_ERROR;
+
+		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
+					DMA_BIDIRECTIONAL);
+		if (dma_mapping_error(dev, dma_addr))
+			return DMA_MAPPING_ERROR;
+
+		if (dma_need_unmap(dev))
+			dma_addrs[idx] = dma_addr;
+	}
+
+done:
+	pfns[idx] |= HMM_PFN_DMA_MAPPED;
+	return dma_addr;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_map_pfn);
+
+/**
+ * hmm_dma_unmap_pfn - Unmap a physical HMM page from DMA address
+ * @dev: Device to unmap the page from
+ * @map: HMM map
+ * @idx: Index of the PFN to unmap
+ *
+ * Returns true if the PFN was mapped and has been unmapped, false otherwise.
+ */
+bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
+{
+	struct dma_iova_state *state = &map->state;
+	dma_addr_t *dma_addrs = map->dma_list;
+	unsigned long *pfns = map->pfn_list;
+
+#define HMM_PFN_VALID_DMA (HMM_PFN_VALID | HMM_PFN_DMA_MAPPED)
+	if ((pfns[idx] & HMM_PFN_VALID_DMA) != HMM_PFN_VALID_DMA)
+		return false;
+#undef HMM_PFN_VALID_DMA
+
+	if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
+		; /* no need to unmap bus address P2P mappings */
+	else if (dma_use_iova(state))
+		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
+				map->dma_entry_size, DMA_BIDIRECTIONAL, 0);
+	else if (dma_need_unmap(dev))
+		dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
+			       DMA_BIDIRECTIONAL);
+
+	pfns[idx] &= ~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA_BUS);
+	return true;
+}
+EXPORT_SYMBOL_GPL(hmm_dma_unmap_pfn);
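
Reviewer note: the PFN flag bookkeeping shared by hmm_dma_map_pfn() and
hmm_dma_unmap_pfn() can be exercised outside the kernel. What follows is a
minimal userspace sketch and not part of the patch. model_mark_mapped() and
model_unmap() are hypothetical helpers that only mirror the flag updates
and perform no DMA mapping at all. The HMM_PFN_P2PDMA_BUS and
HMM_PFN_DMA_MAPPED bit positions are copied from the include/linux/hmm.h
hunk in this patch; the HMM_PFN_VALID position (top bit) is an assumption
taken from the existing header and is not shown in the diff.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Assumed from the existing include/linux/hmm.h, not shown in the patch. */
#define HMM_PFN_VALID      (1UL << (BITS_PER_LONG - 1))
/* Copied from the hmm.h hunk of the patch. */
#define HMM_PFN_P2PDMA_BUS (1UL << (BITS_PER_LONG - 6))
#define HMM_PFN_DMA_MAPPED (1UL << (BITS_PER_LONG - 7))

/*
 * Hypothetical model of the flag updates at the end of hmm_dma_map_pfn():
 * a bus-address P2P mapping sets HMM_PFN_P2PDMA_BUS, and every successful
 * mapping sets HMM_PFN_DMA_MAPPED.
 */
static void model_mark_mapped(unsigned long *pfns, size_t n, size_t idx,
			      bool bus_addr)
{
	assert(idx < n);
	if (bus_addr)
		pfns[idx] |= HMM_PFN_P2PDMA_BUS;
	pfns[idx] |= HMM_PFN_DMA_MAPPED;
}

/*
 * Hypothetical model of the flag handling in hmm_dma_unmap_pfn(): only an
 * entry that is both valid and DMA mapped is unmapped, and both sticky
 * flags are cleared afterwards. The real unmap calls are omitted.
 */
static bool model_unmap(unsigned long *pfns, size_t n, size_t idx)
{
	const unsigned long valid_dma = HMM_PFN_VALID | HMM_PFN_DMA_MAPPED;

	assert(idx < n);
	if ((pfns[idx] & valid_dma) != valid_dma)
		return false;

	pfns[idx] &= ~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA_BUS);
	return true;
}
```

In this model an entry holding HMM_PFN_VALID plus PFN bits reports false
from model_unmap() until model_mark_mapped() has run, then true exactly
once; the PFN bits and HMM_PFN_VALID survive the round trip. An entry
without HMM_PFN_VALID is never reported as unmapped, even if it carries
HMM_PFN_DMA_MAPPED.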