From patchwork Fri Jun 7 00:51:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 13689167 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4778B39FD6 for ; Fri, 7 Jun 2024 00:51:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717721514; cv=none; b=BlXy6w98+ohmkmW4QEsAHpJ+rCyxxBHkSK8d4P9U3C9CyKoH7qky5TUKfXIXVp5Z/5q/TwMzUmIAG8KytoFcyAZu6Wm8Jr3IR8saqTLPoBr/4UqCVSKb1dwcTFBgUzWVv2o6thEiWHnMjwZrE+fQ6kUHh0QnuWVDrPGHrSzm6gU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717721514; c=relaxed/simple; bh=nNWbR8Cwx2IfianI03ObYoYfjIe5O7RQowj3TNpgVZI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=gjVwqHZaDkXe0sQsW2s4H7h7BrmkyqSto6ihmgKruOoXT9zzMOg9hiKh/vBEjE4C9uH/HFqZ5l4gSjy9juiRjfbR+70vzyO03VraJMQkO/+/RmcDWOb8EdXpzAlx99H6Mw7XqODZoMpONEk6jDAzjI43iY5RkODt+ug2jf7U9JM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=y4hgwgzJ; arc=none smtp.client-ip=209.85.219.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="y4hgwgzJ" Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dfa73db88dcso2498758276.0 for ; Thu, 06 Jun 2024 17:51:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717721505; x=1718326305; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=++S0/BF3mmxHrf8evl8STUkuww//XBic0i9hX9TEwCs=; b=y4hgwgzJXmMkBGpHxKcYMoZZJh3bisNTcmxJA04KjgD1t8uJjCtzDGwOnRA0tYNWwO KBRuvJrg4m4S6K/UNxUYlCk5E0UVJ/gX4ij95PolC5nOhOflccJ3eBPhLDrfesogzyVG LwPmk17d7qHHk3z6SzuAPeiQdsX1la4OPgvNahFKMnnvTQ4cUGRddSK01vMc8zm9O85/ l1deOaZT2AACu6nXQw/e829bgAFyUhDMkDnPLQIwAJzT1rPmA6X6prVU7KsPTX6jPVed PrX62dOyutlOqqka1J3MlT+j0PgMkXH1r1oZ8eZjmfvGGVp/4we4yK4FlJZLRO5YSP29 N5+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717721505; x=1718326305; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=++S0/BF3mmxHrf8evl8STUkuww//XBic0i9hX9TEwCs=; b=OG4JwNNMymst09PEyM8xxLVaU/Qwyd+rTWxsSiEj6SzCg2nhpxC6c2uS0nZvOE8pjk 5YqUfx90mASNDOIEFY7h/vLRax5GfjNgvNNd9MFLo1GstkPalvmzQtSMHa+FpKKapni/ pMixBH4Fh8gkbsPmOYoB/djkYX2hLx6z+A/SMEIfstaQcgZ24Wmc6weooftdXzXxmGbu Eom8nfePMBywRxgcY3+Xv1lNQMaUyuoBoMrj9Lbz1iSOUVWNwaP5Zxr09T8HViq/Scut iVE2YLTOdwyKhRuydNDQRkt/v7HP/xzpYLjjbK3VJBaevGGjcp075di7AYrJBtEwHCJI 40dg== X-Forwarded-Encrypted: i=1; AJvYcCUc8WThW5TUS+iMhAU89L4YsRui+SdwrAvy5sZU6o647/cpJKNbnseIhFmW2S8fom4K7brejZp/ITUc/OltcrnnfByzDDyVPOTbGImAZBikHBTx X-Gm-Message-State: AOJu0YwZMv3teVrxWq2Q2A4MUVwYaVdSaa4PW/T0E21dy7SLHJFeBuuz aajN+Vzv1Zw+WjaYupZJzszglixQo4tPUMgEx3KIWeNr0wiF28iPnLj1K+yudb0DvxW7xIfozFh Kdp2j6l0dAa1grfXDfmqudg== X-Google-Smtp-Source: AGHT+IELtYHlxtRPo6DjMEPFHxDRxRD8re3CIlV8scxTXnf1U5A9RUQGTmv3hqGLvn9VcRSucBVHW+VrFrttB6amqA== X-Received: from almasrymina.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:4bc5]) (user=almasrymina job=sendgmr) by 2002:a05:6902:f84:b0:df7:d322:97c6 with SMTP id 3f1490d57ef6-dfaf664db3dmr167215276.9.1717721505009; Thu, 06 Jun 2024 17:51:45 -0700 (PDT) Date: Fri, 7 Jun 2024 00:51:17 +0000 In-Reply-To: <20240607005127.3078656-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240607005127.3078656-1-almasrymina@google.com> X-Mailer: git-send-email 2.45.2.505.gda0bf45e8d-goog Message-ID: <20240607005127.3078656-8-almasrymina@google.com> Subject: [PATCH net-next v11 07/13] memory-provider: dmabuf devmem memory provider From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Jonathan Corbet , Richard Henderson , Ivan Kokshaysky , Matt Turner , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Andreas Larsson , Jesper Dangaard Brouer , Ilias Apalodimas , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Arnd Bergmann , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Steffen Klassert , Herbert Xu , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Bagas Sanjaya , Christoph Hellwig , Nikolay Aleksandrov , Pavel Begunkov , David Wei , Jason Gunthorpe , Yunsheng Lin , Shailend Chand , Harshitha Ramamurthy , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi , Willem de Bruijn , Kaiyuan Zhang Implement a memory provider that allocates dmabuf devmem in the form of net_iov. The provider receives a reference to the struct netdev_dmabuf_binding via the pool->mp_priv pointer. The driver needs to set this pointer for the provider in the net_iov. The provider obtains a reference on the netdev_dmabuf_binding which guarantees the binding and the underlying mapping remains alive until the provider is destroyed. Usage of PP_FLAG_DMA_MAP is required for this memory provide such that the page_pool can provide the driver with the dma-addrs of the devmem. Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order != 0. Signed-off-by: Willem de Bruijn Signed-off-by: Kaiyuan Zhang Signed-off-by: Mina Almasry --- v11: - Rebase to not use the ops. (Christoph) v8: - Use skb_frag_size instead of frag->bv_len to fix patch-by-patch build error v6: - refactor new memory provider functions into net/core/devmem.c (Pavel) v2: - Disable devmem for p.order != 0 v1: - static_branch check in page_is_page_pool_iov() (Willem & Paolo). - PP_DEVMEM -> PP_IOV (David). - Require PP_FLAG_DMA_MAP (Jakub). --- include/net/mp_dmabuf_devmem.h | 46 ++++++++++++++++++++ include/net/netmem.h | 15 +++++++ include/net/page_pool/helpers.h | 22 ++++++++++ include/net/page_pool/types.h | 2 + net/core/devmem.c | 76 +++++++++++++++++++++++++++++++++ net/core/page_pool.c | 70 +++++++++++++++++++----------- 6 files changed, 205 insertions(+), 26 deletions(-) create mode 100644 include/net/mp_dmabuf_devmem.h diff --git a/include/net/mp_dmabuf_devmem.h b/include/net/mp_dmabuf_devmem.h new file mode 100644 index 0000000000000..7d290f3ad20bd --- /dev/null +++ b/include/net/mp_dmabuf_devmem.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Dmabuf device memory provider. + * + * Authors: Mina Almasry + * + */ +#ifndef _NET_MP_DMABUF_DEVMEM_H +#define _NET_MP_DMABUF_DEVMEM_H + +#include + +#if defined(CONFIG_DMA_SHARED_BUFFER) && defined(CONFIG_GENERIC_ALLOCATOR) +int mp_dmabuf_devmem_init(struct page_pool *pool); + +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, + gfp_t gfp); + +void mp_dmabuf_devmem_destroy(struct page_pool *pool); + +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, + netmem_ref netmem); +#else +static inline int mp_dmabuf_devmem_init(struct page_pool *pool) +{ + return -EOPNOTSUPP; +} + +static inline netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, + gfp_t gfp) +{ + return 0; +} + +static inline void mp_dmabuf_devmem_destroy(struct page_pool *pool) +{ +} + +static inline bool mp_dmabuf_devmem_release_page(struct page_pool *pool, + netmem_ref netmem) +{ + return false; +} +#endif + +#endif /* _NET_MP_DMABUF_DEVMEM_H */ diff --git a/include/net/netmem.h b/include/net/netmem.h index 35ad237fdf29e..7c28d6fac6242 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -100,6 +100,21 @@ static inline struct page *netmem_to_page(netmem_ref netmem) return (__force struct page *)netmem; } +static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem) +{ + if (netmem_is_net_iov(netmem)) + return (struct net_iov *)((__force unsigned long)netmem & + ~NET_IOV); + + DEBUG_NET_WARN_ON_ONCE(true); + return NULL; +} + +static inline netmem_ref net_iov_to_netmem(struct net_iov *niov) +{ + return (__force netmem_ref)((unsigned long)niov | NET_IOV); +} + static inline netmem_ref page_to_netmem(struct page *page) { return (__force netmem_ref)page; diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 1770c7be24afc..731f2d1e1ee10 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -477,4 +477,26 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) page_pool_update_nid(pool, new_nid); } +static inline void page_pool_set_pp_info(struct page_pool *pool, + netmem_ref netmem) +{ + netmem_set_pp(netmem, pool); + netmem_or_pp_magic(netmem, PP_SIGNATURE); + + /* Ensuring all pages have been split into one fragment initially: + * page_pool_set_pp_info() is only called once for every page when it + * is allocated from the page allocator and page_pool_fragment_page() + * is dirtying the same cache line as the page->pp_magic above, so + * the overhead is negligible. + */ + page_pool_fragment_netmem(netmem, 1); + if (pool->has_init_callback) + pool->slow.init_callback(netmem, pool->slow.init_arg); +} + +static inline void page_pool_clear_pp_info(netmem_ref netmem) +{ + netmem_clear_pp_magic(netmem); + netmem_set_pp(netmem, NULL); +} #endif /* _NET_PAGE_POOL_HELPERS_H */ diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 0693117f8d74f..2179a55f21448 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -52,6 +52,7 @@ struct pp_alloc_cache { * @nid: NUMA node id to allocate from pages from * @dev: device, for DMA pre-mapping purposes * @napi: NAPI which is the sole consumer of pages, otherwise NULL + * @queue: struct netdev_rx_queue this page_pool is being created for. * @dma_dir: DMA mapping direction * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV @@ -66,6 +67,7 @@ struct page_pool_params { int nid; struct device *dev; struct napi_struct *napi; + struct netdev_rx_queue *queue; enum dma_data_direction dma_dir; unsigned int max_len; unsigned int offset; diff --git a/net/core/devmem.c b/net/core/devmem.c index f4fd9b9dbb675..4d0eaefbd7d21 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -17,6 +17,7 @@ #include #include #include +#include #include /* Device memory support */ @@ -296,4 +297,79 @@ int net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, dma_buf_put(dmabuf); return err; } + +/*** "Dmabuf devmem memory provider" ***/ + +int mp_dmabuf_devmem_init(struct page_pool *pool) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + + if (!binding) + return -EINVAL; + + if (!pool->dma_map) + return -EOPNOTSUPP; + + if (pool->dma_sync) + return -EOPNOTSUPP; + + if (pool->p.order != 0) + return -E2BIG; + + net_devmem_dmabuf_binding_get(binding); + return 0; +} + +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, + gfp_t gfp) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + netmem_ref netmem; + struct net_iov *niov; + dma_addr_t dma_addr; + + niov = net_devmem_alloc_dmabuf(binding); + if (!niov) + return 0; + + dma_addr = net_devmem_get_dma_addr(niov); + + netmem = net_iov_to_netmem(niov); + + page_pool_set_pp_info(pool, netmem); + + if (page_pool_set_dma_addr_netmem(netmem, dma_addr)) + goto err_free; + + pool->pages_state_hold_cnt++; + trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); + return netmem; + +err_free: + net_devmem_free_dmabuf(niov); + return 0; +} + +void mp_dmabuf_devmem_destroy(struct page_pool *pool) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + + net_devmem_dmabuf_binding_put(binding); +} + +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, + netmem_ref netmem) +{ + WARN_ON_ONCE(!netmem_is_net_iov(netmem)); + WARN_ON_ONCE(atomic_long_read(netmem_get_pp_ref_count_ref(netmem)) != + 1); + + page_pool_clear_pp_info(netmem); + + net_devmem_free_dmabuf(netmem_to_net_iov(netmem)); + + /* We don't want the page pool put_page()ing our net_iovs. */ + return false; +} + #endif diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 0050a51d82dba..eab56581fef70 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -21,12 +22,16 @@ #include #include #include +#include +#include +#include #include #include "page_pool_priv.h" DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers); +EXPORT_SYMBOL(page_pool_mem_providers); #define DEFER_TIME (msecs_to_jiffies(1000)) #define DEFER_WARN_INTERVAL (60 * HZ) @@ -188,6 +193,7 @@ static int page_pool_init(struct page_pool *pool, int cpuid) { unsigned int ring_qsize = 1024; /* Default */ + int err; page_pool_struct_check(); @@ -269,7 +275,25 @@ static int page_pool_init(struct page_pool *pool, if (pool->dma_map) get_device(pool->p.dev); + if (pool->p.queue) + pool->mp_priv = READ_ONCE(pool->p.queue->mp_params.mp_priv); + + if (pool->mp_priv) { + err = mp_dmabuf_devmem_init(pool); + if (err) { + pr_warn("%s() mem-provider init failed %d\n", __func__, + err); + goto free_ptr_ring; + } + + static_branch_inc(&page_pool_mem_providers); + } + return 0; + +free_ptr_ring: + ptr_ring_cleanup(&pool->ring, NULL); + return err; } static void page_pool_uninit(struct page_pool *pool) @@ -453,28 +477,6 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) return false; } -static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) -{ - netmem_set_pp(netmem, pool); - netmem_or_pp_magic(netmem, PP_SIGNATURE); - - /* Ensuring all pages have been split into one fragment initially: - * page_pool_set_pp_info() is only called once for every page when it - * is allocated from the page allocator and page_pool_fragment_page() - * is dirtying the same cache line as the page->pp_magic above, so - * the overhead is negligible. - */ - page_pool_fragment_netmem(netmem, 1); - if (pool->has_init_callback) - pool->slow.init_callback(netmem, pool->slow.init_arg); -} - -static void page_pool_clear_pp_info(netmem_ref netmem) -{ - netmem_clear_pp_magic(netmem); - netmem_set_pp(netmem, NULL); -} - static struct page *__page_pool_alloc_page_order(struct page_pool *pool, gfp_t gfp) { @@ -570,7 +572,10 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) return netmem; /* Slow-path: cache empty, do real allocation */ - netmem = __page_pool_alloc_pages_slow(pool, gfp); + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) + netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp); + else + netmem = __page_pool_alloc_pages_slow(pool, gfp); return netmem; } EXPORT_SYMBOL(page_pool_alloc_netmem); @@ -634,8 +639,13 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) { int count; + bool put; - __page_pool_release_page_dma(pool, netmem); + put = true; + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) + put = mp_dmabuf_devmem_release_page(pool, netmem); + else + __page_pool_release_page_dma(pool, netmem); /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. @@ -643,8 +653,10 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt); trace_page_pool_state_release(pool, netmem, count); - page_pool_clear_pp_info(netmem); - put_page(netmem_to_page(netmem)); + if (put) { + page_pool_clear_pp_info(netmem); + put_page(netmem_to_page(netmem)); + } /* An optimization would be to call __free_pages(page, pool->p.order) * knowing page is not part of page-cache (thus avoiding a * __page_cache_release() call). @@ -963,6 +975,12 @@ static void __page_pool_destroy(struct page_pool *pool) page_pool_unlist(pool); page_pool_uninit(pool); + + if (pool->mp_priv) { + mp_dmabuf_devmem_destroy(pool); + static_branch_dec(&page_pool_mem_providers); + } + kfree(pool); }