From patchwork Fri Jun 7 00:51:15 2024
X-Patchwork-Id: 13689129
From: Mina Almasry <almasrymina@google.com>
Date: Fri, 7 Jun 2024 00:51:15 +0000
Subject: [PATCH net-next v11 05/13] page_pool: convert to use netmem
Message-ID: <20240607005127.3078656-6-almasrymina@google.com>
In-Reply-To: <20240607005127.3078656-1-almasrymina@google.com>
References: <20240607005127.3078656-1-almasrymina@google.com>
X-Mailer: git-send-email 2.45.2.505.gda0bf45e8d-goog
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
    linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
    sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, bpf@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
    dri-devel@lists.freedesktop.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller",
    Eric Dumazet, Paolo Abeni, Jonathan Corbet, Richard Henderson,
    Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
    "James E.J. Bottomley", Helge Deller, Andreas Larsson,
    Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
    Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann,
    Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
    Steffen Klassert, Herbert Xu, David Ahern, Willem de Bruijn,
    Shuah Khan, Sumit Semwal, Christian König, Bagas Sanjaya,
    Christoph Hellwig, Nikolay Aleksandrov, Pavel Begunkov, David Wei,
    Jason Gunthorpe, Yunsheng Lin, Shailend Chand,
    Harshitha Ramamurthy, Shakeel Butt, Jeroen de Borst,
    Praveen Kaligineedi, linux-mm@kvack.org, Matthew Wilcox

Abstract the memory type from the page_pool so we can later add support
for new memory types. Convert the page_pool to use the new netmem type
abstraction, rather than using struct page directly.

As of this patch the netmem type is a no-op abstraction: it's always a
struct page underneath. All the page pool internals are converted to
use struct netmem instead of struct page, and the page pool now exports
two APIs:

1. The existing struct page API.
2. The new struct netmem API.
Keeping the existing API is transitional; we do not want to refactor
all the current drivers using the page pool at once. The netmem
abstraction is currently a no-op: the page_pool uses page_to_netmem()
to convert allocated pages to netmem, and uses netmem_to_page() to
convert the netmem back to pages to pass to mm APIs. Follow-up patches
to this series add non-paged netmem support to the page_pool.

This change is factored out on its own to limit the code churn to this
one patch, for ease of code review.

Signed-off-by: Mina Almasry <almasrymina@google.com>

---

v11:
- Fix typing to remove sparse warning. (Paolo/Steven)

v9:
- Fix sparse error (Simon).

v8:
- Fix napi_pp_put_page() taking netmem instead of page to fix
  patch-by-patch build error.
- Add net/netmem.h include in this patch to fix patch-by-patch build
  error.

v6:
- Rebased on top of the merged netmem_ref type.

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox

---

 include/linux/skbuff_ref.h       |   4 +-
 include/net/netmem.h             |  15 ++
 include/net/page_pool/helpers.h  | 120 ++++++++----
 include/net/page_pool/types.h    |  14 +-
 include/trace/events/page_pool.h |  29 +--
 net/bpf/test_run.c               |   5 +-
 net/core/page_pool.c             | 304 +++++++++++++++++--------------
 net/core/skbuff.c                |   8 +-
 8 files changed, 301 insertions(+), 198 deletions(-)

diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h index 11f0a40634033..16c241a234728 100644 --- a/include/linux/skbuff_ref.h +++ b/include/linux/skbuff_ref.h @@ -32,13 +32,13 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f) __skb_frag_ref(&skb_shinfo(skb)->frags[f]); } -bool napi_pp_put_page(struct page *page); +bool napi_pp_put_page(netmem_ref netmem); static inline void skb_page_unref(struct page *page, bool recycle) { #ifdef CONFIG_PAGE_POOL - if (recycle && napi_pp_put_page(page)) + if (recycle && napi_pp_put_page(page_to_netmem(page))) return; #endif put_page(page); diff --git a/include/net/netmem.h b/include/net/netmem.h index 01dbdd216fae7..664df8325ece5 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -66,4 +66,19 @@ static inline netmem_ref page_to_netmem(struct page *page) return (__force netmem_ref)page; } +static inline int netmem_ref_count(netmem_ref netmem) +{ + return page_ref_count(netmem_to_page(netmem)); +} + +static inline unsigned long netmem_to_pfn(netmem_ref netmem) +{ + return page_to_pfn(netmem_to_page(netmem)); +} + +static inline netmem_ref netmem_compound_head(netmem_ref netmem) +{ + return page_to_netmem(compound_head(netmem_to_page(netmem))); +} + #endif /* _NET_NETMEM_H */ diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 873631c79ab16..5e129d5304f53 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -55,6 +55,8 @@ #include #include +#include +#include #ifdef CONFIG_PAGE_POOL_STATS /* Deprecated driver-facing API, use netlink instead */ @@ -103,7 +105,7 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) * Get a page fragment from the page allocator or page_pool caches. * * Return: - * Return allocated page fragment, otherwise return NULL. + * Return allocated page fragment, otherwise return 0.
*/ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, unsigned int *offset, @@ -114,22 +116,22 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, return page_pool_alloc_frag(pool, offset, size, gfp); } -static inline struct page *page_pool_alloc(struct page_pool *pool, - unsigned int *offset, - unsigned int *size, gfp_t gfp) +static inline netmem_ref page_pool_alloc(struct page_pool *pool, + unsigned int *offset, + unsigned int *size, gfp_t gfp) { unsigned int max_size = PAGE_SIZE << pool->p.order; - struct page *page; + netmem_ref netmem; if ((*size << 1) > max_size) { *size = max_size; *offset = 0; - return page_pool_alloc_pages(pool, gfp); + return page_pool_alloc_netmem(pool, gfp); } - page = page_pool_alloc_frag(pool, offset, *size, gfp); - if (unlikely(!page)) - return NULL; + netmem = page_pool_alloc_frag_netmem(pool, offset, *size, gfp); + if (unlikely(!netmem)) + return 0; /* There is very likely not enough space for another fragment, so append * the remaining size to the current fragment to avoid truesize @@ -140,7 +142,7 @@ static inline struct page *page_pool_alloc(struct page_pool *pool, pool->frag_offset = max_size; } - return page; + return netmem; } /** @@ -154,7 +156,7 @@ static inline struct page *page_pool_alloc(struct page_pool *pool, * utilization and performance penalty. * * Return: - * Return allocated page or page fragment, otherwise return NULL. + * Return allocated page or page fragment, otherwise return 0. */ static inline struct page *page_pool_dev_alloc(struct page_pool *pool, unsigned int *offset, @@ -162,7 +164,7 @@ static inline struct page *page_pool_dev_alloc(struct page_pool *pool, { gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); - return page_pool_alloc(pool, offset, size, gfp); + return netmem_to_page(page_pool_alloc(pool, offset, size, gfp)); } static inline void *page_pool_alloc_va(struct page_pool *pool, @@ -172,7 +174,8 @@ static inline void *page_pool_alloc_va(struct page_pool *pool, struct page *page; /* Mask off __GFP_HIGHMEM to ensure we can use page_address() */ - page = page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM); + page = netmem_to_page( + page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM)); if (unlikely(!page)) return NULL; @@ -189,7 +192,7 @@ static inline void *page_pool_alloc_va(struct page_pool *pool, * it returns va of the allocated page or page fragment. * * Return: - * Return the va for the allocated page or page fragment, otherwise return NULL. + * Return the va for the allocated page or page fragment, otherwise return 0. 
*/ static inline void *page_pool_dev_alloc_va(struct page_pool *pool, unsigned int *size) @@ -212,6 +215,11 @@ page_pool_get_dma_dir(const struct page_pool *pool) return pool->p.dma_dir; } +static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr) +{ + atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr); +} + /** * page_pool_fragment_page() - split a fresh page into fragments * @page: page to split @@ -232,11 +240,12 @@ page_pool_get_dma_dir(const struct page_pool *pool) */ static inline void page_pool_fragment_page(struct page *page, long nr) { - atomic_long_set(&page->pp_ref_count, nr); + page_pool_fragment_netmem(page_to_netmem(page), nr); } -static inline long page_pool_unref_page(struct page *page, long nr) +static inline long page_pool_unref_netmem(netmem_ref netmem, long nr) { + struct page *page = netmem_to_page(netmem); long ret; /* If nr == pp_ref_count then we have cleared all remaining @@ -279,15 +288,41 @@ static inline long page_pool_unref_page(struct page *page, long nr) return ret; } +static inline long page_pool_unref_page(struct page *page, long nr) +{ + return page_pool_unref_netmem(page_to_netmem(page), nr); +} + +static inline void page_pool_ref_netmem(netmem_ref netmem) +{ + atomic_long_inc(&netmem_to_page(netmem)->pp_ref_count); +} + static inline void page_pool_ref_page(struct page *page) { - atomic_long_inc(&page->pp_ref_count); + page_pool_ref_netmem(page_to_netmem(page)); } -static inline bool page_pool_is_last_ref(struct page *page) +static inline bool page_pool_is_last_ref(netmem_ref netmem) { /* If page_pool_unref_page() returns 0, we were the last user */ - return page_pool_unref_page(page, 1) == 0; + return page_pool_unref_netmem(netmem, 1) == 0; +} + +static inline void page_pool_put_netmem(struct page_pool *pool, + netmem_ref netmem, + unsigned int dma_sync_size, + bool allow_direct) +{ + /* When page_pool isn't compiled-in, net/core/xdp.c doesn't + * allow registering MEM_TYPE_PAGE_POOL, but shield linker. + */ +#ifdef CONFIG_PAGE_POOL + if (!page_pool_is_last_ref(netmem)) + return; + + page_pool_put_unrefed_netmem(pool, netmem, dma_sync_size, allow_direct); +#endif } /** @@ -308,15 +343,15 @@ static inline void page_pool_put_page(struct page_pool *pool, unsigned int dma_sync_size, bool allow_direct) { - /* When page_pool isn't compiled-in, net/core/xdp.c doesn't - * allow registering MEM_TYPE_PAGE_POOL, but shield linker. 
- */ -#ifdef CONFIG_PAGE_POOL - if (!page_pool_is_last_ref(page)) - return; + page_pool_put_netmem(pool, page_to_netmem(page), dma_sync_size, + allow_direct); +} - page_pool_put_unrefed_page(pool, page, dma_sync_size, allow_direct); -#endif +static inline void page_pool_put_full_netmem(struct page_pool *pool, + netmem_ref netmem, + bool allow_direct) +{ + page_pool_put_netmem(pool, netmem, -1, allow_direct); } /** @@ -331,7 +366,7 @@ static inline void page_pool_put_page(struct page_pool *pool, static inline void page_pool_put_full_page(struct page_pool *pool, struct page *page, bool allow_direct) { - page_pool_put_page(pool, page, -1, allow_direct); + page_pool_put_netmem(pool, page_to_netmem(page), -1, allow_direct); } /** @@ -365,6 +400,18 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va, page_pool_put_page(pool, virt_to_head_page(va), -1, allow_direct); } +static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem) +{ + struct page *page = netmem_to_page(netmem); + + dma_addr_t ret = page->dma_addr; + + if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) + ret <<= PAGE_SHIFT; + + return ret; +} + /** * page_pool_get_dma_addr() - Retrieve the stored DMA address. * @page: page allocated from a page pool @@ -374,16 +421,14 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va, */ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page) { - dma_addr_t ret = page->dma_addr; - - if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) - ret <<= PAGE_SHIFT; - - return ret; + return page_pool_get_dma_addr_netmem(page_to_netmem((struct page *)page)); } -static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem, + dma_addr_t addr) { + struct page *page = netmem_to_page(netmem); + if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) { page->dma_addr = addr >> PAGE_SHIFT; @@ -419,6 +464,11 @@ static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool, page_pool_get_dma_dir(pool)); } +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + return page_pool_set_dma_addr_netmem(page_to_netmem(page), addr); +} + static inline bool page_pool_put(struct page_pool *pool) { return refcount_dec_and_test(&pool->user_cnt); diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 9f3c3ee2ee755..0693117f8d74f 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -6,6 +6,7 @@ #include #include #include +#include #define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA * map/unmap @@ -40,7 +41,7 @@ #define PP_ALLOC_CACHE_REFILL 64 struct pp_alloc_cache { u32 count; - struct page *cache[PP_ALLOC_CACHE_SIZE]; + netmem_ref cache[PP_ALLOC_CACHE_SIZE]; }; /** @@ -73,7 +74,7 @@ struct page_pool_params { struct net_device *netdev; unsigned int flags; /* private: used by test code only */ - void (*init_callback)(struct page *page, void *arg); + void (*init_callback)(netmem_ref netmem, void *arg); void *init_arg; ); }; @@ -155,7 +156,7 @@ struct page_pool { */ __cacheline_group_begin(frag) __aligned(4 * sizeof(long)); long frag_users; - struct page *frag_page; + netmem_ref frag_page; unsigned int frag_offset; __cacheline_group_end(frag); @@ -226,8 +227,12 @@ struct page_pool { }; struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); +netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp); struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, 
unsigned int size, gfp_t gfp); +netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, + unsigned int *offset, unsigned int size, + gfp_t gfp); struct page_pool *page_pool_create(const struct page_pool_params *params); struct page_pool *page_pool_create_percpu(const struct page_pool_params *params, int cpuid); @@ -257,6 +262,9 @@ static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, } #endif +void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem, + unsigned int dma_sync_size, + bool allow_direct); void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, unsigned int dma_sync_size, bool allow_direct); diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h index 6834356b2d2ae..41cda027731a0 100644 --- a/include/trace/events/page_pool.h +++ b/include/trace/events/page_pool.h @@ -42,51 +42,52 @@ TRACE_EVENT(page_pool_release, TRACE_EVENT(page_pool_state_release, TP_PROTO(const struct page_pool *pool, - const struct page *page, u32 release), + netmem_ref netmem, u32 release), - TP_ARGS(pool, page, release), + TP_ARGS(pool, netmem, release), TP_STRUCT__entry( __field(const struct page_pool *, pool) - __field(const struct page *, page) + __field(unsigned long, netmem) __field(u32, release) __field(unsigned long, pfn) ), TP_fast_assign( __entry->pool = pool; - __entry->page = page; + __entry->netmem = (__force unsigned long)netmem; __entry->release = release; - __entry->pfn = page_to_pfn(page); + __entry->pfn = netmem_to_pfn(netmem); ), - TP_printk("page_pool=%p page=%p pfn=0x%lx release=%u", - __entry->pool, __entry->page, __entry->pfn, __entry->release) + TP_printk("page_pool=%p netmem=%lu pfn=0x%lx release=%u", + __entry->pool, __entry->netmem, + __entry->pfn, __entry->release) ); TRACE_EVENT(page_pool_state_hold, TP_PROTO(const struct page_pool *pool, - const struct page *page, u32 hold), + netmem_ref netmem, u32 hold), - TP_ARGS(pool, page, hold), + TP_ARGS(pool, netmem, hold), TP_STRUCT__entry( __field(const struct page_pool *, pool) - __field(const struct page *, page) + __field(unsigned long, netmem) __field(u32, hold) __field(unsigned long, pfn) ), TP_fast_assign( __entry->pool = pool; - __entry->page = page; + __entry->netmem = (__force unsigned long)netmem; __entry->hold = hold; - __entry->pfn = page_to_pfn(page); + __entry->pfn = netmem_to_pfn(netmem); ), - TP_printk("page_pool=%p page=%p pfn=0x%lx hold=%u", - __entry->pool, __entry->page, __entry->pfn, __entry->hold) + TP_printk("page_pool=%p netmem=%lu pfn=0x%lx hold=%u", + __entry->pool, __entry->netmem, __entry->pfn, __entry->hold) ); TRACE_EVENT(page_pool_update_nid, diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 36ae54f57bf57..4df67f574d028 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -127,9 +127,10 @@ struct xdp_test_data { #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) #define TEST_XDP_MAX_BATCH 256 -static void xdp_test_run_init_page(struct page *page, void *arg) +static void xdp_test_run_init_page(netmem_ref netmem, void *arg) { - struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_page_head *head = + phys_to_virt(page_to_phys(netmem_to_page(netmem))); struct xdp_buff *new_ctx, *orig_ctx; u32 headroom = XDP_PACKET_HEADROOM; struct xdp_test_data *xdp = arg; diff --git a/net/core/page_pool.c b/net/core/page_pool.c index f4444b4e39e63..b79e702cd69fc 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -327,19 +327,18 @@ struct page_pool 
*page_pool_create(const struct page_pool_params *params) } EXPORT_SYMBOL(page_pool_create); -static void page_pool_return_page(struct page_pool *pool, struct page *page); +static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem); -noinline -static struct page *page_pool_refill_alloc_cache(struct page_pool *pool) +static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool) { struct ptr_ring *r = &pool->ring; - struct page *page; + netmem_ref netmem; int pref_nid; /* preferred NUMA node */ /* Quicker fallback, avoid locks when ring is empty */ if (__ptr_ring_empty(r)) { alloc_stat_inc(pool, empty); - return NULL; + return 0; } /* Softirq guarantee CPU and thus NUMA node is stable. This, @@ -354,57 +353,57 @@ static struct page *page_pool_refill_alloc_cache(struct page_pool *pool) /* Refill alloc array, but only if NUMA match */ do { - page = __ptr_ring_consume(r); - if (unlikely(!page)) + netmem = (__force netmem_ref)__ptr_ring_consume(r); + if (unlikely(!netmem)) break; - if (likely(page_to_nid(page) == pref_nid)) { - pool->alloc.cache[pool->alloc.count++] = page; + if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) { + pool->alloc.cache[pool->alloc.count++] = netmem; } else { /* NUMA mismatch; * (1) release 1 page to page-allocator and * (2) break out to fallthrough to alloc_pages_node. * This limit stress on page buddy alloactor. */ - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); alloc_stat_inc(pool, waive); - page = NULL; + netmem = 0; break; } } while (pool->alloc.count < PP_ALLOC_CACHE_REFILL); /* Return last page */ if (likely(pool->alloc.count > 0)) { - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, refill); } - return page; + return netmem; } /* fast path */ -static struct page *__page_pool_get_cached(struct page_pool *pool) +static netmem_ref __page_pool_get_cached(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; /* Caller MUST guarantee safe non-concurrent access, e.g. 
softirq */ if (likely(pool->alloc.count)) { /* Fast-path */ - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, fast); } else { - page = page_pool_refill_alloc_cache(pool); + netmem = page_pool_refill_alloc_cache(pool); } - return page; + return netmem; } static void __page_pool_dma_sync_for_device(const struct page_pool *pool, - const struct page *page, + netmem_ref netmem, u32 dma_sync_size) { #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) - dma_addr_t dma_addr = page_pool_get_dma_addr(page); + dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem); dma_sync_size = min(dma_sync_size, pool->p.max_len); __dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset, @@ -414,14 +413,14 @@ static void __page_pool_dma_sync_for_device(const struct page_pool *pool, static __always_inline void page_pool_dma_sync_for_device(const struct page_pool *pool, - const struct page *page, + netmem_ref netmem, u32 dma_sync_size) { if (pool->dma_sync && dma_dev_need_sync(pool->p.dev)) - __page_pool_dma_sync_for_device(pool, page, dma_sync_size); + __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); } -static bool page_pool_dma_map(struct page_pool *pool, struct page *page) +static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) { dma_addr_t dma; @@ -430,17 +429,17 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) * into page private data (i.e 32bit cpu with 64bit DMA caps) * This mapping is kept for lifetime of page, until leaving pool. */ - dma = dma_map_page_attrs(pool->p.dev, page, 0, - (PAGE_SIZE << pool->p.order), - pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | - DMA_ATTR_WEAK_ORDERING); + dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0, + (PAGE_SIZE << pool->p.order), pool->p.dma_dir, + DMA_ATTR_SKIP_CPU_SYNC | + DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(pool->p.dev, dma)) return false; - if (page_pool_set_dma_addr(page, dma)) + if (page_pool_set_dma_addr_netmem(netmem, dma)) goto unmap_failed; - page_pool_dma_sync_for_device(pool, page, pool->p.max_len); + page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len); return true; @@ -452,9 +451,10 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) return false; } -static void page_pool_set_pp_info(struct page_pool *pool, - struct page *page) +static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) { + struct page *page = netmem_to_page(netmem); + page->pp = pool; page->pp_magic |= PP_SIGNATURE; @@ -464,13 +464,15 @@ static void page_pool_set_pp_info(struct page_pool *pool, * is dirtying the same cache line as the page->pp_magic above, so * the overhead is negligible. 
*/ - page_pool_fragment_page(page, 1); + page_pool_fragment_netmem(netmem, 1); if (pool->has_init_callback) - pool->slow.init_callback(page, pool->slow.init_arg); + pool->slow.init_callback(netmem, pool->slow.init_arg); } -static void page_pool_clear_pp_info(struct page *page) +static void page_pool_clear_pp_info(netmem_ref netmem) { + struct page *page = netmem_to_page(netmem); + page->pp_magic = 0; page->pp = NULL; } @@ -485,34 +487,34 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool, if (unlikely(!page)) return NULL; - if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page))) { + if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page)))) { put_page(page); return NULL; } alloc_stat_inc(pool, slow_high_order); - page_pool_set_pp_info(pool, page); + page_pool_set_pp_info(pool, page_to_netmem(page)); /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; - trace_page_pool_state_hold(pool, page, pool->pages_state_hold_cnt); + trace_page_pool_state_hold(pool, page_to_netmem(page), + pool->pages_state_hold_cnt); return page; } /* slow path */ -noinline -static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, - gfp_t gfp) +static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool, + gfp_t gfp) { const int bulk = PP_ALLOC_CACHE_REFILL; unsigned int pp_order = pool->p.order; bool dma_map = pool->dma_map; - struct page *page; + netmem_ref netmem; int i, nr_pages; /* Don't support bulk alloc for high-order pages */ if (unlikely(pp_order)) - return __page_pool_alloc_page_order(pool, gfp); + return page_to_netmem(__page_pool_alloc_page_order(pool, gfp)); /* Unnecessary as alloc cache is empty, but guarantees zero count */ if (unlikely(pool->alloc.count > 0)) @@ -521,56 +523,63 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */ memset(&pool->alloc.cache, 0, sizeof(void *) * bulk); - nr_pages = alloc_pages_bulk_array_node(gfp, pool->p.nid, bulk, - pool->alloc.cache); + nr_pages = alloc_pages_bulk_array_node(gfp, + pool->p.nid, bulk, + (struct page **)pool->alloc.cache); if (unlikely(!nr_pages)) - return NULL; + return 0; /* Pages have been filled into alloc.cache array, but count is zero and * page element have not been (possibly) DMA mapped. */ for (i = 0; i < nr_pages; i++) { - page = pool->alloc.cache[i]; - if (dma_map && unlikely(!page_pool_dma_map(pool, page))) { - put_page(page); + netmem = pool->alloc.cache[i]; + if (dma_map && unlikely(!page_pool_dma_map(pool, netmem))) { + put_page(netmem_to_page(netmem)); continue; } - page_pool_set_pp_info(pool, page); - pool->alloc.cache[pool->alloc.count++] = page; + page_pool_set_pp_info(pool, netmem); + pool->alloc.cache[pool->alloc.count++] = netmem; /* Track how many pages are held 'in-flight' */ pool->pages_state_hold_cnt++; - trace_page_pool_state_hold(pool, page, + trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); } /* Return last page */ if (likely(pool->alloc.count > 0)) { - page = pool->alloc.cache[--pool->alloc.count]; + netmem = pool->alloc.cache[--pool->alloc.count]; alloc_stat_inc(pool, slow); } else { - page = NULL; + netmem = 0; } /* When page just alloc'ed is should/must have refcnt 1. */ - return page; + return netmem; } /* For using page_pool replace: alloc_pages() API calls, but provide * synchronization guarantee for allocation side. 
*/ -struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) +netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) { - struct page *page; + netmem_ref netmem; /* Fast-path: Get a page from cache */ - page = __page_pool_get_cached(pool); - if (page) - return page; + netmem = __page_pool_get_cached(pool); + if (netmem) + return netmem; /* Slow-path: cache empty, do real allocation */ - page = __page_pool_alloc_pages_slow(pool, gfp); - return page; + netmem = __page_pool_alloc_pages_slow(pool, gfp); + return netmem; +} +EXPORT_SYMBOL(page_pool_alloc_netmem); + +struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) +{ + return netmem_to_page(page_pool_alloc_netmem(pool, gfp)); } EXPORT_SYMBOL(page_pool_alloc_pages); ALLOW_ERROR_INJECTION(page_pool_alloc_pages, NULL); @@ -599,8 +608,8 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict) return inflight; } -static __always_inline -void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) +static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, + netmem_ref netmem) { dma_addr_t dma; @@ -610,13 +619,13 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) */ return; - dma = page_pool_get_dma_addr(page); + dma = page_pool_get_dma_addr_netmem(netmem); /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); - page_pool_set_dma_addr(page, 0); + page_pool_set_dma_addr_netmem(netmem, 0); } /* Disconnects a page (from a page_pool). API users can have a need @@ -624,35 +633,34 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page) * a regular page (that will eventually be returned to the normal * page-allocator via put_page). */ -void page_pool_return_page(struct page_pool *pool, struct page *page) +void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) { int count; - __page_pool_release_page_dma(pool, page); - - page_pool_clear_pp_info(page); + __page_pool_release_page_dma(pool, netmem); /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. */ count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt); - trace_page_pool_state_release(pool, page, count); + trace_page_pool_state_release(pool, netmem, count); - put_page(page); + page_pool_clear_pp_info(netmem); + put_page(netmem_to_page(netmem)); /* An optimization would be to call __free_pages(page, pool->p.order) * knowing page is not part of page-cache (thus avoiding a * __page_cache_release() call). */ } -static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) +static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem) { int ret; /* BH protection not needed if current is softirq */ if (in_softirq()) - ret = ptr_ring_produce(&pool->ring, page); + ret = ptr_ring_produce(&pool->ring, (__force void *)netmem); else - ret = ptr_ring_produce_bh(&pool->ring, page); + ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem); if (!ret) { recycle_stat_inc(pool, ring); @@ -667,7 +675,7 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) * * Caller must provide appropriate safe context. 
*/ -static bool page_pool_recycle_in_cache(struct page *page, +static bool page_pool_recycle_in_cache(netmem_ref netmem, struct page_pool *pool) { if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) { @@ -676,14 +684,15 @@ static bool page_pool_recycle_in_cache(struct page *page, } /* Caller MUST have verified/know (page_ref_count(page) == 1) */ - pool->alloc.cache[pool->alloc.count++] = page; + pool->alloc.cache[pool->alloc.count++] = netmem; recycle_stat_inc(pool, cached); return true; } -static bool __page_pool_page_can_be_recycled(const struct page *page) +static bool __page_pool_page_can_be_recycled(netmem_ref netmem) { - return page_ref_count(page) == 1 && !page_is_pfmemalloc(page); + return page_ref_count(netmem_to_page(netmem)) == 1 && + !page_is_pfmemalloc(netmem_to_page(netmem)); } /* If the page refcnt == 1, this will try to recycle the page. @@ -692,8 +701,8 @@ static bool __page_pool_page_can_be_recycled(const struct page *page) * If the page refcnt != 1, then the page will be returned to memory * subsystem. */ -static __always_inline struct page * -__page_pool_put_page(struct page_pool *pool, struct page *page, +static __always_inline netmem_ref +__page_pool_put_page(struct page_pool *pool, netmem_ref netmem, unsigned int dma_sync_size, bool allow_direct) { lockdep_assert_no_hardirq(); @@ -707,16 +716,16 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, * page is NOT reusable when allocated when system is under * some pressure. (page_is_pfmemalloc) */ - if (likely(__page_pool_page_can_be_recycled(page))) { + if (likely(__page_pool_page_can_be_recycled(netmem))) { /* Read barrier done in page_ref_count / READ_ONCE */ - page_pool_dma_sync_for_device(pool, page, dma_sync_size); + page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); - if (allow_direct && page_pool_recycle_in_cache(page, pool)) - return NULL; + if (allow_direct && page_pool_recycle_in_cache(netmem, pool)) + return 0; /* Page found as candidate for recycling */ - return page; + return netmem; } /* Fallback/non-XDP mode: API user have elevated refcnt. * @@ -732,9 +741,9 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, * will be invoking put_page. 
*/ recycle_stat_inc(pool, released_refcnt); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); - return NULL; + return 0; } static bool page_pool_napi_local(const struct page_pool *pool) @@ -760,19 +769,28 @@ static bool page_pool_napi_local(const struct page_pool *pool) return napi && READ_ONCE(napi->list_owner) == cpuid; } -void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, - unsigned int dma_sync_size, bool allow_direct) +void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem, + unsigned int dma_sync_size, bool allow_direct) { if (!allow_direct) allow_direct = page_pool_napi_local(pool); - page = __page_pool_put_page(pool, page, dma_sync_size, allow_direct); - if (page && !page_pool_recycle_in_ring(pool, page)) { + netmem = + __page_pool_put_page(pool, netmem, dma_sync_size, allow_direct); + if (netmem && !page_pool_recycle_in_ring(pool, netmem)) { /* Cache full, fallback to free pages */ recycle_stat_inc(pool, ring_full); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } } +EXPORT_SYMBOL(page_pool_put_unrefed_netmem); + +void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page, + unsigned int dma_sync_size, bool allow_direct) +{ + page_pool_put_unrefed_netmem(pool, page_to_netmem(page), dma_sync_size, + allow_direct); +} EXPORT_SYMBOL(page_pool_put_unrefed_page); /** @@ -800,16 +818,16 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, allow_direct = page_pool_napi_local(pool); for (i = 0; i < count; i++) { - struct page *page = virt_to_head_page(data[i]); + netmem_ref netmem = page_to_netmem(virt_to_head_page(data[i])); /* It is not the last user for the page frag case */ - if (!page_pool_is_last_ref(page)) + if (!page_pool_is_last_ref(netmem)) continue; - page = __page_pool_put_page(pool, page, -1, allow_direct); + netmem = __page_pool_put_page(pool, netmem, -1, allow_direct); /* Approved for bulk recycling in ptr_ring cache */ - if (page) - data[bulk_len++] = page; + if (netmem) + data[bulk_len++] = (__force void *)netmem; } if (!bulk_len) @@ -835,98 +853,106 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, * since put_page() with refcnt == 1 can be an expensive operation */ for (; i < bulk_len; i++) - page_pool_return_page(pool, data[i]); + page_pool_return_page(pool, (__force netmem_ref)data[i]); } EXPORT_SYMBOL(page_pool_put_page_bulk); -static struct page *page_pool_drain_frag(struct page_pool *pool, - struct page *page) +static netmem_ref page_pool_drain_frag(struct page_pool *pool, + netmem_ref netmem) { long drain_count = BIAS_MAX - pool->frag_users; /* Some user is still using the page frag */ - if (likely(page_pool_unref_page(page, drain_count))) - return NULL; + if (likely(page_pool_unref_netmem(netmem, drain_count))) + return 0; - if (__page_pool_page_can_be_recycled(page)) { - page_pool_dma_sync_for_device(pool, page, -1); - return page; + if (__page_pool_page_can_be_recycled(netmem)) { + page_pool_dma_sync_for_device(pool, netmem, -1); + return netmem; } - page_pool_return_page(pool, page); - return NULL; + page_pool_return_page(pool, netmem); + return 0; } static void page_pool_free_frag(struct page_pool *pool) { long drain_count = BIAS_MAX - pool->frag_users; - struct page *page = pool->frag_page; + netmem_ref netmem = pool->frag_page; - pool->frag_page = NULL; + pool->frag_page = 0; - if (!page || page_pool_unref_page(page, drain_count)) + if (!netmem || page_pool_unref_netmem(netmem, drain_count)) return; - 
page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } -struct page *page_pool_alloc_frag(struct page_pool *pool, - unsigned int *offset, - unsigned int size, gfp_t gfp) +netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, + unsigned int *offset, unsigned int size, + gfp_t gfp) { unsigned int max_size = PAGE_SIZE << pool->p.order; - struct page *page = pool->frag_page; + netmem_ref netmem = pool->frag_page; if (WARN_ON(size > max_size)) - return NULL; + return 0; size = ALIGN(size, dma_get_cache_alignment()); *offset = pool->frag_offset; - if (page && *offset + size > max_size) { - page = page_pool_drain_frag(pool, page); - if (page) { + if (netmem && *offset + size > max_size) { + netmem = page_pool_drain_frag(pool, netmem); + if (netmem) { alloc_stat_inc(pool, fast); goto frag_reset; } } - if (!page) { - page = page_pool_alloc_pages(pool, gfp); - if (unlikely(!page)) { - pool->frag_page = NULL; - return NULL; + if (!netmem) { + netmem = page_pool_alloc_netmem(pool, gfp); + if (unlikely(!netmem)) { + pool->frag_page = 0; + return 0; } - pool->frag_page = page; + pool->frag_page = netmem; frag_reset: pool->frag_users = 1; *offset = 0; pool->frag_offset = size; - page_pool_fragment_page(page, BIAS_MAX); - return page; + page_pool_fragment_netmem(netmem, BIAS_MAX); + return netmem; } pool->frag_users++; pool->frag_offset = *offset + size; alloc_stat_inc(pool, fast); - return page; + return netmem; +} +EXPORT_SYMBOL(page_pool_alloc_frag_netmem); + +struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, + unsigned int size, gfp_t gfp) +{ + return netmem_to_page(page_pool_alloc_frag_netmem(pool, offset, size, + gfp)); } EXPORT_SYMBOL(page_pool_alloc_frag); static void page_pool_empty_ring(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; /* Empty recycle ring */ - while ((page = ptr_ring_consume_bh(&pool->ring))) { + while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) { /* Verify the refcnt invariant of cached pages */ - if (!(page_ref_count(page) == 1)) + if (!(page_ref_count(netmem_to_page(netmem)) == 1)) pr_crit("%s() page_pool refcnt %d violation\n", - __func__, page_ref_count(page)); + __func__, netmem_ref_count(netmem)); - page_pool_return_page(pool, page); + page_pool_return_page(pool, netmem); } } @@ -942,7 +968,7 @@ static void __page_pool_destroy(struct page_pool *pool) static void page_pool_empty_alloc_cache_once(struct page_pool *pool) { - struct page *page; + netmem_ref netmem; if (pool->destroy_cnt) return; @@ -952,8 +978,8 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool) * call concurrently. */ while (pool->alloc.count) { - page = pool->alloc.cache[--pool->alloc.count]; - page_pool_return_page(pool, page); + netmem = pool->alloc.cache[--pool->alloc.count]; + page_pool_return_page(pool, netmem); } } @@ -1059,15 +1085,15 @@ EXPORT_SYMBOL(page_pool_destroy); /* Caller must provide appropriate safe context, e.g. NAPI. 
*/ void page_pool_update_nid(struct page_pool *pool, int new_nid) { - struct page *page; + netmem_ref netmem; trace_page_pool_update_nid(pool, new_nid); pool->p.nid = new_nid; /* Flush pool alloc cache, as refill will check NUMA node */ while (pool->alloc.count) { - page = pool->alloc.cache[--pool->alloc.count]; - page_pool_return_page(pool, page); + netmem = pool->alloc.cache[--pool->alloc.count]; + page_pool_return_page(pool, netmem); } } EXPORT_SYMBOL(page_pool_update_nid); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c8ac79851cd67..8d5e85d71998d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1002,8 +1002,10 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, EXPORT_SYMBOL(skb_cow_data_for_xdp); #if IS_ENABLED(CONFIG_PAGE_POOL) -bool napi_pp_put_page(struct page *page) +bool napi_pp_put_page(netmem_ref netmem) { + struct page *page = netmem_to_page(netmem); + page = compound_head(page); /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation @@ -1016,7 +1018,7 @@ bool napi_pp_put_page(struct page *page) if (unlikely(!is_pp_page(page))) return false; - page_pool_put_full_page(page->pp, page, false); + page_pool_put_full_netmem(page->pp, page_to_netmem(page), false); return true; } @@ -1027,7 +1029,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data) { if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle) return false; - return napi_pp_put_page(virt_to_page(data)); + return napi_pp_put_page(page_to_netmem(virt_to_page(data))); } /**

From patchwork Fri Jun 7 00:51:16 2024
X-Patchwork-Id: 13689128
From: Mina Almasry <almasrymina@google.com>
Date: Fri, 7 Jun 2024 00:51:16 +0000
Subject: [PATCH net-next v11 06/13] page_pool: devmem support
Message-ID: <20240607005127.3078656-7-almasrymina@google.com>
In-Reply-To: <20240607005127.3078656-1-almasrymina@google.com>
References: <20240607005127.3078656-1-almasrymina@google.com>

Convert netmem to be a union of struct page and struct net_iov.
Overload the LSB of the netmem_ref to indicate that it's a net_iov;
otherwise it's a page.

Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:

struct {
        unsigned long pp_magic;
        struct page_pool *pp;
        unsigned long _pp_mapping_pad;
        unsigned long dma_addr;
        atomic_long_t pp_ref_count;
};

Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.

Implement checks for net_iov in the netmem helpers that delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.

Signed-off-by: Mina Almasry <almasrymina@google.com>

---

v9: https://lore.kernel.org/netdev/20240403002053.2376017-8-almasrymina@google.com/
- Remove CONFIG checks in netmem_is_net_iov() (Pavel/David/Jens)

v7:
- Remove static_branch_unlikely from netmem_to_net_iov(). We're getting
  better results from the fast path in bench_page_pool_simple tests
  without the static_branch_unlikely, and the addition of
  static_branch_unlikely doesn't improve performance of devmem TCP.

  Additionally only check netmem_to_net_iov() if
  CONFIG_DMA_SHARED_BUFFER is enabled, otherwise dmabuf net_iovs cannot
  exist anyway.

  net-next base: 8 cycle fast path.
  with static_branch_unlikely: 10 cycle fast path.
  without static_branch_unlikely: 9 cycle fast path.
  CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path as baseline.

  Performance of devmem TCP is at 95% line rate regardless of
  static_branch_unlikely or not.

v6:
- Rebased on top of the merged netmem_ref type.
- Rebased on top of the merged skb_pp_frag_ref() changes.

v5:
- Use netmem instead of page* with LSB set.
- Use pp_ref_count for refcounting net_iov.
- Removed many of the custom checks for netmem.

v1:
- Disable fragmentation support for iov properly.
- fix napi_pp_put_page() path (Yunsheng).
- Use pp_frag_count for devmem refcounting.
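
To make the LSB tagging concrete, here is a minimal sketch built on the
definitions this patch adds (NET_IOV, __netmem_clear_lsb(),
netmem_get_pp()). It is not code from this series;
net_iov_to_netmem() is assumed here for symmetry with page_to_netmem(),
and example_owning_pool() is hypothetical:

/* Sketch only. NET_IOV is the 0x01UL tag defined by this patch; the
 * mirrored layout means the page_pool fields read identically from a
 * struct page or a struct net_iov once the tag is stripped.
 */
static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
{
        /* Tag the pointer's LSB so netmem_is_net_iov() returns true. */
        return (__force netmem_ref)((unsigned long)niov | NET_IOV);
}

static struct page_pool *example_owning_pool(netmem_ref netmem)
{
        /* netmem_get_pp() clears the LSB via __netmem_clear_lsb() and
         * then reads the pp field, which sits at the same offset in
         * struct page and struct net_iov.
         */
        return netmem_get_pp(netmem);
}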
Cc: linux-mm@kvack.org
Cc: Matthew Wilcox
---
 include/net/netmem.h            | 137 ++++++++++++++++++++++++++++++--
 include/net/page_pool/helpers.h |  25 +++---
 net/core/devmem.c               |   3 +
 net/core/page_pool.c            |  26 +++---
 net/core/skbuff.c               |  22 +++--
 5 files changed, 168 insertions(+), 45 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 664df8325ece5..35ad237fdf29e 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -9,14 +9,51 @@
 #define _NET_NETMEM_H
 
 #include
+#include /* net_iov */
+
+DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
+/* We overload the LSB of the struct page pointer to indicate whether it's
+ * a page or net_iov.
+ */
+#define NET_IOV 0x01UL
+
 struct net_iov {
+	unsigned long __unused_padding;
+	unsigned long pp_magic;
+	struct page_pool *pp;
 	struct dmabuf_genpool_chunk_owner *owner;
 	unsigned long dma_addr;
+	atomic_long_t pp_ref_count;
 };
 
+/* These fields in struct page are used by the page_pool and net stack:
+ *
+ *	struct {
+ *		unsigned long pp_magic;
+ *		struct page_pool *pp;
+ *		unsigned long _pp_mapping_pad;
+ *		unsigned long dma_addr;
+ *		atomic_long_t pp_ref_count;
+ *	};
+ *
+ * We mirror the page_pool fields here so the page_pool can access these fields
+ * without worrying whether the underlying fields belong to a page or net_iov.
+ *
+ * The non-net stack fields of struct page are private to the mm stack and must
+ * never be mirrored to net_iov.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
+NET_IOV_ASSERT_OFFSET(pp, pp);
+NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
+NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+#undef NET_IOV_ASSERT_OFFSET
+
 static inline struct dmabuf_genpool_chunk_owner *
 net_iov_owner(const struct net_iov *niov)
 {
@@ -47,20 +84,22 @@ net_iov_binding(const struct net_iov *niov)
  */
 typedef unsigned long __bitwise netmem_ref;
 
+static inline bool netmem_is_net_iov(const netmem_ref netmem)
+{
+	return (__force unsigned long)netmem & NET_IOV;
+}
+
 /* This conversion fails (returns NULL) if the netmem_ref is not struct page
  * backed.
- *
- * Currently struct page is the only possible netmem, and this helper never
- * fails.
  */
 static inline struct page *netmem_to_page(netmem_ref netmem)
 {
+	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
+		return NULL;
+
 	return (__force struct page *)netmem;
 }
 
-/* Converting from page to netmem is always safe, because a page can always be
- * a netmem.
- */
 static inline netmem_ref page_to_netmem(struct page *page)
 {
 	return (__force netmem_ref)page;
@@ -68,17 +107,103 @@ static inline netmem_ref page_to_netmem(struct page *page)
 
 static inline int netmem_ref_count(netmem_ref netmem)
 {
+	/* The non-pp refcount of net_iov is always 1. On net_iov, we only
+	 * support pp refcounting which uses the pp_ref_count field.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return 1;
+
 	return page_ref_count(netmem_to_page(netmem));
 }
 
 static inline unsigned long netmem_to_pfn(netmem_ref netmem)
 {
+	if (netmem_is_net_iov(netmem))
+		return 0;
+
 	return page_to_pfn(netmem_to_page(netmem));
 }
 
+static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
+{
+	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
+}
+
+static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp_magic;
+}
+
+static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
+{
+	__netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
+}
+
+static inline void netmem_clear_pp_magic(netmem_ref netmem)
+{
+	__netmem_clear_lsb(netmem)->pp_magic = 0;
+}
+
+static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->pp;
+}
+
+static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
+{
+	__netmem_clear_lsb(netmem)->pp = pool;
+}
+
+static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
+{
+	return __netmem_clear_lsb(netmem)->dma_addr;
+}
+
+static inline void netmem_set_dma_addr(netmem_ref netmem,
+				       unsigned long dma_addr)
+{
+	__netmem_clear_lsb(netmem)->dma_addr = dma_addr;
+}
+
+static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
+{
+	return &__netmem_clear_lsb(netmem)->pp_ref_count;
+}
+
+static inline bool netmem_is_pref_nid(netmem_ref netmem, int pref_nid)
+{
+	/* Assume net_iov are on the preferred node without actually
+	 * checking...
+	 *
+	 * This check is only used to check for recycling memory in the page
+	 * pool's fast paths. Currently the only implementation of net_iov
+	 * is dmabuf device memory. It's a deliberate decision by the user to
+	 * bind a certain dmabuf to a certain netdev, and the netdev rx queue
+	 * would not be able to reallocate memory from another dmabuf that
+	 * exists on the preferred node, so, this check doesn't make much sense
+	 * in this case. Assume all net_iovs can be recycled for now.
+	 */
+	if (netmem_is_net_iov(netmem))
+		return true;
+
+	return page_to_nid(netmem_to_page(netmem)) == pref_nid;
+}
+
 static inline netmem_ref netmem_compound_head(netmem_ref netmem)
 {
+	/* niov are never compounded */
+	if (netmem_is_net_iov(netmem))
+		return netmem;
+
 	return page_to_netmem(compound_head(netmem_to_page(netmem)));
 }
 
+static inline void *netmem_address(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem))
+		return NULL;
+
+	return page_address(netmem_to_page(netmem));
+}
+
 #endif /* _NET_NETMEM_H */
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 5e129d5304f53..1770c7be24afc 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -217,7 +217,7 @@ page_pool_get_dma_dir(const struct page_pool *pool)
 
 static inline void page_pool_fragment_netmem(netmem_ref netmem, long nr)
 {
-	atomic_long_set(&netmem_to_page(netmem)->pp_ref_count, nr);
+	atomic_long_set(netmem_get_pp_ref_count_ref(netmem), nr);
 }
 
 /**
@@ -245,7 +245,7 @@ static inline void page_pool_fragment_page(struct page *page, long nr)
 
 static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 {
-	struct page *page = netmem_to_page(netmem);
+	atomic_long_t *pp_ref_count = netmem_get_pp_ref_count_ref(netmem);
 	long ret;
 
 	/* If nr == pp_ref_count then we have cleared all remaining
@@ -262,19 +262,19 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * initially, and only overwrite it when the page is partitioned into
 	 * more than one piece.
 	 */
-	if (atomic_long_read(&page->pp_ref_count) == nr) {
+	if (atomic_long_read(pp_ref_count) == nr) {
 		/* As we have ensured nr is always one for constant case using
 		 * the BUILD_BUG_ON(), only need to handle the non-constant case
 		 * here for pp_ref_count draining, which is a rare case.
 		 */
 		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
 		if (!__builtin_constant_p(nr))
-			atomic_long_set(&page->pp_ref_count, 1);
+			atomic_long_set(pp_ref_count, 1);
 
 		return 0;
 	}
 
-	ret = atomic_long_sub_return(nr, &page->pp_ref_count);
+	ret = atomic_long_sub_return(nr, pp_ref_count);
 	WARN_ON(ret < 0);
 
 	/* We are the last user here too, reset pp_ref_count back to 1 to
@@ -283,7 +283,7 @@ static inline long page_pool_unref_netmem(netmem_ref netmem, long nr)
 	 * page_pool_unref_page() currently.
 	 */
 	if (unlikely(!ret))
-		atomic_long_set(&page->pp_ref_count, 1);
+		atomic_long_set(pp_ref_count, 1);
 
 	return ret;
 }
@@ -402,9 +402,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va,
 
 static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	dma_addr_t ret = page->dma_addr;
+	dma_addr_t ret = netmem_get_dma_addr(netmem);
 
 	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
 		ret <<= PAGE_SHIFT;
@@ -427,18 +425,17 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
 static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem,
 						 dma_addr_t addr)
 {
-	struct page *page = netmem_to_page(netmem);
-
 	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
-		page->dma_addr = addr >> PAGE_SHIFT;
+		netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT);
 
 		/* We assume page alignment to shave off bottom bits,
 		 * if this "compression" doesn't work we need to drop.
		 */
-		return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
+		return addr != (dma_addr_t)netmem_get_dma_addr(netmem)
+				<< PAGE_SHIFT;
 	}
 
-	page->dma_addr = addr;
+	netmem_set_dma_addr(netmem, addr);
 	return false;
 }
 
diff --git a/net/core/devmem.c b/net/core/devmem.c
index b53ea67c020d3..f4fd9b9dbb675 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -80,7 +80,10 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	index = offset / PAGE_SIZE;
 	niov = &owner->niovs[index];
 
+	niov->pp_magic = 0;
+	niov->pp = NULL;
 	niov->dma_addr = 0;
+	atomic_long_set(&niov->pp_ref_count, 0);
 
 	net_devmem_dmabuf_binding_get(binding);
 
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index b79e702cd69fc..0050a51d82dba 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -26,6 +26,8 @@
 
 #include "page_pool_priv.h"
 
+DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
+
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
@@ -357,7 +359,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 		if (unlikely(!netmem))
 			break;
 
-		if (likely(page_to_nid(netmem_to_page(netmem)) == pref_nid)) {
+		if (likely(netmem_is_pref_nid(netmem, pref_nid))) {
 			pool->alloc.cache[pool->alloc.count++] = netmem;
 		} else {
 			/* NUMA mismatch;
@@ -453,10 +455,8 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp = pool;
-	page->pp_magic |= PP_SIGNATURE;
+	netmem_set_pp(netmem, pool);
+	netmem_or_pp_magic(netmem, PP_SIGNATURE);
 
 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
@@ -471,10 +471,8 @@ static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 
 static void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page->pp_magic = 0;
-	page->pp = NULL;
+	netmem_clear_pp_magic(netmem);
+	netmem_set_pp(netmem, NULL);
 }
 
 static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
@@ -691,8 +689,9 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem,
 
 static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
 {
-	return page_ref_count(netmem_to_page(netmem)) == 1 &&
-	       !page_is_pfmemalloc(netmem_to_page(netmem));
+	return netmem_is_net_iov(netmem) ||
+	       (page_ref_count(netmem_to_page(netmem)) == 1 &&
+		!page_is_pfmemalloc(netmem_to_page(netmem)));
 }
 
 /* If the page refcnt == 1, this will try to recycle the page.
@@ -714,7 +713,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 	 * refcnt == 1 means page_pool owns page, and can recycle it.
 	 *
 	 * page is NOT reusable when allocated when system is under
-	 * some pressure. (page_is_pfmemalloc)
+	 * some pressure. (page_pool_page_is_pfmemalloc)
 	 */
 	if (likely(__page_pool_page_can_be_recycled(netmem))) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
@@ -727,6 +726,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 		/* Page found as candidate for recycling */
 		return netmem;
 	}
+
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
	 *
	 * Many drivers split up the page into fragments, and some
@@ -948,7 +948,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	/* Empty recycle ring */
 	while ((netmem = (__force netmem_ref)ptr_ring_consume_bh(&pool->ring))) {
 		/* Verify the refcnt invariant of cached pages */
-		if (!(page_ref_count(netmem_to_page(netmem)) == 1))
+		if (!(netmem_ref_count(netmem) == 1))
 			pr_crit("%s() page_pool refcnt %d violation\n",
 				__func__, netmem_ref_count(netmem));
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8d5e85d71998d..5aaaa796c5fb4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -904,9 +904,9 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool is_pp_page(struct page *page)
+static bool is_pp_netmem(netmem_ref netmem)
 {
-	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+	return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
 }
 
 int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
@@ -1004,9 +1004,7 @@ EXPORT_SYMBOL(skb_cow_data_for_xdp);
 #if IS_ENABLED(CONFIG_PAGE_POOL)
 bool napi_pp_put_page(netmem_ref netmem)
 {
-	struct page *page = netmem_to_page(netmem);
-
-	page = compound_head(page);
+	netmem = netmem_compound_head(netmem);
 
 	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
 	 * in order to preserve any existing bits, such as bit 0 for the
@@ -1015,10 +1013,10 @@ bool napi_pp_put_page(netmem_ref netmem)
 	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
 	 * to avoid recycling the pfmemalloc page.
 	 */
-	if (unlikely(!is_pp_page(page)))
+	if (unlikely(!is_pp_netmem(netmem)))
 		return false;
 
-	page_pool_put_full_netmem(page->pp, page_to_netmem(page), false);
+	page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
 
 	return true;
 }
@@ -1045,7 +1043,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 static int skb_pp_frag_ref(struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo;
-	struct page *head_page;
+	netmem_ref head_netmem;
 	int i;
 
 	if (!skb->pp_recycle)
@@ -1054,11 +1052,11 @@ static int skb_pp_frag_ref(struct sk_buff *skb)
 	shinfo = skb_shinfo(skb);
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
-		head_page = compound_head(skb_frag_page(&shinfo->frags[i]));
-		if (likely(is_pp_page(head_page)))
-			page_pool_ref_page(head_page);
+		head_netmem = netmem_compound_head(shinfo->frags[i].netmem);
+		if (likely(is_pp_netmem(head_netmem)))
+			page_pool_ref_netmem(head_netmem);
 		else
-			page_ref_inc(head_page);
+			page_ref_inc(netmem_to_page(head_netmem));
 	}
 
 	return 0;
 }
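A closing note on the NET_IOV_ASSERT_OFFSET block in the netmem.h hunk:
the safety of __netmem_clear_lsb() dereferencing either backing type
rests entirely on those compile-time offset checks. The same pattern can
be sketched outside the kernel; the following is a self-contained
example assuming a common LP64 layout, where backing_a/backing_b and all
field placements are invented stand-ins for struct page / struct
net_iov, not the real layouts.

/* Userspace sketch of compile-time field mirroring, in the spirit of
 * NET_IOV_ASSERT_OFFSET.
 */
#include <stdatomic.h>
#include <stddef.h>

struct backing_a {                       /* stand-in for struct page */
	unsigned long flags;             /* private to owner A */
	unsigned long pp_magic;          /* shared prefix starts here */
	void *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	atomic_long pp_ref_count;
};

struct backing_b {                       /* stand-in for struct net_iov */
	unsigned long __unused_padding;  /* lines the shared prefix up */
	unsigned long pp_magic;
	void *pp;
	void *owner;                     /* private to owner B */
	unsigned long dma_addr;
	atomic_long pp_ref_count;
};

/* A shared accessor may touch only fields proven to sit at the same
 * offset in both layouts.
 */
#define ASSERT_MIRRORED(f)                                  \
	_Static_assert(offsetof(struct backing_a, f) ==     \
		       offsetof(struct backing_b, f),       \
		       #f " is not mirrored")
ASSERT_MIRRORED(pp_magic);
ASSERT_MIRRORED(pp);
ASSERT_MIRRORED(dma_addr);
ASSERT_MIRRORED(pp_ref_count);
#undef ASSERT_MIRRORED

If a field is ever added above the mirrored prefix in one struct but not
the other, the build fails instead of memory being silently corrupted at
runtime, which is the point of the pattern.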