diff mbox series

[RFC,v3,08/12] net: support non paged skb frags

Message ID 20231106024413.2801438-9-almasrymina@google.com (mailing list archive)
State New
Headers show
Series Device Memory TCP | expand

Commit Message

Mina Almasry Nov. 6, 2023, 2:44 a.m. UTC
Make skb_frag_page() fail in the case where the frag is not backed
by a page, and fix its relevent callers to handle this case.

Correctly handle skb_frag refcounting in the page_pool_iovs case.

Signed-off-by: Mina Almasry <almasrymina@google.com>

---
 include/linux/skbuff.h | 42 +++++++++++++++++++++++++++++++++++-------
 net/core/gro.c         |  2 +-
 net/core/skbuff.c      |  3 +++
 net/ipv4/tcp.c         | 10 +++++++++-
 4 files changed, 48 insertions(+), 9 deletions(-)

Comments

Yunsheng Lin Nov. 7, 2023, 9 a.m. UTC | #1
On 2023/11/6 10:44, Mina Almasry wrote:
> Make skb_frag_page() fail in the case where the frag is not backed
> by a page, and fix its relevent callers to handle this case.
> 
> Correctly handle skb_frag refcounting in the page_pool_iovs case.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 

...

>  /**
>   * skb_frag_page - retrieve the page referred to by a paged fragment
>   * @frag: the paged fragment
>   *
> - * Returns the &struct page associated with @frag.
> + * Returns the &struct page associated with @frag. Returns NULL if this frag
> + * has no associated page.
>   */
>  static inline struct page *skb_frag_page(const skb_frag_t *frag)
>  {
> -	return frag->bv_page;
> +	if (!page_is_page_pool_iov(frag->bv_page))
> +		return frag->bv_page;
> +
> +	return NULL;

It seems most of callers don't expect NULL returning for skb_frag_page(),
and this patch only changes a few relevant callers to handle the NULL case.

It may make more sense to add a new helper to do the above checking, and
add a warning in skb_frag_page() to catch any missing NULL checking for
skb_frag_page() caller, something like below?

 static inline struct page *skb_frag_page(const skb_frag_t *frag)
 {
-       return frag->bv_page;
+       struct page *page = frag->bv_page;
+
+       BUG_ON(page_is_page_pool_iov(page));
+
+       return page;
+}
+
+static inline struct page *skb_frag_readable_page(const skb_frag_t *frag)
+{
+       struct page *page = frag->bv_page;
+
+       if (!page_is_page_pool_iov(page))
+               return page;
+
+       return NULL;
 }
Mina Almasry Nov. 7, 2023, 9:19 p.m. UTC | #2
On Tue, Nov 7, 2023 at 1:00 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/11/6 10:44, Mina Almasry wrote:
> > Make skb_frag_page() fail in the case where the frag is not backed
> > by a page, and fix its relevent callers to handle this case.
> >
> > Correctly handle skb_frag refcounting in the page_pool_iovs case.
> >
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> >
>
> ...
>
> >  /**
> >   * skb_frag_page - retrieve the page referred to by a paged fragment
> >   * @frag: the paged fragment
> >   *
> > - * Returns the &struct page associated with @frag.
> > + * Returns the &struct page associated with @frag. Returns NULL if this frag
> > + * has no associated page.
> >   */
> >  static inline struct page *skb_frag_page(const skb_frag_t *frag)
> >  {
> > -     return frag->bv_page;
> > +     if (!page_is_page_pool_iov(frag->bv_page))
> > +             return frag->bv_page;
> > +
> > +     return NULL;
>
> It seems most of callers don't expect NULL returning for skb_frag_page(),
> and this patch only changes a few relevant callers to handle the NULL case.
>

Yes, I did not change code that I guessed was not likely to be
affected or enable the devmem TCP case. Here is my breakdown:

➜  cos-kernel git:(tcpdevmem) ✗ ack -i "skb_frag_page\("
--ignore-dir=drivers -t cc -l
net/core/dev.c
net/core/datagram.c
net/core/xdp.c
net/core/skbuff.c
net/core/filter.c
net/core/gro.c
net/appletalk/ddp.c
net/wireless/util.c
net/tls/tls_device.c
net/tls/tls_device_fallback.c
net/ipv4/tcp.c
net/ipv4/tcp_output.c
net/bpf/test_run.c
include/linux/skbuff.h

I'm ignoring ank skb_frag_page() calls in drivers because drivers need
to add support for devmem TCP, and handle these calls at time of
adding support, I think that's reasonable.

net/core/dev.c:
I think I missed ilegal_highdma()

net/core/datagram.c:
__skb_datagram_iter() protected by not_readable(skb) check.

net/core/skbuff.c:
protected by not_readable(skb) check.

net/core/filter.c:
bpf_xdp_frags_shrink_tail seems like xdp specific, not sure it's relevant here.

net/core/gro.c:
skb_gro_reset_offset: protected by NULL check

net/ipv4/tcp.c:
tcp_zerocopy_receive protected by NULL check.

net/ipv4/tcp_output.c:
tcp_clone_payload: handles NULL return fine.

net/bpf/test_run.c:
seems xdp specific and not sure if it can run into devmem issues.

include/linux/skbuff.h:
I think the multiple calls here are being handled correctly, but let
me know if not.

All the calls in these files, I think, are code paths not possible to
hit devmem TCP with the current support, I think:
net/core/xdp.c
net/appletalk/ddp.c
net/wireless/util.c
net/tls/tls_device.c
net/tls/tls_device_fallback.c

All in all I think maybe all in all I missed illegal_highdma(). I'll
fix it in the next iteration.

> It may make more sense to add a new helper to do the above checking, and
> add a warning in skb_frag_page() to catch any missing NULL checking for
> skb_frag_page() caller, something like below?
>
>  static inline struct page *skb_frag_page(const skb_frag_t *frag)
>  {
> -       return frag->bv_page;
> +       struct page *page = frag->bv_page;
> +
> +       BUG_ON(page_is_page_pool_iov(page));
> +
> +       return page;
> +}
> +
> +static inline struct page *skb_frag_readable_page(const skb_frag_t *frag)
> +{
> +       struct page *page = frag->bv_page;
> +
> +       if (!page_is_page_pool_iov(page))
> +               return page;
> +
> +       return NULL;
>  }
>
>

My personal immediate reaction is that this may just introduce code
churn without significant benefit. If an unsuspecting caller call
skb_frag_page() on devmem frag and doesn't correctly handle NULL
return, it will crash or error out anyway, and likely in some obvious
way, so maybe the BUG_ON() isn't so useful that it's worth changing
all the call sites. But if there is consensus on adding a change like
you propose, I have no problem adding it.
Yunsheng Lin Nov. 8, 2023, 11:25 a.m. UTC | #3
On 2023/11/8 5:19, Mina Almasry wrote:
>>
>>
> 
> My personal immediate reaction is that this may just introduce code
> churn without significant benefit. If an unsuspecting caller call
> skb_frag_page() on devmem frag and doesn't correctly handle NULL
> return, it will crash or error out anyway, and likely in some obvious
> way, so maybe the BUG_ON() isn't so useful that it's worth changing

If it will always crash or error out, then I agree that BUG_ON() is
unnecessary.

> all the call sites. But if there is consensus on adding a change like
> you propose, I have no problem adding it.

One obvious benefit I forget to mention is that, it provides a better
semantic that if a caller need to do the return checking:
1. For the old helper, the semantic is not to do the checking if the caller
   has ensure that it has passed a readable frag to skb_frag_page(), which
   avoid adding some overhead for non-devmen supported drivers.
2. For the new helper, the semantic is to do the checking and we may provide
   a compiler '__must_check' function-attribute to ensure the caller to do
   the checking.


>
Paolo Abeni Nov. 9, 2023, 9:14 a.m. UTC | #4
On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote:
[...]
> @@ -3421,7 +3446,7 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
>   */
>  static inline void __skb_frag_ref(skb_frag_t *frag)
>  {
> -	get_page(skb_frag_page(frag));
> +	page_pool_page_get_many(frag->bv_page, 1);

I guess the above needs #ifdef CONFIG_PAGE_POOL guards and explicit
skb_frag_is_page_pool_iov() check ?


Cheers,

Paolo
Mina Almasry Nov. 10, 2023, 4:06 a.m. UTC | #5
On Thu, Nov 9, 2023 at 1:15 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2023-11-05 at 18:44 -0800, Mina Almasry wrote:
> [...]
> > @@ -3421,7 +3446,7 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
> >   */
> >  static inline void __skb_frag_ref(skb_frag_t *frag)
> >  {
> > -     get_page(skb_frag_page(frag));
> > +     page_pool_page_get_many(frag->bv_page, 1);
>
> I guess the above needs #ifdef CONFIG_PAGE_POOL guards and explicit
> skb_frag_is_page_pool_iov() check ?
>

It doesn't actually. page_pool_page_* helpers are compiled in
regardless of CONFIG_PAGE_POOL, and handle both page_pool_iov* & page*
just fine (the checking happens inside the function).

You may yell at me that it's too confusing... I somewhat agree, but
I'm unsure of what is a better name or location for the helpers. The
helpers handle (page_pool_iov* || page*) gracefully, so they seem to
belong in the page pool for me, but it is indeed surprising/confusing
that these helpers are available even if !CONFIG_PAGE_POOL.

>
> Cheers,
>
> Paolo
>
>
Jakub Kicinski Nov. 10, 2023, 11:19 p.m. UTC | #6
On Sun,  5 Nov 2023 18:44:07 -0800 Mina Almasry wrote:
>  #include <net/net_debug.h>
>  #include <net/dropreason-core.h>
> +#include <net/page_pool/types.h>
> +#include <net/page_pool/helpers.h>

>  /**
>   * DOC: skb checksums
> @@ -3402,15 +3404,38 @@ static inline void skb_frag_off_copy(skb_frag_t *fragto,
>  	fragto->bv_offset = fragfrom->bv_offset;
>  }
>  
> +/* Returns true if the skb_frag contains a page_pool_iov. */
> +static inline bool skb_frag_is_page_pool_iov(const skb_frag_t *frag)
> +{
> +	return page_is_page_pool_iov(frag->bv_page);
> +}

Maybe we can create a new header? For skb + page pool.

skbuff.h is included by 1/4th of the kernel objects, we should not
be adding dependencies to this header, it really slows down incremental
builds.
Mina Almasry Nov. 13, 2023, 6:05 a.m. UTC | #7
On Fri, Nov 10, 2023 at 3:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun,  5 Nov 2023 18:44:07 -0800 Mina Almasry wrote:
> >  #include <net/net_debug.h>
> >  #include <net/dropreason-core.h>
> > +#include <net/page_pool/types.h>
> > +#include <net/page_pool/helpers.h>
>
> >  /**
> >   * DOC: skb checksums
> > @@ -3402,15 +3404,38 @@ static inline void skb_frag_off_copy(skb_frag_t *fragto,
> >       fragto->bv_offset = fragfrom->bv_offset;
> >  }
> >
> > +/* Returns true if the skb_frag contains a page_pool_iov. */
> > +static inline bool skb_frag_is_page_pool_iov(const skb_frag_t *frag)
> > +{
> > +     return page_is_page_pool_iov(frag->bv_page);
> > +}
>
> Maybe we can create a new header? For skb + page pool.
>
> skbuff.h is included by 1/4th of the kernel objects, we should not
> be adding dependencies to this header, it really slows down incremental
> builds.
>

My issue here is that all these skb helpers call each other so I end
up having to move a lot of the unrelated skb helpers to this new
header (maybe that is acceptable but it feels weird).

What I could do here is move all the page_pool_page|iov_* helpers to a
minimal header, and include only that one from skbuff.h, rather than
including all of net/page_pool/helpers.h
Jakub Kicinski Nov. 13, 2023, 10:17 p.m. UTC | #8
On Sun, 12 Nov 2023 22:05:30 -0800 Mina Almasry wrote:
> My issue here is that all these skb helpers call each other so I end
> up having to move a lot of the unrelated skb helpers to this new
> header (maybe that is acceptable but it feels weird).

Splitting pp headers again is not an option, we already split it into
types and helpers.

The series doesn't apply and it's a bit hard for me to, in between LPC
talks, figure out how to extract your patches from the GH UI to take a
proper look :(
We can defer this for now, and hopefully v4 will apply to net-next.
But it will need to get solved.
diff mbox series

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 97bfef071255..1fae276c1353 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -37,6 +37,8 @@ 
 #endif
 #include <net/net_debug.h>
 #include <net/dropreason-core.h>
+#include <net/page_pool/types.h>
+#include <net/page_pool/helpers.h>
 
 /**
  * DOC: skb checksums
@@ -3402,15 +3404,38 @@  static inline void skb_frag_off_copy(skb_frag_t *fragto,
 	fragto->bv_offset = fragfrom->bv_offset;
 }
 
+/* Returns true if the skb_frag contains a page_pool_iov. */
+static inline bool skb_frag_is_page_pool_iov(const skb_frag_t *frag)
+{
+	return page_is_page_pool_iov(frag->bv_page);
+}
+
 /**
  * skb_frag_page - retrieve the page referred to by a paged fragment
  * @frag: the paged fragment
  *
- * Returns the &struct page associated with @frag.
+ * Returns the &struct page associated with @frag. Returns NULL if this frag
+ * has no associated page.
  */
 static inline struct page *skb_frag_page(const skb_frag_t *frag)
 {
-	return frag->bv_page;
+	if (!page_is_page_pool_iov(frag->bv_page))
+		return frag->bv_page;
+
+	return NULL;
+}
+
+/**
+ * skb_frag_page_pool_iov - retrieve the page_pool_iov referred to by fragment
+ * @frag: the fragment
+ *
+ * Returns the &struct page_pool_iov associated with @frag. Returns NULL if this
+ * frag has no associated page_pool_iov.
+ */
+static inline struct page_pool_iov *
+skb_frag_page_pool_iov(const skb_frag_t *frag)
+{
+	return page_to_page_pool_iov(frag->bv_page);
 }
 
 /**
@@ -3421,7 +3446,7 @@  static inline struct page *skb_frag_page(const skb_frag_t *frag)
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	page_pool_page_get_many(frag->bv_page, 1);
 }
 
 /**
@@ -3441,13 +3466,13 @@  bool napi_pp_put_page(struct page *page, bool napi_safe);
 static inline void
 napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
 {
-	struct page *page = skb_frag_page(frag);
-
 #ifdef CONFIG_PAGE_POOL
-	if (recycle && napi_pp_put_page(page, napi_safe))
+	if (recycle && napi_pp_put_page(frag->bv_page, napi_safe))
 		return;
+	page_pool_page_put_many(frag->bv_page, 1);
+#else
+	put_page(skb_frag_page(frag));
 #endif
-	put_page(page);
 }
 
 /**
@@ -3487,6 +3512,9 @@  static inline void skb_frag_unref(struct sk_buff *skb, int f)
  */
 static inline void *skb_frag_address(const skb_frag_t *frag)
 {
+	if (!skb_frag_page(frag))
+		return NULL;
+
 	return page_address(skb_frag_page(frag)) + skb_frag_off(frag);
 }
 
diff --git a/net/core/gro.c b/net/core/gro.c
index 0759277dc14e..42d7f6755f32 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -376,7 +376,7 @@  static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff)
 	NAPI_GRO_CB(skb)->frag0 = NULL;
 	NAPI_GRO_CB(skb)->frag0_len = 0;
 
-	if (!skb_headlen(skb) && pinfo->nr_frags &&
+	if (!skb_headlen(skb) && pinfo->nr_frags && skb_frag_page(frag0) &&
 	    !PageHighMem(skb_frag_page(frag0)) &&
 	    (!NET_IP_ALIGN || !((skb_frag_off(frag0) + nhoff) & 3))) {
 		NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c52ddd6891d9..13eca4fd25e1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2994,6 +2994,9 @@  static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
 	for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) {
 		const skb_frag_t *f = &skb_shinfo(skb)->frags[seg];
 
+		if (WARN_ON_ONCE(!skb_frag_page(f)))
+			return false;
+
 		if (__splice_segment(skb_frag_page(f),
 				     skb_frag_off(f), skb_frag_size(f),
 				     offset, len, spd, false, sk, pipe))
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a86d8200a1e8..23b29dc49271 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2155,6 +2155,9 @@  static int tcp_zerocopy_receive(struct sock *sk,
 			break;
 		}
 		page = skb_frag_page(frags);
+		if (WARN_ON_ONCE(!page))
+			break;
+
 		prefetchw(page);
 		pages[pages_to_map++] = page;
 		length += PAGE_SIZE;
@@ -4411,7 +4414,12 @@  int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 	for (i = 0; i < shi->nr_frags; ++i) {
 		const skb_frag_t *f = &shi->frags[i];
 		unsigned int offset = skb_frag_off(f);
-		struct page *page = skb_frag_page(f) + (offset >> PAGE_SHIFT);
+		struct page *page = skb_frag_page(f);
+
+		if (WARN_ON_ONCE(!page))
+			return 1;
+
+		page += offset >> PAGE_SHIFT;
 
 		sg_set_page(&sg, page, skb_frag_size(f),
 			    offset_in_page(offset));