[RFC,2/6] net: add callback for setting a ubuf_info to skb

Message ID d0d9e3fffcaba4ace1fb8f437bd4783928bb2d24.1712923998.git.asml.silence@gmail.com
State Superseded
Delegated to: Netdev Maintainers
Series implement io_uring notification (ubuf_info) stacking

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Pavel Begunkov April 12, 2024, 12:55 p.m. UTC
At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning a ubuf_info to an skb so
that smarter assignment, such as linking ubuf_infos together, can be
implemented later.

Note that the callback is optional and has to be compatible with
skb_zcopy_set(), because the net stack might decide to clone an skb and
take another reference to the ubuf_info whenever it wishes. Also, a
correct implementation should always be able to bind to an skb without
a prior ubuf_info; otherwise the send could end up unable to make
progress.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 6 deletions(-)
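
The contract described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the struct definitions are simplified stand-ins rather than the kernel's, `demo_link_skb()` is a hypothetical callback, and the plain `refcnt` integer stands in for the reference that skb_zcopy_set() would take on the ubuf_info.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel definitions. */
struct ubuf_info {
	int refcnt;			/* models the ubuf_info reference count */
};

struct sk_buff {
	struct ubuf_info *uarg;		/* models the skb's zerocopy uarg slot */
};

/*
 * Hypothetical link_skb-style callback.
 * - An skb without a prior ubuf_info is always bound, so the send can
 *   make progress.
 * - Binding takes a reference, mirroring what skb_zcopy_set() does, so
 *   the stack can later clone the skb and take further references.
 * - A different ubuf_info is refused here; a stacking implementation
 *   could chain it instead.
 */
static int demo_link_skb(struct sk_buff *skb, struct ubuf_info *uarg)
{
	if (skb->uarg == uarg)
		return 0;		/* already bound to this uarg */
	if (skb->uarg)
		return -EEXIST;		/* bound to another uarg */

	uarg->refcnt++;
	skb->uarg = uarg;
	return 0;
}
```

Binding a fresh skb succeeds and takes exactly one reference, binding the same ubuf_info again is a no-op, and a second, different ubuf_info is rejected without touching its reference count.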

Comments

David Ahern April 13, 2024, 5:18 p.m. UTC | #1
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> At the moment an skb can only have one ubuf_info associated with it,
> which might be a performance problem for zerocopy sends in cases like
> TCP via io_uring. Add a callback for assigning a ubuf_info to an skb so
> that smarter assignment, such as linking ubuf_infos together, can be
> implemented later.
> 
> Note that the callback is optional and has to be compatible with
> skb_zcopy_set(), because the net stack might decide to clone an skb and
> take another reference to the ubuf_info whenever it wishes. Also, a
> correct implementation should always be able to bind to an skb without
> a prior ubuf_info; otherwise the send could end up unable to make
> progress.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/skbuff.h |  2 ++
>  net/core/skbuff.c      | 20 ++++++++++++++------
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a110e97e074a..ced69f37977f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -530,6 +530,8 @@  enum {
 struct ubuf_info_ops {
 	void (*complete)(struct sk_buff *, struct ubuf_info *,
 			 bool zerocopy_success);
+	/* has to be compatible with skb_zcopy_set() */
+	int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg);
 };
 
 /*
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 749abab23a67..1922e3d09c7f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1881,11 +1881,18 @@  int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 	struct ubuf_info *orig_uarg = skb_zcopy(skb);
 	int err, orig_len = skb->len;
 
-	/* An skb can only point to one uarg. This edge case happens when
-	 * TCP appends to an skb, but zerocopy_realloc triggered a new alloc.
-	 */
-	if (orig_uarg && uarg != orig_uarg)
-		return -EEXIST;
+	if (uarg->ops->link_skb) {
+		err = uarg->ops->link_skb(skb, uarg);
+		if (err)
+			return err;
+	} else {
+		/* An skb can only point to one uarg. This edge case happens
+		 * when TCP appends to an skb, but zerocopy_realloc triggered
+		 * a new alloc.
+		 */
+		if (orig_uarg && uarg != orig_uarg)
+			return -EEXIST;
+	}
 
 	err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len);
 	if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) {
@@ -1899,7 +1906,8 @@  int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 		return err;
 	}
 
-	skb_zcopy_set(skb, uarg, NULL);
+	if (!uarg->ops->link_skb)
+		skb_zcopy_set(skb, uarg, NULL);
 	return skb->len - orig_len;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
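
The control flow that the hunks above give skb_zerocopy_iter_stream() can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: the types are simplified stand-ins, `model_iter_stream()` and `model_zcopy_set()` are hypothetical names, the call to __zerocopy_sg_from_iter() is reduced to a length bump, and `accept_link_skb()` is a placeholder callback that simply overwrites the uarg where a real implementation would link ubuf_infos together.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel definitions. */
struct ubuf_info;

struct sk_buff {
	struct ubuf_info *uarg;
	int len;
};

struct ubuf_info_ops {
	/* optional; NULL selects the legacy one-uarg-per-skb path */
	int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg);
};

struct ubuf_info {
	const struct ubuf_info_ops *ops;
};

/* Models skb_zcopy_set(): bind only if the skb has no uarg yet. */
static void model_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg)
{
	if (!skb->uarg)
		skb->uarg = uarg;
}

/* Models the patched dispatch between link_skb and the legacy check. */
static int model_iter_stream(struct sk_buff *skb, struct ubuf_info *uarg,
			     int len)
{
	int orig_len = skb->len;
	int err;

	if (uarg->ops->link_skb) {
		err = uarg->ops->link_skb(skb, uarg);
		if (err)
			return err;
	} else if (skb->uarg && skb->uarg != uarg) {
		return -EEXIST;		/* legacy: one uarg per skb */
	}

	skb->len += len;	/* stands in for __zerocopy_sg_from_iter() */

	if (!uarg->ops->link_skb)
		model_zcopy_set(skb, uarg);
	return skb->len - orig_len;
}

/* Placeholder callback: accepts any skb and overwrites its uarg. */
static int accept_link_skb(struct sk_buff *skb, struct ubuf_info *uarg)
{
	skb->uarg = uarg;
	return 0;
}

static const struct ubuf_info_ops legacy_ops = { .link_skb = NULL };
static const struct ubuf_info_ops linking_ops = { .link_skb = accept_link_skb };
```

With `legacy_ops`, appending with a second, different ubuf_info fails with -EEXIST exactly as before the patch, while a ubuf_info using `linking_ops` is handed to its callback, which then decides whether the append is allowed.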