diff mbox series

[RFC,1/6] net: extend ubuf_info callback to ops structure

Message ID 62a4e09968a9a0f73780876dc6fb0f784bee5fae.1712923998.git.asml.silence@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series implement io_uring notification (ubuf_info) stacking | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Pavel Begunkov April 12, 2024, 12:55 p.m. UTC
We'll need to associate additional callbacks with ubuf_info, introduce
a structure holding ubuf_info callbacks. Apart from a more smarter
io_uring notification management introduced in next patches, it can be
used to generalise msg_zerocopy_put_abort() and also store
->sg_from_iter, which is currently passed in struct msghdr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/tap.c      |  2 +-
 drivers/net/tun.c      |  2 +-
 drivers/vhost/net.c    |  8 ++++++--
 include/linux/skbuff.h | 19 +++++++++++--------
 io_uring/notif.c       |  8 ++++++--
 net/core/skbuff.c      | 17 +++++++++++------
 6 files changed, 36 insertions(+), 20 deletions(-)

Comments

David Ahern April 13, 2024, 5:17 p.m. UTC | #1
On 4/12/24 6:55 AM, Pavel Begunkov wrote:
> We'll need to associate additional callbacks with ubuf_info, introduce
> a structure holding ubuf_info callbacks. Apart from a more smarter
> io_uring notification management introduced in next patches, it can be
> used to generalise msg_zerocopy_put_abort() and also store
> ->sg_from_iter, which is currently passed in struct msghdr.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  drivers/net/tap.c      |  2 +-
>  drivers/net/tun.c      |  2 +-
>  drivers/vhost/net.c    |  8 ++++++--
>  include/linux/skbuff.h | 19 +++++++++++--------
>  io_uring/notif.c       |  8 ++++++--
>  net/core/skbuff.c      | 17 +++++++++++------
>  6 files changed, 36 insertions(+), 20 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>
Willem de Bruijn April 14, 2024, 5:07 p.m. UTC | #2
Pavel Begunkov wrote:
> We'll need to associate additional callbacks with ubuf_info, introduce
> a structure holding ubuf_info callbacks. Apart from a more smarter
> io_uring notification management introduced in next patches, it can be
> used to generalise msg_zerocopy_put_abort() and also store
> ->sg_from_iter, which is currently passed in struct msghdr.

This adds an extra indirection for all other ubuf implementations.
Can that be avoided?
Pavel Begunkov April 15, 2024, 12:07 a.m. UTC | #3
On 4/14/24 18:07, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> We'll need to associate additional callbacks with ubuf_info, introduce
>> a structure holding ubuf_info callbacks. Apart from a more smarter
>> io_uring notification management introduced in next patches, it can be
>> used to generalise msg_zerocopy_put_abort() and also store
>> ->sg_from_iter, which is currently passed in struct msghdr.
> 
> This adds an extra indirection for all other ubuf implementations.
> Can that be avoided?

It could be fitted directly into ubuf_info, but that doesn't feel
right. It should be hot, so does it even matter? On the bright side,
with the patch I'll also ->sg_from_iter from msghdr into it, so it
doesn't have to be in the generic path.

I think it's the right approach, but if you have a strong opinion
I can fit it as a new field in ubuf_info.
Willem de Bruijn April 15, 2024, 3:06 p.m. UTC | #4
Pavel Begunkov wrote:
> On 4/14/24 18:07, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> We'll need to associate additional callbacks with ubuf_info, introduce
> >> a structure holding ubuf_info callbacks. Apart from a more smarter
> >> io_uring notification management introduced in next patches, it can be
> >> used to generalise msg_zerocopy_put_abort() and also store
> >> ->sg_from_iter, which is currently passed in struct msghdr.
> > 
> > This adds an extra indirection for all other ubuf implementations.
> > Can that be avoided?
> 
> It could be fitted directly into ubuf_info, but that doesn't feel
> right. It should be hot, so does it even matter?

That depends on the workload (working set size)?

> On the bright side,
> with the patch I'll also ->sg_from_iter from msghdr into it, so it
> doesn't have to be in the generic path.

I don't follow this: is this suggested future work?

> 
> I think it's the right approach, but if you have a strong opinion
> I can fit it as a new field in ubuf_info.

If there is a significant cost, I suppose we could use
INDIRECT_CALL or go one step further and demultiplex
based on the new ops

    if (uarg->ops == &msg_zerocopy_ubuf_ops)
        msg_zerocopy_callback(..);
Pavel Begunkov April 15, 2024, 6:55 p.m. UTC | #5
On 4/15/24 16:06, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> On 4/14/24 18:07, Willem de Bruijn wrote:
>>> Pavel Begunkov wrote:
>>>> We'll need to associate additional callbacks with ubuf_info, introduce
>>>> a structure holding ubuf_info callbacks. Apart from a more smarter
>>>> io_uring notification management introduced in next patches, it can be
>>>> used to generalise msg_zerocopy_put_abort() and also store
>>>> ->sg_from_iter, which is currently passed in struct msghdr.
>>>
>>> This adds an extra indirection for all other ubuf implementations.
>>> Can that be avoided?
>>
>> It could be fitted directly into ubuf_info, but that doesn't feel
>> right. It should be hot, so does it even matter?
> 
> That depends on the workload (working set size)?
>>> On the bright side,
>> with the patch I'll also ->sg_from_iter from msghdr into it, so it
>> doesn't have to be in the generic path.
> 
> I don't follow this: is this suggested future work?

Right, a small change I will add later. Without ops though
having 3 callback fields in uargs would be out of hands.

>> I think it's the right approach, but if you have a strong opinion
>> I can fit it as a new field in ubuf_info.
> 
> If there is a significant cost, I suppose we could use
> INDIRECT_CALL or go one step further and demultiplex
> based on the new ops
> 
>      if (uarg->ops == &msg_zerocopy_ubuf_ops)
>          msg_zerocopy_callback(..);

Let me note that the patch doesn't change the number of indirect
calls but only adds one extra deref to get the callback, i.e.
uarg->ops->callback() instead of uarg->callback(). Your snippet
goes an extra mile and removes the indirect call.

Can I take it as that you're fine with the direction of the
patch? Or do you want me to change anything?
Willem de Bruijn April 15, 2024, 7:01 p.m. UTC | #6
Pavel Begunkov wrote:
> On 4/15/24 16:06, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> On 4/14/24 18:07, Willem de Bruijn wrote:
> >>> Pavel Begunkov wrote:
> >>>> We'll need to associate additional callbacks with ubuf_info, introduce
> >>>> a structure holding ubuf_info callbacks. Apart from a more smarter
> >>>> io_uring notification management introduced in next patches, it can be
> >>>> used to generalise msg_zerocopy_put_abort() and also store
> >>>> ->sg_from_iter, which is currently passed in struct msghdr.
> >>>
> >>> This adds an extra indirection for all other ubuf implementations.
> >>> Can that be avoided?
> >>
> >> It could be fitted directly into ubuf_info, but that doesn't feel
> >> right. It should be hot, so does it even matter?
> > 
> > That depends on the workload (working set size)?
> >>> On the bright side,
> >> with the patch I'll also ->sg_from_iter from msghdr into it, so it
> >> doesn't have to be in the generic path.
> > 
> > I don't follow this: is this suggested future work?
> 
> Right, a small change I will add later. Without ops though
> having 3 callback fields in uargs would be out of hands.
> 
> >> I think it's the right approach, but if you have a strong opinion
> >> I can fit it as a new field in ubuf_info.
> > 
> > If there is a significant cost, I suppose we could use
> > INDIRECT_CALL or go one step further and demultiplex
> > based on the new ops
> > 
> >      if (uarg->ops == &msg_zerocopy_ubuf_ops)
> >          msg_zerocopy_callback(..);
> 
> Let me note that the patch doesn't change the number of indirect
> calls but only adds one extra deref to get the callback, i.e.
> uarg->ops->callback() instead of uarg->callback().

Of course. Didn't mean to imply otherwise.

> Your snippet
> goes an extra mile and removes the indirect call.
>
> Can I take it as that you're fine with the direction of the
> patch? Or do you want me to change anything?

It's fine. I want to avoid new paths slowing down existing code where
possible. But if this extra deref would prove significant we have a
workaround.
David Ahern April 16, 2024, 2:50 p.m. UTC | #7
On 4/14/24 6:07 PM, Pavel Begunkov wrote:
> On the bright side,
> with the patch I'll also ->sg_from_iter from msghdr into it, so it
> doesn't have to be in the generic path.

So, what's old is new again? That's where it started:

https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/
Pavel Begunkov April 16, 2024, 3:31 p.m. UTC | #8
On 4/16/24 15:50, David Ahern wrote:
> On 4/14/24 6:07 PM, Pavel Begunkov wrote:
>> On the bright side,
>> with the patch I'll also ->sg_from_iter from msghdr into it, so it
>> doesn't have to be in the generic path.
> 
> So, what's old is new again? That's where it started:
> 
> https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/

Hah, indeed, your patch had it in uarg. I wonder why I didn't put
them all in a table back then, if the argument was to keep struct
ubuf_info leaner.
diff mbox series

Patch

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9f0495e8df4d..bfdd3875fe86 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -754,7 +754,7 @@  static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	dev_queue_xmit(skb);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0b3f21cba552..b7401d990680 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1906,7 +1906,7 @@  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c64ded183f8d..f16279351db5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -380,7 +380,7 @@  static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct sk_buff *skb,
+static void vhost_zerocopy_complete(struct sk_buff *skb,
 				    struct ubuf_info *ubuf_base, bool success)
 {
 	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
@@ -408,6 +408,10 @@  static void vhost_zerocopy_callback(struct sk_buff *skb,
 	rcu_read_unlock_bh();
 }
 
+static const struct ubuf_info_ops vhost_ubuf_ops = {
+	.complete = vhost_zerocopy_complete,
+};
+
 static inline unsigned long busy_clock(void)
 {
 	return local_clock() >> 10;
@@ -879,7 +883,7 @@  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
-			ubuf->ubuf.callback = vhost_zerocopy_callback;
+			ubuf->ubuf.ops = &vhost_ubuf_ops;
 			ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG;
 			refcount_set(&ubuf->ubuf.refcnt, 1);
 			msg.msg_control = &ctl;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9d24aec064e8..a110e97e074a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,6 +527,11 @@  enum {
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
 				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
+struct ubuf_info_ops {
+	void (*complete)(struct sk_buff *, struct ubuf_info *,
+			 bool zerocopy_success);
+};
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
@@ -536,8 +541,7 @@  enum {
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-	void (*callback)(struct sk_buff *, struct ubuf_info *,
-			 bool zerocopy_success);
+	const struct ubuf_info_ops *ops;
 	refcount_t refcnt;
 	u8 flags;
 };
@@ -1662,14 +1666,13 @@  static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset)
 }
 #endif
 
+extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops;
+
 struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 				       struct ubuf_info *uarg);
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success);
-
 int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length);
@@ -1757,13 +1760,13 @@  static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 static inline void net_zcopy_put(struct ubuf_info *uarg)
 {
 	if (uarg)
-		uarg->callback(NULL, uarg, true);
+		uarg->ops->complete(NULL, uarg, true);
 }
 
 static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
-		if (uarg->callback == msg_zerocopy_callback)
+		if (uarg->ops == &msg_zerocopy_ubuf_ops)
 			msg_zerocopy_put_abort(uarg, have_uref);
 		else if (have_uref)
 			net_zcopy_put(uarg);
@@ -1777,7 +1780,7 @@  static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 
 	if (uarg) {
 		if (!skb_zcopy_is_nouarg(skb))
-			uarg->callback(skb, uarg, zerocopy_success);
+			uarg->ops->complete(skb, uarg, zerocopy_success);
 
 		skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
 	}
diff --git a/io_uring/notif.c b/io_uring/notif.c
index b561bd763435..7caaebf94312 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -24,7 +24,7 @@  void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 	io_req_task_complete(notif, ts);
 }
 
-static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 				bool success)
 {
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
@@ -43,6 +43,10 @@  static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	}
 }
 
+static const struct ubuf_info_ops io_ubuf_ops = {
+	.complete = io_tx_ubuf_complete,
+};
+
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
@@ -62,7 +66,7 @@  struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd->zc_report = false;
 	nd->account_pages = 0;
 	nd->uarg.flags = IO_NOTIF_UBUF_FLAGS;
-	nd->uarg.callback = io_tx_ubuf_callback;
+	nd->uarg.ops = &io_ubuf_ops;
 	refcount_set(&nd->uarg.refcnt, 1);
 	return notif;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b99127712e67..749abab23a67 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1708,7 +1708,7 @@  static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 		return NULL;
 	}
 
-	uarg->ubuf.callback = msg_zerocopy_callback;
+	uarg->ubuf.ops = &msg_zerocopy_ubuf_ops;
 	uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1;
 	uarg->len = 1;
 	uarg->bytelen = size;
@@ -1734,7 +1734,7 @@  struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		u32 bytelen, next;
 
 		/* there might be non MSG_ZEROCOPY users */
-		if (uarg->callback != msg_zerocopy_callback)
+		if (uarg->ops != &msg_zerocopy_ubuf_ops)
 			return NULL;
 
 		/* realloc only when socket is locked (TCP, UDP cork),
@@ -1845,8 +1845,8 @@  static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg)
 	sock_put(sk);
 }
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success)
+static void msg_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+				  bool success)
 {
 	struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg);
 
@@ -1855,7 +1855,7 @@  void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	if (refcount_dec_and_test(&uarg->refcnt))
 		__msg_zerocopy_callback(uarg_zc);
 }
-EXPORT_SYMBOL_GPL(msg_zerocopy_callback);
+
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
@@ -1865,10 +1865,15 @@  void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 	uarg_to_msgzc(uarg)->len--;
 
 	if (have_uref)
-		msg_zerocopy_callback(NULL, uarg, true);
+		msg_zerocopy_complete(NULL, uarg, true);
 }
 EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort);
 
+const struct ubuf_info_ops msg_zerocopy_ubuf_ops = {
+	.complete = msg_zerocopy_complete,
+};
+EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops);
+
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 			     struct msghdr *msg, int len,
 			     struct ubuf_info *uarg)