diff mbox series

[v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit

Message ID a14a30d3c06fff24e13f836c733d80efc0bd6eb5.1611957532.git.lorenzo@kernel.org (mailing list archive)
State Accepted
Commit 65e6dcf73398ddb64bb782ff2acd918d3a37a53a
Delegated to: BPF
Headers show
Series [v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 7 maintainers not CCed: songliubraving@fb.com andrii@kernel.org hawk@kernel.org kpsingh@kernel.org john.fastabend@gmail.com kafai@fb.com yhs@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 7149 this patch: 7149
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 153 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 7549 this patch: 7549
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

lorenzo@kernel.org Jan. 29, 2021, 10:04 p.m. UTC
Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
in order to alloc skbs in bulk for XDP_PASS verdict.
Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
The proposed approach has been tested in the following scenario:

eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS

XDP_REDIRECT: xdp_redirect_map bpf sample
XDP_PASS: xdp_rxq_info bpf sample

traffic generator: pkt_gen sending udp traffic on a remote device

bpf-next master: ~3.64Mpps
bpf-next + skb bulking allocation: ~3.79Mpps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes since v2:
- use __GFP_ZERO flag instead of memset
- move some veth_xdp_rcv_batch() logic in veth_xdp_rcv_skb()

Changes since v1:
- drop patch 2/3, squash patch 1/3 and 3/3
- set VETH_XDP_BATCH to 16
- rework veth_xdp_rcv to use __ptr_ring_consume
---
 drivers/net/veth.c | 78 ++++++++++++++++++++++++++++++++++------------
 include/net/xdp.h  |  1 +
 net/core/xdp.c     | 11 +++++++
 3 files changed, 70 insertions(+), 20 deletions(-)

Comments

Toshiaki Makita Jan. 31, 2021, 2:16 p.m. UTC | #1
On 2021/01/30 7:04, Lorenzo Bianconi wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
> 
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
> 
> XDP_REDIRECT: xdp_redirect_map bpf sample
> XDP_PASS: xdp_rxq_info bpf sample
> 
> traffic generator: pkt_gen sending udp traffic on a remote device
> 
> bpf-next master: ~3.64Mpps
> bpf-next + skb bulking allocation: ~3.79Mpps
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>

Thanks,
Toshiaki Makita
Jesper Dangaard Brouer Feb. 2, 2021, 2:14 p.m. UTC | #2
On Fri, 29 Jan 2021 23:04:08 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
> 
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
> 
> XDP_REDIRECT: xdp_redirect_map bpf sample
> XDP_PASS: xdp_rxq_info bpf sample
> 
> traffic generator: pkt_gen sending udp traffic on a remote device
> 
> bpf-next master: ~3.64Mpps
> bpf-next + skb bulking allocation: ~3.79Mpps
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

I wanted Lorenzo to test 8 vs 16 bulking, but after much testing and
IRC dialog, we cannot find and measure any difference with enough
accuracy. Thus:

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

> Changes since v2:
> - use __GFP_ZERO flag instead of memset
> - move some veth_xdp_rcv_batch() logic in veth_xdp_rcv_skb()
> 
> Changes since v1:
> - drop patch 2/3, squash patch 1/3 and 3/3
> - set VETH_XDP_BATCH to 16
> - rework veth_xdp_rcv to use __ptr_ring_consume
> ---
>  drivers/net/veth.c | 78 ++++++++++++++++++++++++++++++++++------------
>  include/net/xdp.h  |  1 +
>  net/core/xdp.c     | 11 +++++++
>  3 files changed, 70 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 6e03b619c93c..aa1a66ad2ce5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -35,6 +35,7 @@
>  #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
>  
>  #define VETH_XDP_TX_BULK_SIZE	16
> +#define VETH_XDP_BATCH		16
>
patchwork-bot+netdevbpf@kernel.org Feb. 4, 2021, 12:10 a.m. UTC | #3
Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Fri, 29 Jan 2021 23:04:08 +0100 you wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
> 
> eth (ixgbe) --> XDP_REDIRECT --> veth0 --> (remote-ns) veth1 --> XDP_PASS
> 
> [...]

Here is the summary with links:
  - [v3,bpf-next] net: veth: alloc skb in bulk for ndo_xdp_xmit
    https://git.kernel.org/bpf/bpf-next/c/65e6dcf73398

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Daniel Borkmann Feb. 4, 2021, 12:14 a.m. UTC | #4
On 1/29/21 11:04 PM, Lorenzo Bianconi wrote:
> Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> in order to alloc skbs in bulk for XDP_PASS verdict.
> Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> The proposed approach has been tested in the following scenario:
[...]
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 0d2630a35c3e..05354976c1fc 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -514,6 +514,17 @@ void xdp_warn(const char *msg, const char *func, const int line)
>   };
>   EXPORT_SYMBOL_GPL(xdp_warn);
>   
> +int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
> +{
> +	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
> +				      n_skb, skbs);

Applied, but one question I was wondering about when reading the kmem_cache_alloc_bulk()
code was whether it would be safer to simply test for kmem_cache_alloc_bulk() != n_skb
given it could potentially in future also alloc less objs than requested, but I presume
if such extension would get implemented then call-sites might need to indicate 'best
effort' somehow via flag instead (to handle < n_skb case). Either way all current callers
assume for != 0 that everything went well, so lgtm.

> +	if (unlikely(!n_skb))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
> +
>   struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>   					   struct sk_buff *skb,
>   					   struct net_device *dev)
>
Jesper Dangaard Brouer Feb. 4, 2021, 9:05 a.m. UTC | #5
On Thu, 4 Feb 2021 01:14:56 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 1/29/21 11:04 PM, Lorenzo Bianconi wrote:
> > Split ndo_xdp_xmit and ndo_start_xmit use cases in veth_xdp_rcv routine
> > in order to alloc skbs in bulk for XDP_PASS verdict.
> > Introduce xdp_alloc_skb_bulk utility routine to alloc skb bulk list.
> > The proposed approach has been tested in the following scenario:  
> [...]
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 0d2630a35c3e..05354976c1fc 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -514,6 +514,17 @@ void xdp_warn(const char *msg, const char *func, const int line)
> >   };
> >   EXPORT_SYMBOL_GPL(xdp_warn);
> >   
> > +int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
> > +{
> > +	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
> > +				      n_skb, skbs);  
> 
> Applied, but one question I was wondering about when reading the kmem_cache_alloc_bulk()
> code was whether it would be safer to simply test for kmem_cache_alloc_bulk() != n_skb
> given it could potentially in future also alloc less objs than requested, but I presume
> if such extension would get implemented then call-sites might need to indicate 'best
> effort' somehow via flag instead (to handle < n_skb case). Either way all current callers
> assume for != 0 that everything went well, so lgtm.

It was Andrew (AKPM) that wanted the API to either return the requested
number of objects or fail. I respected the MM-maintainers request at
that point, even-though I wanted the other API as there is a small
performance advantage (not crossing page boundary in SLUB).

At that time we discussed it on MM-list, and I see his/the point:
If API can allocate less objs than requested, then think about how this
complicated the surrounding code. E.g. in this specific code we already
have VETH_XDP_BATCH(16) xdp_frame objects, which we need to get 16 SKB
objects for.  What should the code do if it cannot get 16 SKBs(?).
Daniel Borkmann Feb. 4, 2021, 3:43 p.m. UTC | #6
On 2/4/21 10:05 AM, Jesper Dangaard Brouer wrote:
[...]
> It was Andrew (AKPM) that wanted the API to either return the requested
> number of objects or fail. I respected the MM-maintainers request at
> that point, even-though I wanted the other API as there is a small
> performance advantage (not crossing page boundary in SLUB).
> 
> At that time we discussed it on MM-list, and I see his/the point:
> If API can allocate less objs than requested, then think about how this
> complicated the surrounding code. E.g. in this specific code we already
> have VETH_XDP_BATCH(16) xdp_frame objects, which we need to get 16 SKB
> objects for.  What should the code do if it cannot get 16 SKBs(?).

Right, I mentioned the error handling complications above wrt < n_skb case. I think iff this
ever gets implemented and there's a need, it would probably be best to add a new flag like
__GFP_BULK_BEST_EFFORT to indicate that it would be okay to return x elements with x being
in (0, size], so that only those callers need to deal with this, and all others can expect
[as today] that != 0 means all #size elements were bulk alloc'ed.

Thanks,
Daniel
diff mbox series

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 6e03b619c93c..aa1a66ad2ce5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -35,6 +35,7 @@ 
 #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
 #define VETH_XDP_TX_BULK_SIZE	16
+#define VETH_XDP_BATCH		16
 
 struct veth_stats {
 	u64	rx_drops;
@@ -562,14 +563,13 @@  static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
 	return 0;
 }
 
-static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
-					struct xdp_frame *frame,
-					struct veth_xdp_tx_bq *bq,
-					struct veth_stats *stats)
+static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
+					  struct xdp_frame *frame,
+					  struct veth_xdp_tx_bq *bq,
+					  struct veth_stats *stats)
 {
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
-	struct sk_buff *skb;
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -623,13 +623,7 @@  static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
-	skb = xdp_build_skb_from_frame(frame, rq->dev);
-	if (!skb) {
-		xdp_return_frame(frame);
-		stats->rx_drops++;
-	}
-
-	return skb;
+	return frame;
 err_xdp:
 	rcu_read_unlock();
 	xdp_return_frame(frame);
@@ -637,6 +631,37 @@  static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	return NULL;
 }
 
+/* frames array contains VETH_XDP_BATCH at most */
+static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
+				  int n_xdpf, struct veth_xdp_tx_bq *bq,
+				  struct veth_stats *stats)
+{
+	void *skbs[VETH_XDP_BATCH];
+	int i;
+
+	if (xdp_alloc_skb_bulk(skbs, n_xdpf,
+			       GFP_ATOMIC | __GFP_ZERO) < 0) {
+		for (i = 0; i < n_xdpf; i++)
+			xdp_return_frame(frames[i]);
+		stats->rx_drops += n_xdpf;
+
+		return;
+	}
+
+	for (i = 0; i < n_xdpf; i++) {
+		struct sk_buff *skb = skbs[i];
+
+		skb = __xdp_build_skb_from_frame(frames[i], skb,
+						 rq->dev);
+		if (!skb) {
+			xdp_return_frame(frames[i]);
+			stats->rx_drops++;
+			continue;
+		}
+		napi_gro_receive(&rq->xdp_napi, skb);
+	}
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 					struct sk_buff *skb,
 					struct veth_xdp_tx_bq *bq,
@@ -784,32 +809,45 @@  static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 			struct veth_xdp_tx_bq *bq,
 			struct veth_stats *stats)
 {
-	int i, done = 0;
+	int i, done = 0, n_xdpf = 0;
+	void *xdpf[VETH_XDP_BATCH];
 
 	for (i = 0; i < budget; i++) {
 		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
-		struct sk_buff *skb;
 
 		if (!ptr)
 			break;
 
 		if (veth_is_xdp_frame(ptr)) {
+			/* ndo_xdp_xmit */
 			struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
 
 			stats->xdp_bytes += frame->len;
-			skb = veth_xdp_rcv_one(rq, frame, bq, stats);
+			frame = veth_xdp_rcv_one(rq, frame, bq, stats);
+			if (frame) {
+				/* XDP_PASS */
+				xdpf[n_xdpf++] = frame;
+				if (n_xdpf == VETH_XDP_BATCH) {
+					veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf,
+							      bq, stats);
+					n_xdpf = 0;
+				}
+			}
 		} else {
-			skb = ptr;
+			/* ndo_start_xmit */
+			struct sk_buff *skb = ptr;
+
 			stats->xdp_bytes += skb->len;
 			skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
+			if (skb)
+				napi_gro_receive(&rq->xdp_napi, skb);
 		}
-
-		if (skb)
-			napi_gro_receive(&rq->xdp_napi, skb);
-
 		done++;
 	}
 
+	if (n_xdpf)
+		veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf, bq, stats);
+
 	u64_stats_update_begin(&rq->stats.syncp);
 	rq->stats.vs.xdp_redirect += stats->xdp_redirect;
 	rq->stats.vs.xdp_bytes += stats->xdp_bytes;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index c4bfdc9a8b79..a5bc214a49d9 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -169,6 +169,7 @@  struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct net_device *dev);
 struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					 struct net_device *dev);
+int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
 
 static inline
 void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 0d2630a35c3e..05354976c1fc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -514,6 +514,17 @@  void xdp_warn(const char *msg, const char *func, const int line)
 };
 EXPORT_SYMBOL_GPL(xdp_warn);
 
+int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
+{
+	n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
+				      n_skb, skbs);
+	if (unlikely(!n_skb))
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
+
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)