diff mbox series

[RFC,net-next,2/2] net: veth: Improving page pool recycling

Message ID 20230719072907.100948-2-liangchen.linux@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next,1/2] net: veth: Page pool creation error handling for existing pools only | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers warning 2 maintainers not CCed: bpf@vger.kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 83 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Liang Chen July 19, 2023, 7:29 a.m. UTC
Page pool is supported for veth. But for XDP_TX and XDP_REDIRECT cases,
the pages are not effectively recycled. "ethtool -S" statistics for the
page pool are as follows:

NIC statistics:
rx_pp_alloc_fast: 18041186
rx_pp_alloc_slow: 286369
rx_pp_recycle_ring: 0
rx_pp_recycle_released_ref: 18327555

This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases
the skb data structure:

veth_xdp_get(xdp);
consume_skb(skb);

The reason behind is some skbs received from the veth peer are not page
pool pages, and remain so after conversion to xdp frame. In order to not
confusing __xdp_return with mixed regular pages and page pool pages, they
are all converted to regular pages. So registering xdp memory model as
MEM_TYPE_PAGE_SHARED is sufficient.

If we replace the above code with kfree_skb_partial, directly releasing
the skb data structure, we can retain the original page pool page behavior.
However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
not a solution as explained above. Therefore, we introduced an additionally
MEM_TYPE_PAGE_POOL model for each rq.

The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement after page pool pages get successfully
recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP.

Test environment setup:
ns1                 ns2
veth0   <-peer->    veth1
veth2   <-peer->    veth3

Test Results:
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
    without PP recycle: 1,780,392
    with PP recycle: 1,984,680
    improvement: ~10%

pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
    without PP recycle: 1,433,491
    with PP recycle: 1,511,680
    improvement: 5~6%

pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
    without PP recycle: 1,527,708
    with PP recycle: 1,672,101
    improvement: ~10%

pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
    without PP recycle: 1,325,804
    with PP recycle: 1,392,704
    improvement: ~5.5%

pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
    without PP recycle: 1,607,609
    with PP recycle: 1,736,957
    improvement: ~8%

Additionally, the performance improvement were measured when converting to
xdp_buff doesn't require buffer copy and original skb uses regular pages,
i.e. page pool recycle not involved. This still gives around 2% improvement
attributed to the changes from consume_skb to kfree_skb_partial.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
 drivers/net/veth.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

Comments

Yunsheng Lin July 21, 2023, 12:18 p.m. UTC | #1
On 2023/7/19 15:29, Liang Chen wrote:

...

> 
> The reason behind is some skbs received from the veth peer are not page
> pool pages, and remain so after conversion to xdp frame. In order to not
> confusing __xdp_return with mixed regular pages and page pool pages, they
> are all converted to regular pages. So registering xdp memory model as
> MEM_TYPE_PAGE_SHARED is sufficient.
> 
> If we replace the above code with kfree_skb_partial, directly releasing
> the skb data structure, we can retain the original page pool page behavior.
> However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
> not a solution as explained above. Therefore, we introduced an additionally
> MEM_TYPE_PAGE_POOL model for each rq.
> 

...

> @@ -874,9 +862,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  		rcu_read_unlock();
>  		goto xdp_xmit;
>  	case XDP_REDIRECT:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem;

I am not really familiar with the veth here, so some question here:
Is it possible that skbs received from the veth peer are also page pool pages?
Does using the local rq->xdp_mem_pp for page allocated from veth peer cause
some problem here? As there is type and id for a specific page_pool instance,
type may be the same, but I suppose id is not the same for veth and it's veth
peer.

> +		kfree_skb_partial(skb, true);
> +
Liang Chen July 24, 2023, 9:44 a.m. UTC | #2
On Fri, Jul 21, 2023 at 8:18 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/7/19 15:29, Liang Chen wrote:
>
> ...
>
> >
> > The reason behind is some skbs received from the veth peer are not page
> > pool pages, and remain so after conversion to xdp frame. In order to not
> > confusing __xdp_return with mixed regular pages and page pool pages, they
> > are all converted to regular pages. So registering xdp memory model as
> > MEM_TYPE_PAGE_SHARED is sufficient.
> >
> > If we replace the above code with kfree_skb_partial, directly releasing
> > the skb data structure, we can retain the original page pool page behavior.
> > However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
> > not a solution as explained above. Therefore, we introduced an additionally
> > MEM_TYPE_PAGE_POOL model for each rq.
> >
>
> ...
>
> > @@ -874,9 +862,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >               rcu_read_unlock();
> >               goto xdp_xmit;
> >       case XDP_REDIRECT:
> > -             veth_xdp_get(xdp);
> > -             consume_skb(skb);
> > -             xdp->rxq->mem = rq->xdp_mem;
> > +             xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem;
>
> I am not really familiar with the veth here, so some question here:
> Is it possible that skbs received from the veth peer are also page pool pages?
> Does using the local rq->xdp_mem_pp for page allocated from veth peer cause
> some problem here? As there is type and id for a specific page_pool instance,
> type may be the same, but I suppose id is not the same for veth and it's veth
> peer.
>

Yeah, I understand your concern. If a skb uses a page pool page whose
pool has previously been registered with a xdp memory model, this will
lead to a situation where veth compose a xdp frame indicating its
buffer coming from the local xdp_mem_pp pool from its xdp_mem_info.id
field, and the page structure itself refers to another pool from where
it was originally allocated. This may cause problems for things like
xdp_return_frame_bulk. We will address it in V2. Thank you for
bringing up this issue.


Thanks,
Liang

> > +             kfree_skb_partial(skb, true);
> > +
diff mbox series

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 509e901da41d..a825b086f744 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -62,6 +62,7 @@  struct veth_rq {
 	struct net_device	*dev;
 	struct bpf_prog __rcu	*xdp_prog;
 	struct xdp_mem_info	xdp_mem;
+	struct xdp_mem_info	xdp_mem_pp;
 	struct veth_rq_stats	stats;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
@@ -713,19 +714,6 @@  static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
 	}
 }
 
-static void veth_xdp_get(struct xdp_buff *xdp)
-{
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i;
-
-	get_page(virt_to_page(xdp->data));
-	if (likely(!xdp_buff_has_frags(xdp)))
-		return;
-
-	for (i = 0; i < sinfo->nr_frags; i++)
-		__skb_frag_ref(&sinfo->frags[i]);
-}
-
 static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 					struct xdp_buff *xdp,
 					struct sk_buff **pskb)
@@ -862,9 +850,9 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem;
+		kfree_skb_partial(skb, true);
+
 		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
@@ -874,9 +862,9 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem;
+		kfree_skb_partial(skb, true);
+
 		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
@@ -1061,6 +1049,14 @@  static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 			goto err_page_pool;
 	}
 
+	for (i = start; i < end; i++) {
+		err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp,
+					MEM_TYPE_PAGE_POOL,
+					priv->rq[i].page_pool);
+		if (err)
+			goto err_reg_mem;
+	}
+
 	for (i = start; i < end; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
@@ -1082,6 +1078,10 @@  static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 	for (i--; i >= start; i--)
 		ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
 	i = end;
+err_reg_mem:
+	for (i--; i >= start; i--)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+	i = end;
 err_page_pool:
 	for (i--; i >= start; i--) {
 		page_pool_destroy(priv->rq[i].page_pool);
@@ -1117,6 +1117,9 @@  static void veth_napi_del_range(struct net_device *dev, int start, int end)
 		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
 	}
 
+	for (i = start; i < end; i++)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+
 	for (i = start; i < end; i++) {
 		page_pool_destroy(priv->rq[i].page_pool);
 		priv->rq[i].page_pool = NULL;