| Message ID | 20230801061932.10335-2-liangchen.linux@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to: | Netdev Maintainers |
| Series | [RFC,net-next,v2,1/2] net: veth: Page pool creation error handling for existing pools only |
On 2023/8/1 14:19, Liang Chen wrote:
> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  	case XDP_PASS:
>  		break;
>  	case XDP_TX:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);

For this case, I suppose that we can safely call kfree_skb_partial()
as we allocate the skb in veth_convert_skb_to_xdp_buff(), but I am not
sure about the !skb->pp_recycle case.

> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);

For consume_skb(), there is skb_unref() checking and other
checking/operations. Can we really assume that we can call
kfree_skb_partial() with head_stolen being true? Is it possible that
skb->users is bigger than 1? If it is, don't we free the 'skb' back to
skbuff_cache while others may still be using it?

> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +
On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/1 14:19, Liang Chen wrote:
> [...]
>
> For this case, I suppose that we can safely call kfree_skb_partial()
> as we allocate the skb in veth_convert_skb_to_xdp_buff(), but I am not
> sure about the !skb->pp_recycle case.
> [...]
> For consume_skb(), there is skb_unref() checking and other
> checking/operations. Can we really assume that we can call
> kfree_skb_partial() with head_stolen being true? Is it possible that
> skb->users is bigger than 1? If it is, don't we free the 'skb' back to
> skbuff_cache while others may still be using it?
>

Thanks for raising the concern. If there are multiple references to the
skb (skb->users is greater than 1), the skb will be reallocated in
veth_convert_skb_to_xdp_buff(), so it should enter the skb != skb_orig
case.

In fact, entering the !skb->pp_recycle case implies that the skb meets
all of the following conditions:
1. It is neither shared nor cloned.
2. Its head buffer is not allocated with kmalloc.
3. It does not have fragment data.
4. Its headroom is at least XDP_PACKET_HEADROOM.

Thanks,
Liang
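For reference, the gate behind these four conditions is the
reallocation check at the top of veth_convert_skb_to_xdp_buff();
condensed, it looks roughly like the following (the in-tree
drivers/net/veth.c is authoritative):

	/* Any of these means XDP cannot take exclusive ownership of the
	 * buffer as-is, so a fresh page-pool-backed copy is built and
	 * the caller later sees skb != skb_orig:
	 *  - skb_shared():         skb->users > 1
	 *  - skb_head_is_locked(): head was kmalloc'ed (!skb->head_frag)
	 *                          or the skb is cloned
	 *  - nr_frags:             paged fragment data is present
	 *  - headroom:             too small for XDP_PACKET_HEADROOM
	 */
	if (skb_shared(skb) || skb_head_is_locked(skb) ||
	    skb_shinfo(skb)->nr_frags ||
	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
		/* ... allocate from rq->page_pool and copy the packet ... */
	}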
On 2023/8/7 20:20, Liang Chen wrote:
> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> [...]
>
> Thanks for raising the concern. If there are multiple references to the
> skb (skb->users is greater than 1), the skb will be reallocated in
> veth_convert_skb_to_xdp_buff(), so it should enter the skb != skb_orig
> case.
>
> In fact, entering the !skb->pp_recycle case implies that the skb meets
> all of the following conditions:
> 1. It is neither shared nor cloned.
> 2. Its head buffer is not allocated with kmalloc.
> 3. It does not have fragment data.
> 4. Its headroom is at least XDP_PACKET_HEADROOM.

You are right, I missed the checking in veth_convert_skb_to_xdp_buff().
It seems XDP is pretty strict about buffer ownership; it needs
exclusive access to all the buffers.

That leaves only one difference: kfree_skb_partial() calls
'kmem_cache_free(skbuff_cache, skb)' while consume_skb() calls
'kfree_skbmem(skb)'. If we are sure that 'skb' is only ever allocated
from 'skbuff_cache', this patch looks good to me.
On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> [...]
>
> You are right, I missed the checking in veth_convert_skb_to_xdp_buff().
> It seems XDP is pretty strict about buffer ownership; it needs
> exclusive access to all the buffers.
>
> That leaves only one difference: kfree_skb_partial() calls
> 'kmem_cache_free(skbuff_cache, skb)' while consume_skb() calls
> 'kfree_skbmem(skb)'. If we are sure that 'skb' is only ever allocated
> from 'skbuff_cache', this patch looks good to me.
>

The difference between kmem_cache_free() and kfree_skbmem() is that
kfree_skbmem() checks whether the skb is an fclone (fast clone) skb;
if it is, it must be returned to skbuff_fclone_cache. Currently,
fclone skbs can only be allocated through __alloc_skb(), and their
head buffer is allocated by kmalloc_reserve(), which violates
condition 2 above ("its head buffer is not allocated with kmalloc").
Therefore, an fclone skb will always be reallocated by
veth_convert_skb_to_xdp_buff(), leading to the skb != skb_orig case.
In other words, entering the !skb->pp_recycle case indicates that the
skb was allocated from skbuff_cache.

Thanks for the review,
Liang
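For reference, the fclone bookkeeping under discussion, condensed from
net/core/skbuff.c around the time of this series (the upstream source
is authoritative):

	/* consume_skb() reaches this; kfree_skb_partial(skb, true)
	 * instead calls kmem_cache_free(skbuff_cache, skb) directly,
	 * skipping everything below.
	 */
	static void kfree_skbmem(struct sk_buff *skb)
	{
		struct sk_buff_fclones *fclones;

		switch (skb->fclone) {
		case SKB_FCLONE_UNAVAILABLE:
			/* Plain skb: same destination as kfree_skb_partial(). */
			kmem_cache_free(skbuff_cache, skb);
			return;
		case SKB_FCLONE_ORIG:
			fclones = container_of(skb, struct sk_buff_fclones, skb1);
			/* The clone was never handed out; free immediately. */
			if (refcount_read(&fclones->fclone_ref) == 1)
				goto fastpath;
			break;
		default: /* SKB_FCLONE_CLONE */
			fclones = container_of(skb, struct sk_buff_fclones, skb2);
			break;
		}
		if (!refcount_dec_and_test(&fclones->fclone_ref))
			return;
	fastpath:
		/* fclone skbs must go back to their own cache. */
		kmem_cache_free(skbuff_fclone_cache, fclones);
	}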
On 2023/8/9 18:01, Liang Chen wrote:
> On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> [...]
>
> The difference between kmem_cache_free() and kfree_skbmem() is that
> kfree_skbmem() checks whether the skb is an fclone (fast clone) skb;
> if it is, it must be returned to skbuff_fclone_cache. [...] In other
> words, entering the !skb->pp_recycle case indicates that the skb was
> allocated from skbuff_cache.

This might need a comment to make it clear, or some compile-time check
such as BUILD_BUG_ON() to ensure it, as it is not so obvious if someone
changes the code to allocate an fclone skb with a frag head in the
future.

Also, I suppose veth_xdp_rcv_skb() is called in NAPI context, so we
might be able to reuse the 'skb' if we can use something like
napi_skb_free_stolen_head().
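The fclone assumption is hard to express with BUILD_BUG_ON(), since it
is a property of the runtime allocation path rather than of any type;
a cheap debug-only guard is conceivable, though. A hypothetical form
(not part of the posted patch):

	/* Encode the assumption: an skb reaching kfree_skb_partial()
	 * here must not be a fast clone. DEBUG_NET_WARN_ON_ONCE()
	 * compiles away unless CONFIG_DEBUG_NET is set, leaving the
	 * fast path untouched.
	 */
	DEBUG_NET_WARN_ON_ONCE(skb->fclone != SKB_FCLONE_UNAVAILABLE);
	kfree_skb_partial(skb, true);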
On 01/08/2023 08.19, Liang Chen wrote:
[...]
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 509e901da41d..ea1b344e5db4 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
[...]
> @@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  		goto out;
>  	}
>
> +	skb_orig = skb;
>  	__skb_push(skb, skb->data - skb_mac_header(skb));
>  	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>  		goto drop;
> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  	case XDP_PASS:
>  		break;
>  	case XDP_TX:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);
> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);
> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +

The code section above and the one below look the same. It begs for a
common function.

>  	if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
>  		trace_xdp_exception(rq->dev, xdp_prog, act);
>  		stats->rx_drops++;
> @@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  		rcu_read_unlock();
>  		goto xdp_xmit;
>  	case XDP_REDIRECT:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);
> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);
> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +

The common function can be named to reflect the purpose of this code
section. According to my understanding, the code steals the (packet)
data section from the SKB, frees the SKB, and prepares/associates the
correct memory type in xdp_buff->rxq.

Function name proposals:
  __skb_steal_data
  __free_skb_and_steal_data
  __free_skb_and_steal_data_for_xdp
  __free_skb_and_xdp_steal_data
  __skb2xdp_steal_data

When doing this in a function, it will also allow us to add some
comments explaining the different cases and assumptions. These
assumptions can get broken as a result of (future) changes in the
surrounding code, so we need to explain our assumptions/intent (to
help our future selves).

For code readability, I think we should convert (skb != skb_orig) into
a boolean that says what this case captures, e.g. local_pp_alloc.

Func prototype:
  __skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)

Always feel free to challenge my view,
--Jesper
On Wed, Aug 9, 2023 at 8:35 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> [...]
>
> This might need a comment to make it clear, or some compile-time check
> such as BUILD_BUG_ON() to ensure it, as it is not so obvious if someone
> changes the code to allocate an fclone skb with a frag head in the
> future.

Sure. We will add a comment to explain that, like below:

/*
 * We can safely use kfree_skb_partial here because this cannot be an
 * fclone skb. Fclone skbs are exclusively allocated via __alloc_skb,
 * with their head buffer allocated by kmalloc_reserve (so,
 * skb->head_frag = 0), satisfying the skb_head_is_locked condition in
 * veth_convert_skb_to_xdp_buff, leading to skb being reallocated.
 */

> Also, I suppose veth_xdp_rcv_skb() is called in NAPI context, so we
> might be able to reuse the 'skb' if we can use something like
> napi_skb_free_stolen_head().

Sure. Using napi_skb_free_stolen_head() seems to be a good idea; it
further accelerates the skb != skb_orig case in our preliminary tests.
However, it is not suitable for the !skb->pp_recycle case, as that skb
is not allocated in the current NAPI context.

Thanks,
Liang
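A sketch of what that could look like for the locally allocated branch,
assuming napi_skb_free_stolen_head() is otherwise a drop-in for
kfree_skb_partial(skb, true) here (hypothetical; not part of this
series):

	if (skb != skb_orig) {
		/* This skb was allocated moments ago in
		 * veth_convert_skb_to_xdp_buff() within the same NAPI
		 * poll, so its sk_buff struct can be recycled through
		 * the per-CPU NAPI cache instead of going back to the
		 * slab allocator.
		 */
		xdp->rxq->mem = rq->xdp_mem_pp;
		napi_skb_free_stolen_head(skb);
	}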
On Sat, Aug 12, 2023 at 2:16 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> [...]
>
> The common function can be named to reflect the purpose of this code
> section. According to my understanding, the code steals the (packet)
> data section from the SKB, frees the SKB, and prepares/associates the
> correct memory type in xdp_buff->rxq.
> [...]
>
> Func prototype:
>   __skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)
>
> Always feel free to challenge my view,
> --Jesper

Thank you for the detailed suggestion! We will move the code into a
function as you pointed out and comment on each of the different cases
to explain our assumptions and intent.

Thanks,
Liang
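For illustration, a minimal sketch of such a helper, assuming it simply
folds the three cases from this patch into Jesper's proposed prototype
(the name __skb2xdp_steal_data and the local_pp_alloc flag are his
proposals; the actual implementation would only appear in a later
revision of the series):

	static void __skb2xdp_steal_data(struct sk_buff *skb,
					 struct xdp_buff *xdp,
					 struct veth_rq *rq,
					 bool local_pp_alloc)
	{
		if (local_pp_alloc) {
			/* skb (and its head) was freshly allocated from
			 * rq->page_pool in veth_convert_skb_to_xdp_buff(),
			 * so we have exclusive ownership and the pages
			 * belong to this rq's pool: use the
			 * MEM_TYPE_PAGE_POOL mem model.
			 */
			xdp->rxq->mem = rq->xdp_mem_pp;
			kfree_skb_partial(skb, true);
		} else if (!skb->pp_recycle) {
			/* The original skb passed the exclusivity checks
			 * and its head is a plain page fragment: steal the
			 * data and use the MEM_TYPE_PAGE_SHARED mem model.
			 */
			xdp->rxq->mem = rq->xdp_mem;
			kfree_skb_partial(skb, true);
		} else {
			/* A pp_recycle skb from the peer may hold pages
			 * from a foreign page pool: take plain page
			 * references and let consume_skb() release the
			 * page-pool state.
			 */
			veth_xdp_get(xdp);
			consume_skb(skb);
			xdp->rxq->mem = rq->xdp_mem;
		}
	}

Both the XDP_TX and XDP_REDIRECT cases would then reduce to a single
call, e.g. __skb2xdp_steal_data(skb, xdp, rq, skb != skb_orig).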
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 509e901da41d..ea1b344e5db4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -62,6 +62,7 @@ struct veth_rq {
 	struct net_device	*dev;
 	struct bpf_prog __rcu	*xdp_prog;
 	struct xdp_mem_info	xdp_mem;
+	struct xdp_mem_info	xdp_mem_pp;
 	struct veth_rq_stats	stats;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
@@ -836,6 +837,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	struct bpf_prog *xdp_prog;
 	struct veth_xdp_buff vxbuf;
 	struct xdp_buff *xdp = &vxbuf.xdp;
+	struct sk_buff *skb_orig;
 	u32 act, metalen;
 	int off;
@@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		goto out;
 	}
 
+	skb_orig = skb;
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
@@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		if (skb != skb_orig) {
+			xdp->rxq->mem = rq->xdp_mem_pp;
+			kfree_skb_partial(skb, true);
+		} else if (!skb->pp_recycle) {
+			xdp->rxq->mem = rq->xdp_mem;
+			kfree_skb_partial(skb, true);
+		} else {
+			veth_xdp_get(xdp);
+			consume_skb(skb);
+			xdp->rxq->mem = rq->xdp_mem;
+		}
+
 		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
@@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		if (skb != skb_orig) {
+			xdp->rxq->mem = rq->xdp_mem_pp;
+			kfree_skb_partial(skb, true);
+		} else if (!skb->pp_recycle) {
+			xdp->rxq->mem = rq->xdp_mem;
+			kfree_skb_partial(skb, true);
+		} else {
+			veth_xdp_get(xdp);
+			consume_skb(skb);
+			xdp->rxq->mem = rq->xdp_mem;
+		}
+
 		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
@@ -1061,6 +1082,14 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 			goto err_page_pool;
 	}
 
+	for (i = start; i < end; i++) {
+		err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp,
+					MEM_TYPE_PAGE_POOL,
+					priv->rq[i].page_pool);
+		if (err)
+			goto err_reg_mem;
+	}
+
 	for (i = start; i < end; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
@@ -1082,6 +1111,10 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 	for (i--; i >= start; i--)
 		ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
 	i = end;
+err_reg_mem:
+	for (i--; i >= start; i--)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+	i = end;
 err_page_pool:
 	for (i--; i >= start; i--) {
 		page_pool_destroy(priv->rq[i].page_pool);
@@ -1117,6 +1150,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
 		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
 	}
 
+	for (i = start; i < end; i++)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+
 	for (i = start; i < end; i++) {
 		page_pool_destroy(priv->rq[i].page_pool);
 		priv->rq[i].page_pool = NULL;
Page pool is supported for veth, but for the XDP_TX and XDP_REDIRECT
cases the pages are not effectively recycled. "ethtool -S" statistics
for the page pool are as follows:

  NIC statistics:
    rx_pp_alloc_fast: 18041186
    rx_pp_alloc_slow: 286369
    rx_pp_recycle_ring: 0
    rx_pp_recycle_released_ref: 18327555

This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases
the skb data structure:

	veth_xdp_get(xdp);
	consume_skb(skb);

The reason is that some skbs received from the veth peer are not page
pool pages, and remain so after conversion to an xdp frame. To avoid
confusing __xdp_return() with a mix of regular pages and page pool
pages, they are all converted to regular pages, so registering the xdp
memory model as MEM_TYPE_PAGE_SHARED is sufficient.

If we replace the above code with kfree_skb_partial(), directly
releasing the skb data structure, we can retain the original page pool
page behavior. However, simply changing the xdp memory model to
MEM_TYPE_PAGE_POOL is not a solution, as explained above. Therefore, we
introduce an additional MEM_TYPE_PAGE_POOL memory model for each rq.

In addition, to avoid mixing up pages from page pools with different
xdp_mem_id, page pool pages coming directly from the peer are still
converted into regular pages. This is not common; most of the time they
will be reallocated in veth_convert_skb_to_xdp_buff().

The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement after page pool pages get
successfully recycled in scenarios involving XDP_TX, XDP_REDIRECT, and
AF_XDP.

Test environment setup:

	ns1                 ns2
	veth0 <-peer-> veth1
	veth2 <-peer-> veth3

Test results:

	pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
		without PP recycle: 1,780,392
		with    PP recycle: 1,984,680
		improvement: ~10%

	pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
		without PP recycle: 1,433,491
		with    PP recycle: 1,511,680
		improvement: 5~6%

	pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
		without PP recycle: 1,527,708
		with    PP recycle: 1,672,101
		improvement: ~10%

	pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
		without PP recycle: 1,325,804
		with    PP recycle: 1,392,704
		improvement: ~5.5%

	pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
		without PP recycle: 1,607,609
		with    PP recycle: 1,736,957
		improvement: ~8%

Additionally, the performance improvement was measured when converting
to an xdp_buff doesn't require a buffer copy and the original skb uses
regular pages, i.e. page pool recycling is not involved. This still
gives around a 2% improvement, attributable to the change from
consume_skb() to kfree_skb_partial().

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
Changes from v1:
- pp pages from the peers are still converted into regular pages.
---
 drivers/net/veth.c | 48 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)