
[RFC,net-next,v2,2/2] net: veth: Improving page pool pages recycling

Message ID 20230801061932.10335-2-liangchen.linux@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [RFC,net-next,v2,1/2] net: veth: Page pool creation error handling for existing pools only

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers warning 2 maintainers not CCed: bpf@vger.kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 96 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Liang Chen Aug. 1, 2023, 6:19 a.m. UTC
Page pool is supported for veth, but in the XDP_TX and XDP_REDIRECT cases
the pages are not effectively recycled. The "ethtool -S" page pool
statistics show this:

NIC statistics:
rx_pp_alloc_fast: 18041186
rx_pp_alloc_slow: 286369
rx_pp_recycle_ring: 0
rx_pp_recycle_released_ref: 18327555

This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases
the skb data structure:

veth_xdp_get(xdp);
consume_skb(skb);

The reason is that some skbs received from the veth peer are not backed by
page pool pages, and remain that way after conversion to an xdp frame. To
avoid confusing __xdp_return with a mix of regular pages and page pool
pages, they are all converted to regular pages. So registering the xdp
memory model as MEM_TYPE_PAGE_SHARED is sufficient.
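
For reference, __xdp_return() frees a buffer strictly according to the
registered memory type, roughly as below (paraphrased and simplified from
net/core/xdp.c). Under MEM_TYPE_PAGE_POOL it assumes page->pp points to a
valid page pool, which is not true for regular pages, hence the need to
avoid mixing the two under one memory model:

switch (mem->type) {
case MEM_TYPE_PAGE_POOL:
        page = virt_to_head_page(data);
        page_pool_put_full_page(page->pp, page, napi_direct);
        break;
case MEM_TYPE_PAGE_SHARED:
        page_frag_free(data);
        break;
...
}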

If we instead release only the skb data structure with kfree_skb_partial,
the pages retain their original page pool behavior. However, simply
switching the xdp memory model to MEM_TYPE_PAGE_POOL is not a solution, as
explained above. Therefore, we introduce an additional MEM_TYPE_PAGE_POOL
model for each rq.

In addition, to avoid mixing up pages from page pools with different
xdp_mem_id, page pool pages directly coming from the peers are still
converted into regular pages. This is not common, as most of the time,
they will be reallocated in veth_convert_skb_to_xdp_buff.

The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement once page pool pages are successfully
recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP.

Test environment setup:
ns1                 ns2
veth0   <-peer->    veth1
veth2   <-peer->    veth3

Test Results:
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
    without PP recycle: 1,780,392
    with PP recycle: 1,984,680
    improvement: ~10%

pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
    without PP recycle: 1,433,491
    with PP recycle: 1,511,680
    improvement: 5~6%

pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
    without PP recycle: 1,527,708
    with PP recycle: 1,672,101
    improvement: ~10%

pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
    without PP recycle: 1,325,804
    with PP recycle: 1,392,704
    improvement: ~5.5%

pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
    without PP recycle: 1,607,609
    with PP recycle: 1,736,957
    improvement: ~8%

Additionally, performance was measured for the case where converting to an
xdp_buff does not require a buffer copy and the original skb uses regular
pages, i.e. page pool recycling is not involved. This still gives around a
2% improvement, attributed to the change from consume_skb to
kfree_skb_partial.
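
For context, kfree_skb_partial() with head_stolen == true releases only the
skb struct itself, skipping the refcount handling and data freeing done by
consume_skb(). Paraphrased from net/core/skbuff.c (tracepoint omitted):

void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)
{
        if (head_stolen) {
                /* caller keeps the data; free only the skb struct */
                skb_release_head_state(skb);
                kmem_cache_free(skbuff_cache, skb);
        } else {
                __kfree_skb(skb);
        }
}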

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
 Changes from v1:
- pp pages from the peers are still converted into regular pages.
---
 drivers/net/veth.c | 48 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

Comments

Yunsheng Lin Aug. 2, 2023, 12:32 p.m. UTC | #1
On 2023/8/1 14:19, Liang Chen wrote:

> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>  	case XDP_PASS:
>  		break;
>  	case XDP_TX:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);

For this case, I suppose that we can safely call kfree_skb_partial()
as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
I am not sure about the !skb->pp_recycle case.

> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);

For consume_skb(), there is the skb_unref() check and other checks/operations.
Can we really assume that we can call kfree_skb_partial() with head_stolen
being true? Is it possible that skb->users is bigger than 1? If it is possible,
don't we free the 'skb' back to skbuff_cache while others may still be using
it?
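
(For reference, consume_skb() is roughly the following, paraphrased with
the tracepoint omitted:)

void consume_skb(struct sk_buff *skb)
{
        if (!skb_unref(skb))    /* frees only when the last ref is dropped */
                return;

        __kfree_skb(skb);
}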

> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +
Liang Chen Aug. 7, 2023, 12:20 p.m. UTC | #2
On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/1 14:19, Liang Chen wrote:
>
> > @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >       case XDP_PASS:
> >               break;
> >       case XDP_TX:
> > -             veth_xdp_get(xdp);
> > -             consume_skb(skb);
> > -             xdp->rxq->mem = rq->xdp_mem;
> > +             if (skb != skb_orig) {
> > +                     xdp->rxq->mem = rq->xdp_mem_pp;
> > +                     kfree_skb_partial(skb, true);
>
> For this case, I suppose that we can safely call kfree_skb_partial()
> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> I am not sure about the !skb->pp_recycle case.
>
> > +             } else if (!skb->pp_recycle) {
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +                     kfree_skb_partial(skb, true);
>
> For consume_skb(), there is the skb_unref() check and other checks/operations.
> Can we really assume that we can call kfree_skb_partial() with head_stolen
> being true? Is it possible that skb->users is bigger than 1? If it is possible,
> don't we free the 'skb' back to skbuff_cache while others may still be using
> it?
>

Thanks for raising the concern. If there are multiple references to
the skb (skb->users is greater than 1), the skb will be reallocated in
veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
case.

In fact, entering the !skb->pp_recycle case implies that the skb meets
the following conditions:
1. It is neither shared nor cloned.
2. It is not allocated using kmalloc.
3. It does not have fragment data.
4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
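
For reference, the corresponding test in veth_convert_skb_to_xdp_buff()
looks roughly like this (paraphrased); the conditions above are its
negation:

if (skb_shared(skb) || skb_head_is_locked(skb) ||
    skb_shinfo(skb)->nr_frags ||
    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
        /* reallocate a page-pool-backed head and copy the data,
         * i.e. the skb != skb_orig case
         */
}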


Thanks,
Liang

> > +             } else {
> > +                     veth_xdp_get(xdp);
> > +                     consume_skb(skb);
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +             }
> > +
>
Yunsheng Lin Aug. 8, 2023, 11:16 a.m. UTC | #3
On 2023/8/7 20:20, Liang Chen wrote:
> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/8/1 14:19, Liang Chen wrote:
>>
>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>       case XDP_PASS:
>>>               break;
>>>       case XDP_TX:
>>> -             veth_xdp_get(xdp);
>>> -             consume_skb(skb);
>>> -             xdp->rxq->mem = rq->xdp_mem;
>>> +             if (skb != skb_orig) {
>>> +                     xdp->rxq->mem = rq->xdp_mem_pp;
>>> +                     kfree_skb_partial(skb, true);
>>
>> For this case, I suppose that we can safely call kfree_skb_partial()
>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
>> I am not sure about the !skb->pp_recycle case.
>>
>>> +             } else if (!skb->pp_recycle) {
>>> +                     xdp->rxq->mem = rq->xdp_mem;
>>> +                     kfree_skb_partial(skb, true);
>>
>> For consume_skb(), there is the skb_unref() check and other checks/operations.
>> Can we really assume that we can call kfree_skb_partial() with head_stolen
>> being true? Is it possible that skb->users is bigger than 1? If it is possible,
>> don't we free the 'skb' back to skbuff_cache while others may still be using
>> it?
>>
> 
> Thanks for raising the concern. If there are multiple references to
> the skb (skb->users is greater than 1), the skb will be reallocated in
> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> case.
> 
> In fact, entering the !skb->pp_recycle case implies that the skb meets
> the following conditions:
> 1. It is neither shared nor cloned.
> 2. It is not allocated using kmalloc.
> 3. It does not have fragment data.
> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
> 

You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
it seems xdp is pretty strict about buffer ownership, it needs to
have exclusive access to all the buffers.

And it seems there is only one difference left then, with
kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
'skb' is only allocated from 'skbuff_cache', this patch looks good to me
then.

>
Liang Chen Aug. 9, 2023, 10:01 a.m. UTC | #4
On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/7 20:20, Liang Chen wrote:
> > On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2023/8/1 14:19, Liang Chen wrote:
> >>
> >>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >>>       case XDP_PASS:
> >>>               break;
> >>>       case XDP_TX:
> >>> -             veth_xdp_get(xdp);
> >>> -             consume_skb(skb);
> >>> -             xdp->rxq->mem = rq->xdp_mem;
> >>> +             if (skb != skb_orig) {
> >>> +                     xdp->rxq->mem = rq->xdp_mem_pp;
> >>> +                     kfree_skb_partial(skb, true);
> >>
> >> For this case, I suppose that we can safely call kfree_skb_partial()
> >> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> >> I am not sure about the !skb->pp_recycle case.
> >>
> >>> +             } else if (!skb->pp_recycle) {
> >>> +                     xdp->rxq->mem = rq->xdp_mem;
> >>> +                     kfree_skb_partial(skb, true);
> >>
> >> For consume_skb(), there is the skb_unref() check and other checks/operations.
> >> Can we really assume that we can call kfree_skb_partial() with head_stolen
> >> being true? Is it possible that skb->users is bigger than 1? If it is possible,
> >> don't we free the 'skb' back to skbuff_cache while others may still be using
> >> it?
> >>
> >
> > Thanks for raising the concern. If there are multiple references to
> > the skb (skb->users is greater than 1), the skb will be reallocated in
> > veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> > case.
> >
> > In fact, entering the !skb->pp_recycle case implies that the skb meets
> > the following conditions:
> > 1. It is neither shared nor cloned.
> > 2. It is not allocated using kmalloc.
> > 3. It does not have fragment data.
> > 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
> >
>
> You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
> it seems xdp is pretty strict about buffer ownership, it needs to
> have exclusive access to all the buffers.
>
> And it seems there is only one difference left then, with
> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
> consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
> 'skb' is only allocated from 'skbuff_cache', this patch looks good to me
> then.
>

The difference between kmem_cache_free and kfree_skbmem lies in the
fact that kfree_skbmem checks whether the skb is an fclone (fast
clone) skb. If it is, it should be returned to the
skbuff_fclone_cache. Currently, fclone skbs can only be allocated
through __alloc_skb, and their head buffer is allocated by
kmalloc_reserve, which does not meet the condition mentioned above -
"2. It is not allocated using kmalloc.". Therefore, the fclone skb
will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
the skb != skb_orig case. In other words, entering the
!skb->pp_recycle case indicates that the skb was allocated from
skbuff_cache.
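
(For reference, the fclone handling in kfree_skbmem() is roughly the
following, a simplified paraphrase of net/core/skbuff.c that omits the
refcount fast path:)

static void kfree_skbmem(struct sk_buff *skb)
{
        struct sk_buff_fclones *fclones;

        switch (skb->fclone) {
        case SKB_FCLONE_UNAVAILABLE:
                kmem_cache_free(skbuff_cache, skb);
                return;
        case SKB_FCLONE_ORIG:
                fclones = container_of(skb, struct sk_buff_fclones, skb1);
                break;
        default: /* SKB_FCLONE_CLONE */
                fclones = container_of(skb, struct sk_buff_fclones, skb2);
                break;
        }

        /* fclone skbs return to skbuff_fclone_cache, not skbuff_cache */
        if (refcount_dec_and_test(&fclones->fclone_ref))
                kmem_cache_free(skbuff_fclone_cache, fclones);
}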

Thanks for the review,
Liang

> >
Yunsheng Lin Aug. 9, 2023, 12:35 p.m. UTC | #5
On 2023/8/9 18:01, Liang Chen wrote:
> On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/8/7 20:20, Liang Chen wrote:
>>> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> On 2023/8/1 14:19, Liang Chen wrote:
>>>>
>>>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>>>       case XDP_PASS:
>>>>>               break;
>>>>>       case XDP_TX:
>>>>> -             veth_xdp_get(xdp);
>>>>> -             consume_skb(skb);
>>>>> -             xdp->rxq->mem = rq->xdp_mem;
>>>>> +             if (skb != skb_orig) {
>>>>> +                     xdp->rxq->mem = rq->xdp_mem_pp;
>>>>> +                     kfree_skb_partial(skb, true);
>>>>
>>>> For this case, I suppose that we can safely call kfree_skb_partial()
>>>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
>>>> I am not sure about the !skb->pp_recycle case.
>>>>
>>>>> +             } else if (!skb->pp_recycle) {
>>>>> +                     xdp->rxq->mem = rq->xdp_mem;
>>>>> +                     kfree_skb_partial(skb, true);
>>>>
>>>> For consume_skb(), there is the skb_unref() check and other checks/operations.
>>>> Can we really assume that we can call kfree_skb_partial() with head_stolen
>>>> being true? Is it possible that skb->users is bigger than 1? If it is possible,
>>>> don't we free the 'skb' back to skbuff_cache while others may still be using
>>>> it?
>>>>
>>>
>>> Thanks for raising the concern. If there are multiple references to
>>> the skb (skb->users is greater than 1), the skb will be reallocated in
>>> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
>>> case.
>>>
>>> In fact, entering the !skb->pp_recycle case implies that the skb meets
>>> the following conditions:
>>> 1. It is neither shared nor cloned.
>>> 2. It is not allocated using kmalloc.
>>> 3. It does not have fragment data.
>>> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
>>>
>>
>> You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
>> it seems xdp is pretty strict about buffer ownership, it needs to
>> have exclusive access to all the buffers.
>>
>> And it seems there is only one difference left then, with
>> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
>> consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
>> 'skb' is only allocated from 'skbuff_cache', this patch looks good to me
>> then.
>>
> 
> The difference between kmem_cache_free and kfree_skbmem lies in the
> fact that kfree_skbmem checks whether the skb is an fclone (fast
> clone) skb. If it is, it should be returned to the
> skbuff_fclone_cache. Currently, fclone skbs can only be allocated
> through __alloc_skb, and their head buffer is allocated by
> kmalloc_reserve, which does not meet the condition mentioned above -
> "2. It is not allocated using kmalloc.". Therefore, the fclone skb
> will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
> the skb != skb_orig case. In other words, entering the
> !skb->pp_recycle case indicates that the skb was allocated from
> skbuff_cache.

It might need a comment to make that clear, or some compile-time testing
such as BUILD_BUG_ON() to ensure it, as it is not so obvious if someone
changes the code to allocate an fclone skb with a frag head in the
future.

Also I suppose the veth_xdp_rcv_skb() is called in NAPI context, and
we might be able to reuse the 'skb' if we can use something like
napi_skb_free_stolen_head().
Jesper Dangaard Brouer Aug. 11, 2023, 6:16 p.m. UTC | #6
On 01/08/2023 08.19, Liang Chen wrote:
[...]
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 509e901da41d..ea1b344e5db4 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
[...]
> @@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>   		goto out;
>   	}
>   
> +	skb_orig = skb;
>   	__skb_push(skb, skb->data - skb_mac_header(skb));
>   	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>   		goto drop;
> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>   	case XDP_PASS:
>   		break;
>   	case XDP_TX:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);
> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);
> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +

The code section above and the section below look the same.
It begs for a common function.
>   		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
>   			trace_xdp_exception(rq->dev, xdp_prog, act);
>   			stats->rx_drops++;
> @@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>   		rcu_read_unlock();
>   		goto xdp_xmit;
>   	case XDP_REDIRECT:
> -		veth_xdp_get(xdp);
> -		consume_skb(skb);
> -		xdp->rxq->mem = rq->xdp_mem;
> +		if (skb != skb_orig) {
> +			xdp->rxq->mem = rq->xdp_mem_pp;
> +			kfree_skb_partial(skb, true);
> +		} else if (!skb->pp_recycle) {
> +			xdp->rxq->mem = rq->xdp_mem;
> +			kfree_skb_partial(skb, true);
> +		} else {
> +			veth_xdp_get(xdp);
> +			consume_skb(skb);
> +			xdp->rxq->mem = rq->xdp_mem;
> +		}
> +

The common function can be named to reflect the purpose of this code
section. According to my understanding, the code steals the (packet) data
section from the SKB, frees the SKB, and prepares/associates the correct
memory type in xdp_buff->rxq.

Function name proposals:
  __skb_steal_data
  __free_skb_and_steal_data
  __free_skb_and_steal_data_for_xdp
  __free_skb_and_xdp_steal_data
  __skb2xdp_steal_data

When doing this in a function, it will also allow us to add some
comments explaining the different cases and assumptions. These
assumptions can get broken as a result of (future) changes in the
surrounding code, thus we need to explain our assumptions/intent (to
help our future selves).

For code readability, I think we should convert (skb != skb_orig) into a 
boolean that says what this case captures, e.g. local_pp_alloc.

Func prototype:
  __skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)
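
E.g. a minimal sketch (untested, just to illustrate the idea; the bodies
are the three cases from the patch as-is):

static void __skb2xdp_steal_data(struct sk_buff *skb, struct xdp_buff *xdp,
                                 struct veth_rq *rq, bool local_pp_alloc)
{
        if (local_pp_alloc) {
                /* skb was (re)allocated in veth_convert_skb_to_xdp_buff(),
                 * backed by this rq's page pool
                 */
                xdp->rxq->mem = rq->xdp_mem_pp;
                kfree_skb_partial(skb, true);
        } else if (!skb->pp_recycle) {
                /* exclusively owned, non-pp skb: steal the head directly */
                xdp->rxq->mem = rq->xdp_mem;
                kfree_skb_partial(skb, true);
        } else {
                /* pp pages from the peer: take page refs and convert them
                 * to regular pages, as before
                 */
                veth_xdp_get(xdp);
                consume_skb(skb);
                xdp->rxq->mem = rq->xdp_mem;
        }
}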


Always feel free to challenge my view,
--Jesper
Liang Chen Aug. 12, 2023, 1:52 a.m. UTC | #7
On Wed, Aug 9, 2023 at 8:35 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/9 18:01, Liang Chen wrote:
> > On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2023/8/7 20:20, Liang Chen wrote:
> >>> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>>>
> >>>> On 2023/8/1 14:19, Liang Chen wrote:
> >>>>
> >>>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >>>>>       case XDP_PASS:
> >>>>>               break;
> >>>>>       case XDP_TX:
> >>>>> -             veth_xdp_get(xdp);
> >>>>> -             consume_skb(skb);
> >>>>> -             xdp->rxq->mem = rq->xdp_mem;
> >>>>> +             if (skb != skb_orig) {
> >>>>> +                     xdp->rxq->mem = rq->xdp_mem_pp;
> >>>>> +                     kfree_skb_partial(skb, true);
> >>>>
> >>>> For this case, I suppose that we can safely call kfree_skb_partial()
> >>>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> >>>> I am not sure about the !skb->pp_recycle case.
> >>>>
> >>>>> +             } else if (!skb->pp_recycle) {
> >>>>> +                     xdp->rxq->mem = rq->xdp_mem;
> >>>>> +                     kfree_skb_partial(skb, true);
> >>>>
> >>>> For consume_skb(), there is the skb_unref() check and other checks/operations.
> >>>> Can we really assume that we can call kfree_skb_partial() with head_stolen
> >>>> being true? Is it possible that skb->users is bigger than 1? If it is possible,
> >>>> don't we free the 'skb' back to skbuff_cache while others may still be using
> >>>> it?
> >>>>
> >>>
> >>> Thanks for raising the concern. If there are multiple references to
> >>> the skb (skb->users is greater than 1), the skb will be reallocated in
> >>> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> >>> case.
> >>>
> >>> In fact, entering the !skb->pp_recycle case implies that the skb meets
> >>> the following conditions:
> >>> 1. It is neither shared nor cloned.
> >>> 2. It is not allocated using kmalloc.
> >>> 3. It does not have fragment data.
> >>> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
> >>>
> >>
> >> You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
> >> it seems xdp is pretty strict about buffer ownership, it needs to
> >> have exclusive access to all the buffers.
> >>
> >> And it seems there is only one difference left then, with
> >> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
> >> consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
> >> 'skb' is only allocated from 'skbuff_cache', this patch looks good to me
> >> then.
> >>
> >
> > The difference between kmem_cache_free and kfree_skbmem lies in the
> > fact that kfree_skbmem checks whether the skb is an fclone (fast
> > clone) skb. If it is, it should be returned to the
> > skbuff_fclone_cache. Currently, fclone skbs can only be allocated
> > through __alloc_skb, and their head buffer is allocated by
> > kmalloc_reserve, which does not meet the condition mentioned above -
> > "2. It is not allocated using kmalloc.". Therefore, the fclone skb
> > will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
> > the skb != skb_orig case. In other words, entering the
> > !skb->pp_recycle case indicates that the skb was allocated from
> > skbuff_cache.
>
> It might need a comment to make that clear, or some compile-time testing
> such as BUILD_BUG_ON() to ensure it, as it is not so obvious if someone
> changes the code to allocate an fclone skb with a frag head in the
> future.
>

Sure. We will add a comment to explain that, like below:

/*
 * We can safely use kfree_skb_partial here because this cannot be an
 * fclone skb. Fclone skbs are exclusively allocated via __alloc_skb,
 * with their head buffer allocated by kmalloc_reserve (so,
 * skb->head_frag = 0), satisfying the skb_head_is_locked condition in
 * veth_convert_skb_to_xdp_buff and leading to the skb being reallocated.
 */

> Also I suppose the veth_xdp_rcv_skb() is called in NAPI context, and
> we might be able to reuse the 'skb' if we can use something like
> napi_skb_free_stolen_head().

Sure. Using napi_skb_free_stolen_head seems to be a good idea; it
further accelerates the skb != skb_orig case in our preliminary test.
However, it's not suitable for the !skb->pp_recycle case, as that skb
isn't allocated in the current NAPI context.
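
Something like the following for that branch (rough, untested sketch):

if (skb != skb_orig) {
        xdp->rxq->mem = rq->xdp_mem_pp;
        /* this skb was allocated in the current NAPI context, so the
         * skb struct can go back to the per-CPU NAPI cache
         */
        napi_skb_free_stolen_head(skb);
}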

Thanks,
Liang
Liang Chen Aug. 12, 2023, 2:02 a.m. UTC | #8
On Sat, Aug 12, 2023 at 2:16 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 01/08/2023 08.19, Liang Chen wrote:
> [...]
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 509e901da41d..ea1b344e5db4 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> [...]
> > @@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >               goto out;
> >       }
> >
> > +     skb_orig = skb;
> >       __skb_push(skb, skb->data - skb_mac_header(skb));
> >       if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> >               goto drop;
> > @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >       case XDP_PASS:
> >               break;
> >       case XDP_TX:
> > -             veth_xdp_get(xdp);
> > -             consume_skb(skb);
> > -             xdp->rxq->mem = rq->xdp_mem;
> > +             if (skb != skb_orig) {
> > +                     xdp->rxq->mem = rq->xdp_mem_pp;
> > +                     kfree_skb_partial(skb, true);
> > +             } else if (!skb->pp_recycle) {
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +                     kfree_skb_partial(skb, true);
> > +             } else {
> > +                     veth_xdp_get(xdp);
> > +                     consume_skb(skb);
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +             }
> > +
>
> The code section above and the section below look the same.
> It begs for a common function.
>
> >               if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
> >                       trace_xdp_exception(rq->dev, xdp_prog, act);
> >                       stats->rx_drops++;
> > @@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >               rcu_read_unlock();
> >               goto xdp_xmit;
> >       case XDP_REDIRECT:
> > -             veth_xdp_get(xdp);
> > -             consume_skb(skb);
> > -             xdp->rxq->mem = rq->xdp_mem;
> > +             if (skb != skb_orig) {
> > +                     xdp->rxq->mem = rq->xdp_mem_pp;
> > +                     kfree_skb_partial(skb, true);
> > +             } else if (!skb->pp_recycle) {
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +                     kfree_skb_partial(skb, true);
> > +             } else {
> > +                     veth_xdp_get(xdp);
> > +                     consume_skb(skb);
> > +                     xdp->rxq->mem = rq->xdp_mem;
> > +             }
> > +
>
> The common function can be named to reflect the purpose of this code
> section. According to my understanding, the code steals the (packet) data
> section from the SKB, frees the SKB, and prepares/associates the correct
> memory type in xdp_buff->rxq.
>
> Function name proposals:
>   __skb_steal_data
>   __free_skb_and_steal_data
>   __free_skb_and_steal_data_for_xdp
>   __free_skb_and_xdp_steal_data
>   __skb2xdp_steal_data
>
> When doing this in a function, it will also allow us to add some
> comments explaining the different cases and assumptions. These
> assumptions can get broken as a result of (future) changes in the
> surrounding code, thus we need to explain our assumptions/intent (to
> help our future selves).
>
> For code readability, I think we should convert (skb != skb_orig) into a
> boolean that says what this case captures, e.g. local_pp_alloc.
>
> Func prototype:
>   __skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)
>
>
> Always feel free to challenge my view,
> --Jesper
>

Thank you for the detailed suggestion! We will move the code into a
function as you pointed out, and comment on each of the different cases
to explain our assumptions and intent.

Thanks,
Liang

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 509e901da41d..ea1b344e5db4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -62,6 +62,7 @@  struct veth_rq {
 	struct net_device	*dev;
 	struct bpf_prog __rcu	*xdp_prog;
 	struct xdp_mem_info	xdp_mem;
+	struct xdp_mem_info	xdp_mem_pp;
 	struct veth_rq_stats	stats;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
@@ -836,6 +837,7 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	struct bpf_prog *xdp_prog;
 	struct veth_xdp_buff vxbuf;
 	struct xdp_buff *xdp = &vxbuf.xdp;
+	struct sk_buff *skb_orig;
 	u32 act, metalen;
 	int off;
 
@@ -848,6 +850,7 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		goto out;
 	}
 
+	skb_orig = skb;
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
 		goto drop;
@@ -862,9 +865,18 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		if (skb != skb_orig) {
+			xdp->rxq->mem = rq->xdp_mem_pp;
+			kfree_skb_partial(skb, true);
+		} else if (!skb->pp_recycle) {
+			xdp->rxq->mem = rq->xdp_mem;
+			kfree_skb_partial(skb, true);
+		} else {
+			veth_xdp_get(xdp);
+			consume_skb(skb);
+			xdp->rxq->mem = rq->xdp_mem;
+		}
+
 		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
 			trace_xdp_exception(rq->dev, xdp_prog, act);
 			stats->rx_drops++;
@@ -874,9 +886,18 @@  static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		rcu_read_unlock();
 		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
+		if (skb != skb_orig) {
+			xdp->rxq->mem = rq->xdp_mem_pp;
+			kfree_skb_partial(skb, true);
+		} else if (!skb->pp_recycle) {
+			xdp->rxq->mem = rq->xdp_mem;
+			kfree_skb_partial(skb, true);
+		} else {
+			veth_xdp_get(xdp);
+			consume_skb(skb);
+			xdp->rxq->mem = rq->xdp_mem;
+		}
+
 		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
 			stats->rx_drops++;
 			goto err_xdp;
@@ -1061,6 +1082,14 @@  static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 			goto err_page_pool;
 	}
 
+	for (i = start; i < end; i++) {
+		err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp,
+					MEM_TYPE_PAGE_POOL,
+					priv->rq[i].page_pool);
+		if (err)
+			goto err_reg_mem;
+	}
+
 	for (i = start; i < end; i++) {
 		struct veth_rq *rq = &priv->rq[i];
 
@@ -1082,6 +1111,10 @@  static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 	for (i--; i >= start; i--)
 		ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
 	i = end;
+err_reg_mem:
+	for (i--; i >= start; i--)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+	i = end;
 err_page_pool:
 	for (i--; i >= start; i--) {
 		page_pool_destroy(priv->rq[i].page_pool);
@@ -1117,6 +1150,9 @@  static void veth_napi_del_range(struct net_device *dev, int start, int end)
 		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
 	}
 
+	for (i = start; i < end; i++)
+		xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+
 	for (i = start; i < end; i++) {
 		page_pool_destroy(priv->rq[i].page_pool);
 		priv->rq[i].page_pool = NULL;