diff mbox series

[v4,net-next,1/9] i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb

Message ID 20211208140702.642741-2-alexandr.lobakin@intel.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series net: intel: napi_alloc_skb() vs metadata | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 17 of 17 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Alexander Lobakin Dec. 8, 2021, 2:06 p.m. UTC
{__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
+ NET_IP_ALIGN for any skb.
OTOH, i40e_construct_skb_zc() currently allocates and reserves
additional `xdp->data - xdp->data_hard_start`, which is
XDP_PACKET_HEADROOM for XSK frames.
There's no need for that at all as the frame is post-XDP and will
go only to the networking stack core.
Pass the size of the actual data only to __napi_alloc_skb() and
don't reserve anything. This will give enough headroom for stack
processing.

Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Jesper Dangaard Brouer Dec. 9, 2021, 8:19 a.m. UTC | #1
On 08/12/2021 15.06, Alexander Lobakin wrote:
> {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
> + NET_IP_ALIGN for any skb.
> OTOH, i40e_construct_skb_zc() currently allocates and reserves
> additional `xdp->data - xdp->data_hard_start`, which is
> XDP_PACKET_HEADROOM for XSK frames.
> There's no need for that at all as the frame is post-XDP and will
> go only to the networking stack core.

I disagree with this assumption, that headroom is not needed by netstack.
Why "no need for that at all" for netstack?

Having headroom is important for netstack in general.  When packet will 
grow we avoid realloc of SKB.  Use-case could also be cpumap or veth 
redirect, or XDP-generic, that expect this headroom.


> Pass the size of the actual data only to __napi_alloc_skb() and
> don't reserve anything. This will give enough headroom for stack
> processing.
> 
> Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index f08d19b8c554..9564906b7da8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -245,13 +245,11 @@ static struct sk_buff *i40e_construct_skb_zc(struct i40e_ring *rx_ring,
>   	struct sk_buff *skb;
>   
>   	/* allocate a skb to store the frags */
> -	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
> -			       xdp->data_end - xdp->data_hard_start,
> +	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, datasize,
>   			       GFP_ATOMIC | __GFP_NOWARN);
>   	if (unlikely(!skb))
>   		goto out;
>   
> -	skb_reserve(skb, xdp->data - xdp->data_hard_start);
>   	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
>   	if (metasize)
>   		skb_metadata_set(skb, metasize);
>
Alexander Lobakin Dec. 9, 2021, 5:33 p.m. UTC | #2
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
Date: Thu, 9 Dec 2021 09:19:46 +0100

> On 08/12/2021 15.06, Alexander Lobakin wrote:
> > {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
> > + NET_IP_ALIGN for any skb.
> > OTOH, i40e_construct_skb_zc() currently allocates and reserves
> > additional `xdp->data - xdp->data_hard_start`, which is
> > XDP_PACKET_HEADROOM for XSK frames.
> > There's no need for that at all as the frame is post-XDP and will
> > go only to the networking stack core.
> 
> I disagree with this assumption, that headroom is not needed by netstack.
> Why "no need for that at all" for netstack?

napi_alloc_skb() in our particular case will reserve 64 bytes, it is
sufficient for {TCP,UDP,SCTP,...}/IPv{4,6} etc.

> 
> Having headroom is important for netstack in general.  When packet will 
> grow we avoid realloc of SKB.  Use-case could also be cpumap or veth 
> redirect, or XDP-generic, that expect this headroom.

Well, those are not common cases at all.
Allocating 256 bytes more for some hypothetical usecases (and having
320 in total) is more expensive than expanding headroom in-place.
I don't know any other drivers or ifaces which reserve
XDP_PACKET_HEADROOM just for the case of using both driver-side
and generic XDP at the same time. To be more precise, I can't
remember any driver which would check whether generic XDP is enabled
for its netdev(s).

As a second option, I was trying to get exactly XDP_PACKET_HEADROOM
of headroom, but this involves either __alloc_skb() which is slower
than napi_alloc_skb(), or

	skb = napi_alloc_skb(napi, xdp->data_end -
				   xdp->data_hard_start -
				   NET_SKB_PAD);
	skb_reserve(skb, xdp->data_meta - xdp->data_hard_start -
			 NET_SKB_PAD);

Doesn't look good for me.

We could probably introduce a version of napi_alloc_skb() which
wouldn't reserve any headroom for you to have more control over it,
but that's more global material than these local fixes I'd say.

> 
> 
> > Pass the size of the actual data only to __napi_alloc_skb() and
> > don't reserve anything. This will give enough headroom for stack
> > processing.
> > 
> > Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
> > Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > ---
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +---
> >   1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > index f08d19b8c554..9564906b7da8 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > @@ -245,13 +245,11 @@ static struct sk_buff *i40e_construct_skb_zc(struct i40e_ring *rx_ring,
> >   	struct sk_buff *skb;
> >   
> >   	/* allocate a skb to store the frags */
> > -	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
> > -			       xdp->data_end - xdp->data_hard_start,
> > +	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, datasize,
> >   			       GFP_ATOMIC | __GFP_NOWARN);
> >   	if (unlikely(!skb))
> >   		goto out;
> >   
> > -	skb_reserve(skb, xdp->data - xdp->data_hard_start);
> >   	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
> >   	if (metasize)
> >   		skb_metadata_set(skb, metasize);

Thanks,
Al
Jesper Dangaard Brouer Dec. 10, 2021, 1:31 p.m. UTC | #3
On 09/12/2021 18.33, Alexander Lobakin wrote:
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Date: Thu, 9 Dec 2021 09:19:46 +0100
> 
>> On 08/12/2021 15.06, Alexander Lobakin wrote:
>>> {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
>>> + NET_IP_ALIGN for any skb.
>>> OTOH, i40e_construct_skb_zc() currently allocates and reserves
>>> additional `xdp->data - xdp->data_hard_start`, which is
>>> XDP_PACKET_HEADROOM for XSK frames.
>>> There's no need for that at all as the frame is post-XDP and will
>>> go only to the networking stack core.
>>
>> I disagree with this assumption, that headroom is not needed by netstack.
>> Why "no need for that at all" for netstack?
> 
> napi_alloc_skb() in our particular case will reserve 64 bytes, it is
> sufficient for {TCP,UDP,SCTP,...}/IPv{4,6} etc.

My bad, I misunderstood you. I now see (looking at code) that (as you 
say) 64 bytes of headroom *is* reserved (in bottom of __napi_alloc_skb).
Thus, the SKB *do* have headroom, so this patch should be fine.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Do watch out that 64 bytes is not always enough. Notice the define 
LL_MAX_HEADER and MAX_HEADER in include/linux/netdevice.h (that tries to 
determine worst-case header length) which is above 64 bytes. It is also 
affected by HyperV and WiFi configs.
Bhandare, KiranX Jan. 29, 2022, 8:55 a.m. UTC | #4
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Alexander Lobakin
> Sent: Wednesday, December 8, 2021 7:37 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Song Liu <songliubraving@fb.com>; Jesper Dangaard Brouer
> <hawk@kernel.org>; Daniel Borkmann <daniel@iogearbox.net>; Yonghong
> Song <yhs@fb.com>; Martin KaFai Lau <kafai@fb.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; Andrii
> Nakryiko <andrii@kernel.org>; Björn Töpel <bjorn@kernel.org>;
> netdev@vger.kernel.org; Jakub Kicinski <kuba@kernel.org>; KP Singh
> <kpsingh@kernel.org>; bpf@vger.kernel.org; David S. Miller
> <davem@davemloft.net>; linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH v4 net-next 1/9] i40e: don't reserve
> excessive XDP_PACKET_HEADROOM on XSK Rx to skb
> 
> {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
> + NET_IP_ALIGN for any skb.
> OTOH, i40e_construct_skb_zc() currently allocates and reserves additional
> `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for
> XSK frames.
> There's no need for that at all as the frame is post-XDP and will go only to the
> networking stack core.
> Pass the size of the actual data only to __napi_alloc_skb() and don't reserve
> anything. This will give enough headroom for stack processing.
> 
> Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 

Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>  A Contingent Worker at Intel
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index f08d19b8c554..9564906b7da8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -245,13 +245,11 @@  static struct sk_buff *i40e_construct_skb_zc(struct i40e_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* allocate a skb to store the frags */
-	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
-			       xdp->data_end - xdp->data_hard_start,
+	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, datasize,
 			       GFP_ATOMIC | __GFP_NOWARN);
 	if (unlikely(!skb))
 		goto out;
 
-	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
 	if (metasize)
 		skb_metadata_set(skb, metasize);