diff mbox series

[bpf-next,V2] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo

Message ID 168563651438.3436004.17735707525651776648.stgit@firesoul (mailing list archive)
State Accepted
Commit 411486626e5779bd85439282985ff3fc25a3f6d2
Delegated to: BPF
Series [bpf-next,V2] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for veristat
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 34 this patch: 34
netdev/cc_maintainers warning 15 maintainers not CCed: kuba@kernel.org hawk@kernel.org daniel@iogearbox.net yhs@fb.com kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com sdf@google.com song@kernel.org andrii@kernel.org jolsa@kernel.org davem@davemloft.net pabeni@redhat.com haoluo@google.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 34 this patch: 34
netdev/checkpatch warning CHECK: Unnecessary parentheses around 'offset < size'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat

Commit Message

Jesper Dangaard Brouer June 1, 2023, 4:21 p.m. UTC
We observed a significant performance degradation in
samples/bpf xdp1 and xdp2, due to the XDP multibuffer "xdp.frags" handling
added in commit 772251742262 ("samples/bpf: fixup some tools to be able
to support xdp multibuffer").

This patch reduces the overhead by avoiding reads of the shared_info
(sinfo) memory area when the XDP packet doesn't have any frags. This improves
performance because sinfo is located in another cacheline.
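
To illustrate the cacheline point, here is a minimal user-space sketch of where sinfo sits relative to the packet data. It assumes 64-byte cache lines and mirrors the kernel convention of placing shared_info at the cache-line-aligned tail of the frame. All model_* names and the CACHELINE constant are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u /* assumption: 64-byte cache lines */

/* Hypothetical, heavily reduced stand-in for struct skb_shared_info. */
struct model_shared_info {
	uint8_t  nr_frags;
	uint32_t xdp_frags_size;
};

/* Hypothetical, heavily reduced stand-in for struct xdp_buff. */
struct model_xdp_buff {
	uint8_t *data_hard_start; /* start of the frame */
	uint8_t *data;            /* start of packet data */
	uint8_t *data_end;        /* end of linear packet data */
	uint32_t frame_sz;        /* total frame size in bytes */
};

/* Round x up to a multiple of the cache line size. */
static inline size_t model_align(size_t x)
{
	return (x + CACHELINE - 1) & ~(size_t)(CACHELINE - 1);
}

/* sinfo lives at the aligned tail of the frame, far from the packet head. */
static inline struct model_shared_info *
model_get_sinfo(const struct model_xdp_buff *xdp)
{
	return (struct model_shared_info *)(xdp->data_hard_start + xdp->frame_sz -
					    model_align(sizeof(struct model_shared_info)));
}

/* Index of the cache line (relative to the frame start) that holds p. */
static inline size_t model_cacheline_of(const struct model_xdp_buff *xdp,
					const void *p)
{
	return (size_t)((const uint8_t *)p - xdp->data_hard_start) / CACHELINE;
}
```

With a 4096-byte frame and 256 bytes of headroom, the packet starts in cache line 4 while the model's sinfo sits in cache line 63, so touching sinfo pulls in a line the no-frags path otherwise never needs.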

Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes()
and bpf_xdp_store_bytes(). As a help to reviewers: xdp_get_buff_len() can
potentially access sinfo, but it uses the xdp_buff_has_frags() flags bit check
to avoid accessing sinfo in the no-frags case.
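
The flags-bit check can be sketched in user space as follows. This is a simplified model, not the kernel code: the model_* names are hypothetical, and the buffer carries an explicit sinfo pointer (an assumption made here so the no-frags case can be exercised with sinfo set to NULL).

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_FLAGS_HAS_FRAGS (1u << 0) /* hypothetical flag bit */

/* Hypothetical stand-in: only the field this sketch needs. */
struct model_shared_info {
	uint32_t xdp_frags_size; /* total bytes held in frags */
};

struct model_xdp_buff {
	uint8_t *data;
	uint8_t *data_end;
	uint32_t flags;
	struct model_shared_info *sinfo; /* simplification: explicit pointer */
};

/* The flag lives in the buffer descriptor itself, so no sinfo load is needed. */
static inline int model_has_frags(const struct model_xdp_buff *xdp)
{
	return !!(xdp->flags & MODEL_FLAGS_HAS_FRAGS);
}

/* Total length: linear part, plus the frags only when the flag is set. */
static inline uint32_t model_get_buff_len(const struct model_xdp_buff *xdp)
{
	uint32_t len = (uint32_t)(xdp->data_end - xdp->data);

	if (!model_has_frags(xdp))
		return len; /* sinfo is never dereferenced on this path */

	return len + xdp->sinfo->xdp_frags_size;
}
```

Calling model_get_buff_len() on a buffer with flags cleared and sinfo set to NULL returns the linear length without faulting, which is the property the patch relies on.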

The likely/unlikely annotations lay out the asm code such that the sinfo
access isn't interleaved with the no-frags case (checked on GCC 12.2.1-4).
The generated asm code is more compact for the no-frags case.
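
For readers who want to experiment outside the kernel, the patched control flow can be modelled roughly as below. This is a sketch under stated assumptions: the model_* types are hypothetical, the buffer holds an explicit sinfo pointer, and every failure is collapsed to NULL, whereas the kernel function returns ERR_PTR() values for the range checks. likely()/unlikely() are defined via __builtin_expect, which requires GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define MODEL_MAX_FRAGS 4               /* hypothetical limit */
#define MODEL_FLAGS_HAS_FRAGS (1u << 0) /* hypothetical flag bit */

struct model_frag { uint8_t *addr; uint32_t size; };

struct model_shared_info {
	uint32_t nr_frags;
	uint32_t xdp_frags_size;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

struct model_xdp_buff {
	uint8_t *data;
	uint8_t *data_end;
	uint32_t flags;
	struct model_shared_info *sinfo; /* simplification: explicit pointer */
};

static inline uint32_t model_get_buff_len(const struct model_xdp_buff *xdp)
{
	uint32_t len = (uint32_t)(xdp->data_end - xdp->data);

	if (likely(!(xdp->flags & MODEL_FLAGS_HAS_FRAGS)))
		return len;
	return len + xdp->sinfo->xdp_frags_size;
}

/* Returns a pointer to [offset, offset + len) or NULL (model collapses all errors). */
static inline uint8_t *model_xdp_pointer(const struct model_xdp_buff *xdp,
					 uint32_t offset, uint32_t len)
{
	uint32_t size = (uint32_t)(xdp->data_end - xdp->data);
	const struct model_shared_info *sinfo;
	uint8_t *addr = xdp->data;
	uint32_t i;

	if (unlikely(offset > 0xffff || len > 0xffff))
		return NULL;
	if (unlikely(offset + len > model_get_buff_len(xdp)))
		return NULL;
	if (likely(offset < size)) /* linear area: sinfo untouched */
		goto out;

	sinfo = xdp->sinfo; /* first sinfo access, multi-buffer path only */
	offset -= size;
	for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
		uint32_t frag_size = sinfo->frags[i].size;

		if (offset < frag_size) {
			addr = sinfo->frags[i].addr;
			size = frag_size;
			break;
		}
		offset -= frag_size;
	}
out:
	/* NULL when the range straddles two buffers. */
	return offset + len <= size ? addr + offset : NULL;
}
```

The point of deferring the sinfo assignment is visible here: a request that lands in the linear area reaches the out label without ever loading sinfo. Whether a given compiler actually places the paged-area loop out of line depends on the compiler and flags; the commit message reports the result for GCC 12.2.1-4 only.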

The BPF kfunc bpf_dynptr_slice() also uses bpf_xdp_pointer(). Thus, the
optimization should also take effect there.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Lorenzo Bianconi June 1, 2023, 8:34 p.m. UTC | #1
> Currently we observed a significant performance degradation in
> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> added in commit 772251742262 ("samples/bpf: fixup some tools to be able
> to support xdp multibuffer").
> 
> This patch reduce the overhead by avoiding to read/load shared_info
> (sinfo) memory area, when XDP packet don't have any frags. This improves
> performance because sinfo is located in another cacheline.
> 
> Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes()
> and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can
> potentially access sinfo, but it uses xdp_buff_has_frags() flags bit check
> to avoid accessing sinfo in no-frags case.
> 
> The likely/unlikely instrumentation lays out asm code such that sinfo
> access isn't interleaved with no-frags case (checked on GCC 12.2.1-4).
> The generated asm code is more compact towards the no-frags case.
> 
> The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it
> should also take effect for that.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  net/core/filter.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 968139f4a1ac..961db5bd2f94 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3948,20 +3948,21 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
>  
>  void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
>  {
> -	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
>  	u32 size = xdp->data_end - xdp->data;
> +	struct skb_shared_info *sinfo;
>  	void *addr = xdp->data;
>  	int i;
>  
>  	if (unlikely(offset > 0xffff || len > 0xffff))
>  		return ERR_PTR(-EFAULT);
>  
> -	if (offset + len > xdp_get_buff_len(xdp))
> +	if (unlikely(offset + len > xdp_get_buff_len(xdp)))
>  		return ERR_PTR(-EINVAL);
>  
> -	if (offset < size) /* linear area */
> +	if (likely((offset < size))) /* linear area */

nit: you can drop a round bracket here. Other than that:

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

>  		goto out;
>  
> +	sinfo = xdp_get_shared_info_from_buff(xdp);
>  	offset -= size;
>  	for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
>  		u32 frag_size = skb_frag_size(&sinfo->frags[i]);
> 
>
Toke Høiland-Jørgensen June 1, 2023, 8:37 p.m. UTC | #2
Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Currently we observed a significant performance degradation in
> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> added in commit 772251742262 ("samples/bpf: fixup some tools to be able
> to support xdp multibuffer").
>
> This patch reduce the overhead by avoiding to read/load shared_info
> (sinfo) memory area, when XDP packet don't have any frags. This improves
> performance because sinfo is located in another cacheline.
>
> Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes()
> and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can
> potentially access sinfo, but it uses xdp_buff_has_frags() flags bit check
> to avoid accessing sinfo in no-frags case.
>
> The likely/unlikely instrumentation lays out asm code such that sinfo
> access isn't interleaved with no-frags case (checked on GCC 12.2.1-4).
> The generated asm code is more compact towards the no-frags case.
>
> The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it
> should also take effect for that.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for fixing this!

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Alexei Starovoitov June 5, 2023, 8:41 p.m. UTC | #3
On Thu, Jun 1, 2023 at 1:34 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> > Currently we observed a significant performance degradation in
> > samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> > added in commit 772251742262 ("samples/bpf: fixup some tools to be able
> > to support xdp multibuffer").
> >
> > This patch reduce the overhead by avoiding to read/load shared_info
> > (sinfo) memory area, when XDP packet don't have any frags. This improves
> > performance because sinfo is located in another cacheline.
> >
> > Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes()
> > and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can
> > potentially access sinfo, but it uses xdp_buff_has_frags() flags bit check
> > to avoid accessing sinfo in no-frags case.
> >
> > The likely/unlikely instrumentation lays out asm code such that sinfo
> > access isn't interleaved with no-frags case (checked on GCC 12.2.1-4).
> > The generated asm code is more compact towards the no-frags case.
> >
> > The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it
> > should also take effect for that.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  net/core/filter.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 968139f4a1ac..961db5bd2f94 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3948,20 +3948,21 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
> >
> >  void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
> >  {
> > -     struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> >       u32 size = xdp->data_end - xdp->data;
> > +     struct skb_shared_info *sinfo;
> >       void *addr = xdp->data;
> >       int i;
> >
> >       if (unlikely(offset > 0xffff || len > 0xffff))
> >               return ERR_PTR(-EFAULT);
> >
> > -     if (offset + len > xdp_get_buff_len(xdp))
> > +     if (unlikely(offset + len > xdp_get_buff_len(xdp)))
> >               return ERR_PTR(-EINVAL);
> >
> > -     if (offset < size) /* linear area */
> > +     if (likely((offset < size))) /* linear area */
>
> nit: you can drop a round bracket here. Other than that:

Fixed while applying. Thanks everyone.

> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
>
> >               goto out;
> >
> > +     sinfo = xdp_get_shared_info_from_buff(xdp);
> >       offset -= size;
> >       for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
> >               u32 frag_size = skb_frag_size(&sinfo->frags[i]);
> >
> >
patchwork-bot+netdevbpf@kernel.org June 5, 2023, 8:50 p.m. UTC | #4
Hello:

This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 01 Jun 2023 18:21:54 +0200 you wrote:
> Currently we observed a significant performance degradation in
> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
> added in commit 772251742262 ("samples/bpf: fixup some tools to be able
> to support xdp multibuffer").
> 
> This patch reduce the overhead by avoiding to read/load shared_info
> (sinfo) memory area, when XDP packet don't have any frags. This improves
> performance because sinfo is located in another cacheline.
> 
> [...]

Here is the summary with links:
  - [bpf-next,V2] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo
    https://git.kernel.org/bpf/bpf-next/c/411486626e57

You are awesome, thank you!

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 968139f4a1ac..961db5bd2f94 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3948,20 +3948,21 @@  void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
 
 void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	u32 size = xdp->data_end - xdp->data;
+	struct skb_shared_info *sinfo;
 	void *addr = xdp->data;
 	int i;
 
 	if (unlikely(offset > 0xffff || len > 0xffff))
 		return ERR_PTR(-EFAULT);
 
-	if (offset + len > xdp_get_buff_len(xdp))
+	if (unlikely(offset + len > xdp_get_buff_len(xdp)))
 		return ERR_PTR(-EINVAL);
 
-	if (offset < size) /* linear area */
+	if (likely((offset < size))) /* linear area */
 		goto out;
 
+	sinfo = xdp_get_shared_info_from_buff(xdp);
 	offset -= size;
 	for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
 		u32 frag_size = skb_frag_size(&sinfo->frags[i]);