diff mbox series

[bpf,1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags

Message ID 20250403140846.1268564-2-willemdebruijn.kernel@gmail.com (mailing list archive)
State New
Delegated to: BPF
Headers show
Series support SKF_NET_OFF and SKF_LL_OFF on skb frags | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 13 this patch: 13
netdev/build_tools success Errors and warnings before: 26 (+2) this patch: 26 (+2)
netdev/cc_maintainers warning 13 maintainers not CCed: yonghong.song@linux.dev andrii@kernel.org edumazet@google.com kuba@kernel.org song@kernel.org eddyz87@gmail.com sdf@fomichev.me martin.lau@linux.dev haoluo@google.com pabeni@redhat.com jolsa@kernel.org horms@kernel.org kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1374 this patch: 1374
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 141 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 6 this patch: 6
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / GCC BPF
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-11 success Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / GCC BPF
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-VM_Test-18 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-VM_Test-21 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / GCC BPF / GCC BPF
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-gcc / veristat-kernel / x86_64-gcc veristat_kernel
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-44 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-46 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / GCC BPF / GCC BPF
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-45 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-47 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / GCC BPF / GCC BPF
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-48 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-43 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-gcc / veristat-meta / x86_64-gcc veristat_meta
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-49 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-50 success Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-VM_Test-51 success Logs for x86_64-llvm-18 / veristat-meta

Commit Message

Willem de Bruijn April 3, 2025, 2:07 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
read when these offsets extend into frags.

This has been observed with iwlwifi and reproduced with tun with
IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
applied to a RAW socket, will silently miss matching packets.

    const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
    const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
    struct sock_filter filter_code[] = {
            BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
            BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
            BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),

This is unexpected behavior. Socket filter programs should be
consistent regardless of environment. Silent misses are
particularly concerning as hard to detect.

Use skb_copy_bits for offsets outside linear, same as done for
non-SKF_(LL|NET) offsets.

Offset is always positive after subtracting the reference threshold
SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
the two is an offset against skb->data, and may be negative, but it
cannot point before skb->head, as skb_(mac|network)_offset would too.

This appears to go back to when frag support was introduced to
sk_run_filter in linux-2.4.4, before the introduction of git.

The amount of code change and 8/16/32 bit duplication are unfortunate.
But any attempt I made to be smarter saved very few LoC while
complicating the code.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
Reported-by: Matt Moeller <moeller.matt@gmail.com>
Co-developed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/filter.h |  3 --
 kernel/bpf/core.c      | 21 ------------
 net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
 3 files changed, 42 insertions(+), 57 deletions(-)

Comments

Willem de Bruijn April 3, 2025, 2:18 p.m. UTC | #1
Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> read when these offsets extend into frags.
> 
> This has been observed with iwlwifi and reproduced with tun with
> IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> applied to a RAW socket, will silently miss matching packets.
> 
>     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
>     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
>     struct sock_filter filter_code[] = {
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
>             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> 
> This is unexpected behavior. Socket filter programs should be
> consistent regardless of environment. Silent misses are
> particularly concerning as hard to detect.
> 
> Use skb_copy_bits for offsets outside linear, same as done for
> non-SKF_(LL|NET) offsets.
> 
> Offset is always positive after subtracting the reference threshold
> SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> the two is an offset against skb->data, and may be negative, but it
> cannot point before skb->head, as skb_(mac|network)_offset would too.
> 
> This appears to go back to when frag support was introduced to
> sk_run_filter in linux-2.4.4, before the introduction of git.
> 
> The amount of code change and 8/16/32 bit duplication are unfortunate.
> But any attempt I made to be smarter saved very few LoC while
> complicating the code.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/

Let me respond here to the earlier comments in that thread, rather than reopen that.
https://lore.kernel.org/netdev/4a6be957-f932-426a-99bf-7209620f8fa9@iogearbox.net/

As Daniel suggests, a developer can work around the issue in a variety
of ways. And that is what this customer does in the short term. Avoid
SKF_NET_OFF.

But I think this does need to be addressed. SKF_NET_OFF doing what you
expect in many cases, but silently missing matches in some, is just
dangerous. It's not hard to fix: just use skb_copy_bits, like we also
do for regular positive offsets.

The specific instance is due to attaching a filter in the L3 layer,
while inspecting a transport header that will only get pulled by the
L4 layer. Pulling these headers in device drivers is a non-starter.
Stanislav Fomichev April 3, 2025, 5:21 p.m. UTC | #2
On 04/03, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> read when these offsets extend into frags.
> 
> This has been observed with iwlwifi and reproduced with tun with
> IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> applied to a RAW socket, will silently miss matching packets.
> 
>     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
>     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
>     struct sock_filter filter_code[] = {
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
>             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
>             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
>             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> 
> This is unexpected behavior. Socket filter programs should be
> consistent regardless of environment. Silent misses are
> particularly concerning as hard to detect.
> 
> Use skb_copy_bits for offsets outside linear, same as done for
> non-SKF_(LL|NET) offsets.
> 
> Offset is always positive after subtracting the reference threshold
> SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> the two is an offset against skb->data, and may be negative, but it
> cannot point before skb->head, as skb_(mac|network)_offset would too.
> 
> This appears to go back to when frag support was introduced to
> sk_run_filter in linux-2.4.4, before the introduction of git.
> 
> The amount of code change and 8/16/32 bit duplication are unfortunate.
> But any attempt I made to be smarter saved very few LoC while
> complicating the code.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> Reported-by: Matt Moeller <moeller.matt@gmail.com>
> Co-developed-by: Maciej Żenczykowski <maze@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  include/linux/filter.h |  3 --
>  kernel/bpf/core.c      | 21 ------------
>  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
>  3 files changed, 42 insertions(+), 57 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index f5cf4d35d83e..708ac7e0cd36 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
>  	}
>  }
>  
> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> -					   int k, unsigned int size);
> -
>  static inline int bpf_tell_extensions(void)
>  {
>  	return SKF_AD_MAX;
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index ba6b6118cf50..0e836b5ac9a0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -68,27 +68,6 @@
>  struct bpf_mem_alloc bpf_global_ma;
>  bool bpf_global_ma_set;
>  
> -/* No hurry in this branch
> - *
> - * Exported for the bpf jit load helper.
> - */
> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> -{
> -	u8 *ptr = NULL;
> -
> -	if (k >= SKF_NET_OFF) {
> -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> -	} else if (k >= SKF_LL_OFF) {
> -		if (unlikely(!skb_mac_header_was_set(skb)))
> -			return NULL;
> -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> -	}
> -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> -		return ptr;
> -
> -	return NULL;
> -}
> -
>  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
>  enum page_size_enum {
>  	__PAGE_SIZE = PAGE_SIZE
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bc6828761a47..b232b70dd10d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
>  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
>  	   data, int, headlen, int, offset)
>  {
> -	u8 tmp, *ptr;
> +	u8 tmp;
>  	const int len = sizeof(tmp);
>  
> -	if (offset >= 0) {
> -		if (headlen - offset >= len)
> -			return *(u8 *)(data + offset);
> -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> -			return tmp;
> -	} else {
> -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> -		if (likely(ptr))
> -			return *(u8 *)ptr;

[..]

> +	if (offset < 0) {
> +		if (offset >= SKF_NET_OFF)
> +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> +		else
> +			return -EFAULT;
>  	}

nit: we now repeat the same logic three times, maybe still worth it to put it
into a helper? bpf_resolve_classic_offset or something.
Willem de Bruijn April 3, 2025, 5:27 p.m. UTC | #3
Stanislav Fomichev wrote:
> On 04/03, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > read when these offsets extend into frags.
> > 
> > This has been observed with iwlwifi and reproduced with tun with
> > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > applied to a RAW socket, will silently miss matching packets.
> > 
> >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> >     struct sock_filter filter_code[] = {
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > 
> > This is unexpected behavior. Socket filter programs should be
> > consistent regardless of environment. Silent misses are
> > particularly concerning as hard to detect.
> > 
> > Use skb_copy_bits for offsets outside linear, same as done for
> > non-SKF_(LL|NET) offsets.
> > 
> > Offset is always positive after subtracting the reference threshold
> > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > the two is an offset against skb->data, and may be negative, but it
> > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > 
> > This appears to go back to when frag support was introduced to
> > sk_run_filter in linux-2.4.4, before the introduction of git.
> > 
> > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > But any attempt I made to be smarter saved very few LoC while
> > complicating the code.
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  include/linux/filter.h |  3 --
> >  kernel/bpf/core.c      | 21 ------------
> >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> >  3 files changed, 42 insertions(+), 57 deletions(-)
> > 
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f5cf4d35d83e..708ac7e0cd36 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> >  	}
> >  }
> >  
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > -					   int k, unsigned int size);
> > -
> >  static inline int bpf_tell_extensions(void)
> >  {
> >  	return SKF_AD_MAX;
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index ba6b6118cf50..0e836b5ac9a0 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -68,27 +68,6 @@
> >  struct bpf_mem_alloc bpf_global_ma;
> >  bool bpf_global_ma_set;
> >  
> > -/* No hurry in this branch
> > - *
> > - * Exported for the bpf jit load helper.
> > - */
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > -{
> > -	u8 *ptr = NULL;
> > -
> > -	if (k >= SKF_NET_OFF) {
> > -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > -	} else if (k >= SKF_LL_OFF) {
> > -		if (unlikely(!skb_mac_header_was_set(skb)))
> > -			return NULL;
> > -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > -	}
> > -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > -		return ptr;
> > -
> > -	return NULL;
> > -}
> > -
> >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> >  enum page_size_enum {
> >  	__PAGE_SIZE = PAGE_SIZE
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index bc6828761a47..b232b70dd10d 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> >  	   data, int, headlen, int, offset)
> >  {
> > -	u8 tmp, *ptr;
> > +	u8 tmp;
> >  	const int len = sizeof(tmp);
> >  
> > -	if (offset >= 0) {
> > -		if (headlen - offset >= len)
> > -			return *(u8 *)(data + offset);
> > -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > -			return tmp;
> > -	} else {
> > -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > -		if (likely(ptr))
> > -			return *(u8 *)ptr;
> 
> [..]
> 
> > +	if (offset < 0) {
> > +		if (offset >= SKF_NET_OFF)
> > +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> > +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > +		else
> > +			return -EFAULT;
> >  	}
> 
> nit: we now repeat the same logic three times, maybe still worth it to put it
> into a helper? bpf_resolve_classic_offset or something. 

I definitely tried this in various ways. But since the core logic is
only four lines and there is an early return on error, no helper
really simplifies anything. It just adds a layer of indirection and
more code in the end.
Stanislav Fomichev April 3, 2025, 5:35 p.m. UTC | #4
On 04/03, Willem de Bruijn wrote:
> Stanislav Fomichev wrote:
> > On 04/03, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > > 
> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > read when these offsets extend into frags.
> > > 
> > > This has been observed with iwlwifi and reproduced with tun with
> > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > applied to a RAW socket, will silently miss matching packets.
> > > 
> > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > >     struct sock_filter filter_code[] = {
> > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > 
> > > This is unexpected behavior. Socket filter programs should be
> > > consistent regardless of environment. Silent misses are
> > > particularly concerning as hard to detect.
> > > 
> > > Use skb_copy_bits for offsets outside linear, same as done for
> > > non-SKF_(LL|NET) offsets.
> > > 
> > > Offset is always positive after subtracting the reference threshold
> > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > the two is an offset against skb->data, and may be negative, but it
> > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > 
> > > This appears to go back to when frag support was introduced to
> > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > 
> > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > But any attempt I made to be smarter saved very few LoC while
> > > complicating the code.
> > > 
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > >  include/linux/filter.h |  3 --
> > >  kernel/bpf/core.c      | 21 ------------
> > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > --- a/include/linux/filter.h
> > > +++ b/include/linux/filter.h
> > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > >  	}
> > >  }
> > >  
> > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > -					   int k, unsigned int size);
> > > -
> > >  static inline int bpf_tell_extensions(void)
> > >  {
> > >  	return SKF_AD_MAX;
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -68,27 +68,6 @@
> > >  struct bpf_mem_alloc bpf_global_ma;
> > >  bool bpf_global_ma_set;
> > >  
> > > -/* No hurry in this branch
> > > - *
> > > - * Exported for the bpf jit load helper.
> > > - */
> > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > -{
> > > -	u8 *ptr = NULL;
> > > -
> > > -	if (k >= SKF_NET_OFF) {
> > > -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > -	} else if (k >= SKF_LL_OFF) {
> > > -		if (unlikely(!skb_mac_header_was_set(skb)))
> > > -			return NULL;
> > > -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > -	}
> > > -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > -		return ptr;
> > > -
> > > -	return NULL;
> > > -}
> > > -
> > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > >  enum page_size_enum {
> > >  	__PAGE_SIZE = PAGE_SIZE
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index bc6828761a47..b232b70dd10d 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > >  	   data, int, headlen, int, offset)
> > >  {
> > > -	u8 tmp, *ptr;
> > > +	u8 tmp;
> > >  	const int len = sizeof(tmp);
> > >  
> > > -	if (offset >= 0) {
> > > -		if (headlen - offset >= len)
> > > -			return *(u8 *)(data + offset);
> > > -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > -			return tmp;
> > > -	} else {
> > > -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > -		if (likely(ptr))
> > > -			return *(u8 *)ptr;
> > 
> > [..]
> > 
> > > +	if (offset < 0) {
> > > +		if (offset >= SKF_NET_OFF)
> > > +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > +		else
> > > +			return -EFAULT;
> > >  	}
> > 
> > nit: we now repeat the same logic three times, maybe still worth it to put it
> > into a helper? bpf_resolve_classic_offset or something. 
> 
> I definitely tried this in various ways. But since the core logic is
> only four lines and there is an early return on error, no helper
> really simplifies anything. It just adds a layer of indirection and
> more code in the end.

More code, but at least it de-duplicates the logic of translating
SKF_XXX_OFF? Something like the following below, but yeah, a matter
of preference, up to you.

static int bpf_skb_resolve_offset(skb, offset) {
	if (offset >= 0)
		return offset;

	if (offset >= SKF_NET_OFF)
		offset += skb_network_offset(skb) - SKF_NET_OFF;
	else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
		offset += skb_mac_offset(skb) - SKF_LL_OFF;

	return -1;
}

BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
           data, int, headlen, int, offset)
{
	offset = bpf_skb_resolve_offset(skb, offset);
	if (offset < 0)
		return -EFAULT;

	...
}
Maciej Żenczykowski April 3, 2025, 6:30 p.m. UTC | #5
On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 04/03, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On 04/03, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > > read when these offsets extend into frags.
> > > >
> > > > This has been observed with iwlwifi and reproduced with tun with
> > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > > applied to a RAW socket, will silently miss matching packets.
> > > >
> > > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > > >     struct sock_filter filter_code[] = {
> > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > >
> > > > This is unexpected behavior. Socket filter programs should be
> > > > consistent regardless of environment. Silent misses are
> > > > particularly concerning as hard to detect.
> > > >
> > > > Use skb_copy_bits for offsets outside linear, same as done for
> > > > non-SKF_(LL|NET) offsets.
> > > >
> > > > Offset is always positive after subtracting the reference threshold
> > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > > the two is an offset against skb->data, and may be negative, but it
> > > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > >
> > > > This appears to go back to when frag support was introduced to
> > > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > >
> > > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > > But any attempt I made to be smarter saved very few LoC while
> > > > complicating the code.
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > >  include/linux/filter.h |  3 --
> > > >  kernel/bpf/core.c      | 21 ------------
> > > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > >
> > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > > --- a/include/linux/filter.h
> > > > +++ b/include/linux/filter.h
> > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > > >   }
> > > >  }
> > > >
> > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > > -                                    int k, unsigned int size);
> > > > -
> > > >  static inline int bpf_tell_extensions(void)
> > > >  {
> > > >   return SKF_AD_MAX;
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -68,27 +68,6 @@
> > > >  struct bpf_mem_alloc bpf_global_ma;
> > > >  bool bpf_global_ma_set;
> > > >
> > > > -/* No hurry in this branch
> > > > - *
> > > > - * Exported for the bpf jit load helper.
> > > > - */
> > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > > -{
> > > > - u8 *ptr = NULL;
> > > > -
> > > > - if (k >= SKF_NET_OFF) {
> > > > -         ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > > - } else if (k >= SKF_LL_OFF) {
> > > > -         if (unlikely(!skb_mac_header_was_set(skb)))
> > > > -                 return NULL;
> > > > -         ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > > - }
> > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > > -         return ptr;
> > > > -
> > > > - return NULL;
> > > > -}
> > > > -
> > > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > > >  enum page_size_enum {
> > > >   __PAGE_SIZE = PAGE_SIZE
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index bc6828761a47..b232b70dd10d 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > > >      data, int, headlen, int, offset)
> > > >  {
> > > > - u8 tmp, *ptr;
> > > > + u8 tmp;
> > > >   const int len = sizeof(tmp);
> > > >
> > > > - if (offset >= 0) {
> > > > -         if (headlen - offset >= len)
> > > > -                 return *(u8 *)(data + offset);
> > > > -         if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > -                 return tmp;
> > > > - } else {
> > > > -         ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > -         if (likely(ptr))
> > > > -                 return *(u8 *)ptr;
> > >
> > > [..]
> > >
> > > > + if (offset < 0) {
> > > > +         if (offset >= SKF_NET_OFF)
> > > > +                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > > +         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > > +                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > > +         else
> > > > +                 return -EFAULT;
> > > >   }
> > >
> > > nit: we now repeat the same logic three times, maybe still worth it to put it
> > > into a helper? bpf_resolve_classic_offset or something.
> >
> > I definitely tried this in various ways. But since the core logic is
> > only four lines and there is an early return on error, no helper
> > really simplifies anything. It just adds a layer of indirection and
> > more code in the end.
>
> More code, but at least it de-duplicates the logic of translating
> SKF_XXX_OFF? Something like the following below, but yeah, a matter
> of preference, up to you.
>
> static int bpf_skb_resolve_offset(skb, offset) {
>         if (offset >= 0)
>                 return offset;
>
>         if (offset >= SKF_NET_OFF)
>                 offset += skb_network_offset(skb) - SKF_NET_OFF;
>         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
>                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
>
>         return -1;
> }
>
> BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
>            data, int, headlen, int, offset)
> {
>         offset = bpf_skb_resolve_offset(skb, offset);
>         if (offset < 0)
>                 return -EFAULT;

this is incorrect, as offset can be legally negative here.

>
>         ...
> }

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Willem de Bruijn April 3, 2025, 6:54 p.m. UTC | #6
Maciej Żenczykowski wrote:
> On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 04/03, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On 04/03, Willem de Bruijn wrote:
> > > > > From: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > > > > read when these offsets extend into frags.
> > > > >
> > > > > This has been observed with iwlwifi and reproduced with tun with
> > > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > > > > applied to a RAW socket, will silently miss matching packets.
> > > > >
> > > > >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> > > > >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> > > > >     struct sock_filter filter_code[] = {
> > > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> > > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> > > > >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> > > > >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> > > > >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > > > >
> > > > > This is unexpected behavior. Socket filter programs should be
> > > > > consistent regardless of environment. Silent misses are
> > > > > particularly concerning as hard to detect.
> > > > >
> > > > > Use skb_copy_bits for offsets outside linear, same as done for
> > > > > non-SKF_(LL|NET) offsets.
> > > > >
> > > > > Offset is always positive after subtracting the reference threshold
> > > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > > > > the two is an offset against skb->data, and may be negative, but it
> > > > > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > > > >
> > > > > This appears to go back to when frag support was introduced to
> > > > > sk_run_filter in linux-2.4.4, before the introduction of git.
> > > > >
> > > > > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > > > > But any attempt I made to be smarter saved very few LoC while
> > > > > complicating the code.
> > > > >
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > > > > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > > > > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > > > > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > ---
> > > > >  include/linux/filter.h |  3 --
> > > > >  kernel/bpf/core.c      | 21 ------------
> > > > >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> > > > >  3 files changed, 42 insertions(+), 57 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > > index f5cf4d35d83e..708ac7e0cd36 100644
> > > > > --- a/include/linux/filter.h
> > > > > +++ b/include/linux/filter.h
> > > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> > > > >   }
> > > > >  }
> > > > >
> > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > > > > -                                    int k, unsigned int size);
> > > > > -
> > > > >  static inline int bpf_tell_extensions(void)
> > > > >  {
> > > > >   return SKF_AD_MAX;
> > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > > index ba6b6118cf50..0e836b5ac9a0 100644
> > > > > --- a/kernel/bpf/core.c
> > > > > +++ b/kernel/bpf/core.c
> > > > > @@ -68,27 +68,6 @@
> > > > >  struct bpf_mem_alloc bpf_global_ma;
> > > > >  bool bpf_global_ma_set;
> > > > >
> > > > > -/* No hurry in this branch
> > > > > - *
> > > > > - * Exported for the bpf jit load helper.
> > > > > - */
> > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > > > > -{
> > > > > - u8 *ptr = NULL;
> > > > > -
> > > > > - if (k >= SKF_NET_OFF) {
> > > > > -         ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > > > > - } else if (k >= SKF_LL_OFF) {
> > > > > -         if (unlikely(!skb_mac_header_was_set(skb)))
> > > > > -                 return NULL;
> > > > > -         ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > > > > - }
> > > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > > > > -         return ptr;
> > > > > -
> > > > > - return NULL;
> > > > > -}
> > > > > -
> > > > >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> > > > >  enum page_size_enum {
> > > > >   __PAGE_SIZE = PAGE_SIZE
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index bc6828761a47..b232b70dd10d 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> > > > >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> > > > >      data, int, headlen, int, offset)
> > > > >  {
> > > > > - u8 tmp, *ptr;
> > > > > + u8 tmp;
> > > > >   const int len = sizeof(tmp);
> > > > >
> > > > > - if (offset >= 0) {
> > > > > -         if (headlen - offset >= len)
> > > > > -                 return *(u8 *)(data + offset);
> > > > > -         if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > > > > -                 return tmp;
> > > > > - } else {
> > > > > -         ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > > > > -         if (likely(ptr))
> > > > > -                 return *(u8 *)ptr;
> > > >
> > > > [..]
> > > >
> > > > > + if (offset < 0) {
> > > > > +         if (offset >= SKF_NET_OFF)
> > > > > +                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> > > > > +         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > > > > +                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > > > > +         else
> > > > > +                 return -EFAULT;
> > > > >   }
> > > >
> > > > nit: we now repeat the same logic three times, maybe still worth it to put it
> > > > into a helper? bpf_resolve_classic_offset or something.
> > >
> > > I definitely tried this in various ways. But since the core logic is
> > > only four lines and there is an early return on error, no helper
> > > really simplifies anything. It just adds a layer of indirection and
> > > more code in the end.
> >
> > More code, but at least it de-duplicates the logic of translating
> > SKF_XXX_OFF? Something like the following below, but yeah, a matter
> > of preference, up to you.

I see your point. No strong opinion from me. Will revise,
assuming you don't mind the workaround below:

> > static int bpf_skb_resolve_offset(skb, offset) {
> >         if (offset >= 0)
> >                 return offset;
> >
> >         if (offset >= SKF_NET_OFF)
> >                 offset += skb_network_offset(skb) - SKF_NET_OFF;
> >         else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> >                 offset += skb_mac_offset(skb) - SKF_LL_OFF;
> >
> >         return -1;
> > }
> >
> > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> >            data, int, headlen, int, offset)
> > {
> >         offset = bpf_skb_resolve_offset(skb, offset);
> >         if (offset < 0)
> >                 return -EFAULT;
> 
> this is incorrect, as offset can be legally negative here.

Yeah, this needs a special case like INT_MIN to communicate error,
or pass-by-value. Exactly the kind of workarounds that gave me pause.
diff mbox series

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f5cf4d35d83e..708ac7e0cd36 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1496,9 +1496,6 @@  static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
 	}
 }
 
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
-					   int k, unsigned int size);
-
 static inline int bpf_tell_extensions(void)
 {
 	return SKF_AD_MAX;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ba6b6118cf50..0e836b5ac9a0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -68,27 +68,6 @@ 
 struct bpf_mem_alloc bpf_global_ma;
 bool bpf_global_ma_set;
 
-/* No hurry in this branch
- *
- * Exported for the bpf jit load helper.
- */
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
-{
-	u8 *ptr = NULL;
-
-	if (k >= SKF_NET_OFF) {
-		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
-	} else if (k >= SKF_LL_OFF) {
-		if (unlikely(!skb_mac_header_was_set(skb)))
-			return NULL;
-		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
-	}
-	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
-		return ptr;
-
-	return NULL;
-}
-
 /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
 enum page_size_enum {
 	__PAGE_SIZE = PAGE_SIZE
diff --git a/net/core/filter.c b/net/core/filter.c
index bc6828761a47..b232b70dd10d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -221,21 +221,24 @@  BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
 BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	u8 tmp, *ptr;
+	u8 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return *(u8 *)(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return tmp;
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return *(u8 *)ptr;
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return *(u8 *)(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return tmp;
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
@@ -248,21 +251,24 @@  BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be16 tmp, *ptr;
+	__be16 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return get_unaligned_be16(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be16_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be16(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be16(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be16_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
@@ -275,21 +281,24 @@  BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be32 tmp, *ptr;
+	__be32 tmp;
 	const int len = sizeof(tmp);
 
-	if (likely(offset >= 0)) {
-		if (headlen - offset >= len)
-			return get_unaligned_be32(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be32_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be32(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be32(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be32_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,