Message ID | 20250403140846.1268564-2-willemdebruijn.kernel@gmail.com (mailing list archive) |
---|---|
State | New |
Delegated to: | BPF |
Headers | show |
Series | support SKF_NET_OFF and SKF_LL_OFF on skb frags | expand |
Willem de Bruijn wrote: > From: Willem de Bruijn <willemb@google.com> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > read when these offsets extend into frags. > > This has been observed with iwlwifi and reproduced with tun with > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > applied to a RAW socket, will silently miss matching packets. > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > struct sock_filter filter_code[] = { > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > This is unexpected behavior. Socket filter programs should be > consistent regardless of environment. Silent misses are > particularly concerning as hard to detect. > > Use skb_copy_bits for offsets outside linear, same as done for > non-SKF_(LL|NET) offsets. > > Offset is always positive after subtracting the reference threshold > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > the two is an offset against skb->data, and may be negative, but it > cannot point before skb->head, as skb_(mac|network)_offset would too. > > This appears to go back to when frag support was introduced to > sk_run_filter in linux-2.4.4, before the introduction of git. > > The amount of code change and 8/16/32 bit duplication are unfortunate. > But any attempt I made to be smarter saved very few LoC while > complicating the code. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ Let me respond here to the earlier comments in that thread, rather than reopen that. https://lore.kernel.org/netdev/4a6be957-f932-426a-99bf-7209620f8fa9@iogearbox.net/ As Daniel suggests, a developer can work around the issue in a variety of ways. And that is what this customer does in the short term. Avoid SKF_NET_OFF. But I think this does need to be addressed. SKF_NET_OFF doing what you expect in many cases, but silently missing matches in some, is just dangerous. It's not hard to fix: just use skb_copy_bits, like we also do for regular positive offsets. The specific instance is due to attaching a filter in the L3 layer, while inspecting a transport header that will only get pulled by the L4 layer. Pulling these headers in device drivers is a non-starter.
On 04/03, Willem de Bruijn wrote: > From: Willem de Bruijn <willemb@google.com> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > read when these offsets extend into frags. > > This has been observed with iwlwifi and reproduced with tun with > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > applied to a RAW socket, will silently miss matching packets. > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > struct sock_filter filter_code[] = { > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > This is unexpected behavior. Socket filter programs should be > consistent regardless of environment. Silent misses are > particularly concerning as hard to detect. > > Use skb_copy_bits for offsets outside linear, same as done for > non-SKF_(LL|NET) offsets. > > Offset is always positive after subtracting the reference threshold > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > the two is an offset against skb->data, and may be negative, but it > cannot point before skb->head, as skb_(mac|network)_offset would too. > > This appears to go back to when frag support was introduced to > sk_run_filter in linux-2.4.4, before the introduction of git. > > The amount of code change and 8/16/32 bit duplication are unfortunate. > But any attempt I made to be smarter saved very few LoC while > complicating the code. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > Reported-by: Matt Moeller <moeller.matt@gmail.com> > Co-developed-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> > --- > include/linux/filter.h | 3 -- > kernel/bpf/core.c | 21 ------------ > net/core/filter.c | 75 +++++++++++++++++++++++------------------- > 3 files changed, 42 insertions(+), 57 deletions(-) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index f5cf4d35d83e..708ac7e0cd36 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > } > } > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > - int k, unsigned int size); > - > static inline int bpf_tell_extensions(void) > { > return SKF_AD_MAX; > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index ba6b6118cf50..0e836b5ac9a0 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -68,27 +68,6 @@ > struct bpf_mem_alloc bpf_global_ma; > bool bpf_global_ma_set; > > -/* No hurry in this branch > - * > - * Exported for the bpf jit load helper. > - */ > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > -{ > - u8 *ptr = NULL; > - > - if (k >= SKF_NET_OFF) { > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > - } else if (k >= SKF_LL_OFF) { > - if (unlikely(!skb_mac_header_was_set(skb))) > - return NULL; > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > - } > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > - return ptr; > - > - return NULL; > -} > - > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ > enum page_size_enum { > __PAGE_SIZE = PAGE_SIZE > diff --git a/net/core/filter.c b/net/core/filter.c > index bc6828761a47..b232b70dd10d 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > data, int, headlen, int, offset) > { > - u8 tmp, *ptr; > + u8 tmp; > const int len = sizeof(tmp); > > - if (offset >= 0) { > - if (headlen - offset >= len) > - return *(u8 *)(data + offset); > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) > - return tmp; > - } else { > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); > - if (likely(ptr)) > - return *(u8 *)ptr; [..] > + if (offset < 0) { > + if (offset >= SKF_NET_OFF) > + offset += skb_network_offset(skb) - SKF_NET_OFF; > + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > + offset += skb_mac_offset(skb) - SKF_LL_OFF; > + else > + return -EFAULT; > } nit: we now repeat the same logic three times, maybe still worth it to put it into a helper? bpf_resolve_classic_offset or something.
Stanislav Fomichev wrote: > On 04/03, Willem de Bruijn wrote: > > From: Willem de Bruijn <willemb@google.com> > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > > read when these offsets extend into frags. > > > > This has been observed with iwlwifi and reproduced with tun with > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > > applied to a RAW socket, will silently miss matching packets. > > > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > > struct sock_filter filter_code[] = { > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > > > This is unexpected behavior. Socket filter programs should be > > consistent regardless of environment. Silent misses are > > particularly concerning as hard to detect. > > > > Use skb_copy_bits for offsets outside linear, same as done for > > non-SKF_(LL|NET) offsets. > > > > Offset is always positive after subtracting the reference threshold > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > > the two is an offset against skb->data, and may be negative, but it > > cannot point before skb->head, as skb_(mac|network)_offset would too. > > > > This appears to go back to when frag support was introduced to > > sk_run_filter in linux-2.4.4, before the introduction of git. > > > > The amount of code change and 8/16/32 bit duplication are unfortunate. > > But any attempt I made to be smarter saved very few LoC while > > complicating the code. > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > > Reported-by: Matt Moeller <moeller.matt@gmail.com> > > Co-developed-by: Maciej Żenczykowski <maze@google.com> > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > include/linux/filter.h | 3 -- > > kernel/bpf/core.c | 21 ------------ > > net/core/filter.c | 75 +++++++++++++++++++++++------------------- > > 3 files changed, 42 insertions(+), 57 deletions(-) > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > index f5cf4d35d83e..708ac7e0cd36 100644 > > --- a/include/linux/filter.h > > +++ b/include/linux/filter.h > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > } > > } > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > - int k, unsigned int size); > > - > > static inline int bpf_tell_extensions(void) > > { > > return SKF_AD_MAX; > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > index ba6b6118cf50..0e836b5ac9a0 100644 > > --- a/kernel/bpf/core.c > > +++ b/kernel/bpf/core.c > > @@ -68,27 +68,6 @@ > > struct bpf_mem_alloc bpf_global_ma; > > bool bpf_global_ma_set; > > > > -/* No hurry in this branch > > - * > > - * Exported for the bpf jit load helper. > > - */ > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > -{ > > - u8 *ptr = NULL; > > - > > - if (k >= SKF_NET_OFF) { > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > - } else if (k >= SKF_LL_OFF) { > > - if (unlikely(!skb_mac_header_was_set(skb))) > > - return NULL; > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > - } > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > - return ptr; > > - > > - return NULL; > > -} > > - > > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ > > enum page_size_enum { > > __PAGE_SIZE = PAGE_SIZE > > diff --git a/net/core/filter.c b/net/core/filter.c > > index bc6828761a47..b232b70dd10d 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > > data, int, headlen, int, offset) > > { > > - u8 tmp, *ptr; > > + u8 tmp; > > const int len = sizeof(tmp); > > > > - if (offset >= 0) { > > - if (headlen - offset >= len) > > - return *(u8 *)(data + offset); > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) > > - return tmp; > > - } else { > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); > > - if (likely(ptr)) > > - return *(u8 *)ptr; > > [..] > > > + if (offset < 0) { > > + if (offset >= SKF_NET_OFF) > > + offset += skb_network_offset(skb) - SKF_NET_OFF; > > + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > > + offset += skb_mac_offset(skb) - SKF_LL_OFF; > > + else > > + return -EFAULT; > > } > > nit: we now repeat the same logic three times, maybe still worth it to put it > into a helper? bpf_resolve_classic_offset or something. I definitely tried this in various ways. But since the core logic is only four lines and there is an early return on error, no helper really simplifies anything. It just adds a layer of indirection and more code in the end.
On 04/03, Willem de Bruijn wrote: > Stanislav Fomichev wrote: > > On 04/03, Willem de Bruijn wrote: > > > From: Willem de Bruijn <willemb@google.com> > > > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > > > read when these offsets extend into frags. > > > > > > This has been observed with iwlwifi and reproduced with tun with > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > > > applied to a RAW socket, will silently miss matching packets. > > > > > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > > > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > > > struct sock_filter filter_code[] = { > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > > > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > > > > > This is unexpected behavior. Socket filter programs should be > > > consistent regardless of environment. Silent misses are > > > particularly concerning as hard to detect. > > > > > > Use skb_copy_bits for offsets outside linear, same as done for > > > non-SKF_(LL|NET) offsets. > > > > > > Offset is always positive after subtracting the reference threshold > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > > > the two is an offset against skb->data, and may be negative, but it > > > cannot point before skb->head, as skb_(mac|network)_offset would too. > > > > > > This appears to go back to when frag support was introduced to > > > sk_run_filter in linux-2.4.4, before the introduction of git. > > > > > > The amount of code change and 8/16/32 bit duplication are unfortunate. > > > But any attempt I made to be smarter saved very few LoC while > > > complicating the code. > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > > > Reported-by: Matt Moeller <moeller.matt@gmail.com> > > > Co-developed-by: Maciej Żenczykowski <maze@google.com> > > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > --- > > > include/linux/filter.h | 3 -- > > > kernel/bpf/core.c | 21 ------------ > > > net/core/filter.c | 75 +++++++++++++++++++++++------------------- > > > 3 files changed, 42 insertions(+), 57 deletions(-) > > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > > index f5cf4d35d83e..708ac7e0cd36 100644 > > > --- a/include/linux/filter.h > > > +++ b/include/linux/filter.h > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > > } > > > } > > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > > - int k, unsigned int size); > > > - > > > static inline int bpf_tell_extensions(void) > > > { > > > return SKF_AD_MAX; > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > index ba6b6118cf50..0e836b5ac9a0 100644 > > > --- a/kernel/bpf/core.c > > > +++ b/kernel/bpf/core.c > > > @@ -68,27 +68,6 @@ > > > struct bpf_mem_alloc bpf_global_ma; > > > bool bpf_global_ma_set; > > > > > > -/* No hurry in this branch > > > - * > > > - * Exported for the bpf jit load helper. > > > - */ > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > > -{ > > > - u8 *ptr = NULL; > > > - > > > - if (k >= SKF_NET_OFF) { > > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > > - } else if (k >= SKF_LL_OFF) { > > > - if (unlikely(!skb_mac_header_was_set(skb))) > > > - return NULL; > > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > > - } > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > > - return ptr; > > > - > > > - return NULL; > > > -} > > > - > > > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ > > > enum page_size_enum { > > > __PAGE_SIZE = PAGE_SIZE > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index bc6828761a47..b232b70dd10d 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) > > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > > > data, int, headlen, int, offset) > > > { > > > - u8 tmp, *ptr; > > > + u8 tmp; > > > const int len = sizeof(tmp); > > > > > > - if (offset >= 0) { > > > - if (headlen - offset >= len) > > > - return *(u8 *)(data + offset); > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) > > > - return tmp; > > > - } else { > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); > > > - if (likely(ptr)) > > > - return *(u8 *)ptr; > > > > [..] > > > > > + if (offset < 0) { > > > + if (offset >= SKF_NET_OFF) > > > + offset += skb_network_offset(skb) - SKF_NET_OFF; > > > + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > > > + offset += skb_mac_offset(skb) - SKF_LL_OFF; > > > + else > > > + return -EFAULT; > > > } > > > > nit: we now repeat the same logic three times, maybe still worth it to put it > > into a helper? bpf_resolve_classic_offset or something. > > I definitely tried this in various ways. But since the core logic is > only four lines and there is an early return on error, no helper > really simplifies anything. It just adds a layer of indirection and > more code in the end. More code, but at least it de-duplicates the logic of translating SKF_XXX_OFF? Something like the following below, but yeah, a matter of preference, up to you. static int bpf_skb_resolve_offset(skb, offset) { if (offset >= 0) return offset; if (offset >= SKF_NET_OFF) offset += skb_network_offset(skb) - SKF_NET_OFF; else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) offset += skb_mac_offset(skb) - SKF_LL_OFF; return -1; } BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { offset = bpf_skb_resolve_offset(skb, offset); if (offset < 0) return -EFAULT; ... }
On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote: > > On 04/03, Willem de Bruijn wrote: > > Stanislav Fomichev wrote: > > > On 04/03, Willem de Bruijn wrote: > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > > > > read when these offsets extend into frags. > > > > > > > > This has been observed with iwlwifi and reproduced with tun with > > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > > > > applied to a RAW socket, will silently miss matching packets. > > > > > > > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > > > > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > > > > struct sock_filter filter_code[] = { > > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > > > > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > > > > > > > This is unexpected behavior. Socket filter programs should be > > > > consistent regardless of environment. Silent misses are > > > > particularly concerning as hard to detect. > > > > > > > > Use skb_copy_bits for offsets outside linear, same as done for > > > > non-SKF_(LL|NET) offsets. > > > > > > > > Offset is always positive after subtracting the reference threshold > > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > > > > the two is an offset against skb->data, and may be negative, but it > > > > cannot point before skb->head, as skb_(mac|network)_offset would too. > > > > > > > > This appears to go back to when frag support was introduced to > > > > sk_run_filter in linux-2.4.4, before the introduction of git. > > > > > > > > The amount of code change and 8/16/32 bit duplication are unfortunate. > > > > But any attempt I made to be smarter saved very few LoC while > > > > complicating the code. > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > > > > Reported-by: Matt Moeller <moeller.matt@gmail.com> > > > > Co-developed-by: Maciej Żenczykowski <maze@google.com> > > > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > --- > > > > include/linux/filter.h | 3 -- > > > > kernel/bpf/core.c | 21 ------------ > > > > net/core/filter.c | 75 +++++++++++++++++++++++------------------- > > > > 3 files changed, 42 insertions(+), 57 deletions(-) > > > > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > > > index f5cf4d35d83e..708ac7e0cd36 100644 > > > > --- a/include/linux/filter.h > > > > +++ b/include/linux/filter.h > > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > > > } > > > > } > > > > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > > > - int k, unsigned int size); > > > > - > > > > static inline int bpf_tell_extensions(void) > > > > { > > > > return SKF_AD_MAX; > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > > index ba6b6118cf50..0e836b5ac9a0 100644 > > > > --- a/kernel/bpf/core.c > > > > +++ b/kernel/bpf/core.c > > > > @@ -68,27 +68,6 @@ > > > > struct bpf_mem_alloc bpf_global_ma; > > > > bool bpf_global_ma_set; > > > > > > > > -/* No hurry in this branch > > > > - * > > > > - * Exported for the bpf jit load helper. > > > > - */ > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > > > -{ > > > > - u8 *ptr = NULL; > > > > - > > > > - if (k >= SKF_NET_OFF) { > > > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > > > - } else if (k >= SKF_LL_OFF) { > > > > - if (unlikely(!skb_mac_header_was_set(skb))) > > > > - return NULL; > > > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > > > - } > > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > > > - return ptr; > > > > - > > > > - return NULL; > > > > -} > > > > - > > > > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ > > > > enum page_size_enum { > > > > __PAGE_SIZE = PAGE_SIZE > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > > index bc6828761a47..b232b70dd10d 100644 > > > > --- a/net/core/filter.c > > > > +++ b/net/core/filter.c > > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) > > > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > > > > data, int, headlen, int, offset) > > > > { > > > > - u8 tmp, *ptr; > > > > + u8 tmp; > > > > const int len = sizeof(tmp); > > > > > > > > - if (offset >= 0) { > > > > - if (headlen - offset >= len) > > > > - return *(u8 *)(data + offset); > > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) > > > > - return tmp; > > > > - } else { > > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); > > > > - if (likely(ptr)) > > > > - return *(u8 *)ptr; > > > > > > [..] > > > > > > > + if (offset < 0) { > > > > + if (offset >= SKF_NET_OFF) > > > > + offset += skb_network_offset(skb) - SKF_NET_OFF; > > > > + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > > > > + offset += skb_mac_offset(skb) - SKF_LL_OFF; > > > > + else > > > > + return -EFAULT; > > > > } > > > > > > nit: we now repeat the same logic three times, maybe still worth it to put it > > > into a helper? bpf_resolve_classic_offset or something. > > > > I definitely tried this in various ways. But since the core logic is > > only four lines and there is an early return on error, no helper > > really simplifies anything. It just adds a layer of indirection and > > more code in the end. > > More code, but at least it de-duplicates the logic of translating > SKF_XXX_OFF? Something like the following below, but yeah, a matter > of preference, up to you. > > static int bpf_skb_resolve_offset(skb, offset) { > if (offset >= 0) > return offset; > > if (offset >= SKF_NET_OFF) > offset += skb_network_offset(skb) - SKF_NET_OFF; > else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > offset += skb_mac_offset(skb) - SKF_LL_OFF; > > return -1; > } > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > data, int, headlen, int, offset) > { > offset = bpf_skb_resolve_offset(skb, offset); > if (offset < 0) > return -EFAULT; this is incorrect, as offset can be legally negative here. > > ... > } -- Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski wrote: > On Thu, Apr 3, 2025 at 10:35 AM Stanislav Fomichev <stfomichev@gmail.com> wrote: > > > > On 04/03, Willem de Bruijn wrote: > > > Stanislav Fomichev wrote: > > > > On 04/03, Willem de Bruijn wrote: > > > > > From: Willem de Bruijn <willemb@google.com> > > > > > > > > > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > > > > > read when these offsets extend into frags. > > > > > > > > > > This has been observed with iwlwifi and reproduced with tun with > > > > > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > > > > > applied to a RAW socket, will silently miss matching packets. > > > > > > > > > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > > > > > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > > > > > struct sock_filter filter_code[] = { > > > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > > > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > > > > > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > > > > > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > > > > > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > > > > > > > > > This is unexpected behavior. Socket filter programs should be > > > > > consistent regardless of environment. Silent misses are > > > > > particularly concerning as hard to detect. > > > > > > > > > > Use skb_copy_bits for offsets outside linear, same as done for > > > > > non-SKF_(LL|NET) offsets. > > > > > > > > > > Offset is always positive after subtracting the reference threshold > > > > > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > > > > > the two is an offset against skb->data, and may be negative, but it > > > > > cannot point before skb->head, as skb_(mac|network)_offset would too. > > > > > > > > > > This appears to go back to when frag support was introduced to > > > > > sk_run_filter in linux-2.4.4, before the introduction of git. > > > > > > > > > > The amount of code change and 8/16/32 bit duplication are unfortunate. > > > > > But any attempt I made to be smarter saved very few LoC while > > > > > complicating the code. > > > > > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > > > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > > > > > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > > > > > Reported-by: Matt Moeller <moeller.matt@gmail.com> > > > > > Co-developed-by: Maciej Żenczykowski <maze@google.com> > > > > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > --- > > > > > include/linux/filter.h | 3 -- > > > > > kernel/bpf/core.c | 21 ------------ > > > > > net/core/filter.c | 75 +++++++++++++++++++++++------------------- > > > > > 3 files changed, 42 insertions(+), 57 deletions(-) > > > > > > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > > > > index f5cf4d35d83e..708ac7e0cd36 100644 > > > > > --- a/include/linux/filter.h > > > > > +++ b/include/linux/filter.h > > > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > > > > } > > > > > } > > > > > > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > > > > - int k, unsigned int size); > > > > > - > > > > > static inline int bpf_tell_extensions(void) > > > > > { > > > > > return SKF_AD_MAX; > > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > > > index ba6b6118cf50..0e836b5ac9a0 100644 > > > > > --- a/kernel/bpf/core.c > > > > > +++ b/kernel/bpf/core.c > > > > > @@ -68,27 +68,6 @@ > > > > > struct bpf_mem_alloc bpf_global_ma; > > > > > bool bpf_global_ma_set; > > > > > > > > > > -/* No hurry in this branch > > > > > - * > > > > > - * Exported for the bpf jit load helper. > > > > > - */ > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > > > > -{ > > > > > - u8 *ptr = NULL; > > > > > - > > > > > - if (k >= SKF_NET_OFF) { > > > > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > > > > - } else if (k >= SKF_LL_OFF) { > > > > > - if (unlikely(!skb_mac_header_was_set(skb))) > > > > > - return NULL; > > > > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > > > > - } > > > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > > > > - return ptr; > > > > > - > > > > > - return NULL; > > > > > -} > > > > > - > > > > > /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ > > > > > enum page_size_enum { > > > > > __PAGE_SIZE = PAGE_SIZE > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > > > index bc6828761a47..b232b70dd10d 100644 > > > > > --- a/net/core/filter.c > > > > > +++ b/net/core/filter.c > > > > > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) > > > > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > > > > > data, int, headlen, int, offset) > > > > > { > > > > > - u8 tmp, *ptr; > > > > > + u8 tmp; > > > > > const int len = sizeof(tmp); > > > > > > > > > > - if (offset >= 0) { > > > > > - if (headlen - offset >= len) > > > > > - return *(u8 *)(data + offset); > > > > > - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) > > > > > - return tmp; > > > > > - } else { > > > > > - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); > > > > > - if (likely(ptr)) > > > > > - return *(u8 *)ptr; > > > > > > > > [..] > > > > > > > > > + if (offset < 0) { > > > > > + if (offset >= SKF_NET_OFF) > > > > > + offset += skb_network_offset(skb) - SKF_NET_OFF; > > > > > + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > > > > > + offset += skb_mac_offset(skb) - SKF_LL_OFF; > > > > > + else > > > > > + return -EFAULT; > > > > > } > > > > > > > > nit: we now repeat the same logic three times, maybe still worth it to put it > > > > into a helper? bpf_resolve_classic_offset or something. > > > > > > I definitely tried this in various ways. But since the core logic is > > > only four lines and there is an early return on error, no helper > > > really simplifies anything. It just adds a layer of indirection and > > > more code in the end. > > > > More code, but at least it de-duplicates the logic of translating > > SKF_XXX_OFF? Something like the following below, but yeah, a matter > > of preference, up to you. I see your point. No strong opinion from me. Will revise, assuming you don't mind the workaround below: > > static int bpf_skb_resolve_offset(skb, offset) { > > if (offset >= 0) > > return offset; > > > > if (offset >= SKF_NET_OFF) > > offset += skb_network_offset(skb) - SKF_NET_OFF; > > else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) > > offset += skb_mac_offset(skb) - SKF_LL_OFF; > > > > return -1; > > } > > > > BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, > > data, int, headlen, int, offset) > > { > > offset = bpf_skb_resolve_offset(skb, offset); > > if (offset < 0) > > return -EFAULT; > > this is incorrect, as offset can be legally negative here. Yeah, this needs a special case like INT_MIN to communicate error, or pass-by-value. Exactly the kind of workarounds that gave me pause.
diff --git a/include/linux/filter.h b/include/linux/filter.h index f5cf4d35d83e..708ac7e0cd36 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) } } -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, - int k, unsigned int size); - static inline int bpf_tell_extensions(void) { return SKF_AD_MAX; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba6b6118cf50..0e836b5ac9a0 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -68,27 +68,6 @@ struct bpf_mem_alloc bpf_global_ma; bool bpf_global_ma_set; -/* No hurry in this branch - * - * Exported for the bpf jit load helper. - */ -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) -{ - u8 *ptr = NULL; - - if (k >= SKF_NET_OFF) { - ptr = skb_network_header(skb) + k - SKF_NET_OFF; - } else if (k >= SKF_LL_OFF) { - if (unlikely(!skb_mac_header_was_set(skb))) - return NULL; - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; - } - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) - return ptr; - - return NULL; -} - /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ enum page_size_enum { __PAGE_SIZE = PAGE_SIZE diff --git a/net/core/filter.c b/net/core/filter.c index bc6828761a47..b232b70dd10d 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - u8 tmp, *ptr; + u8 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return *(u8 *)(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return tmp; - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return *(u8 *)ptr; + if (offset < 0) { + if (offset >= SKF_NET_OFF) + offset += skb_network_offset(skb) - SKF_NET_OFF; + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) + offset += skb_mac_offset(skb) - SKF_LL_OFF; + else + return -EFAULT; } - return -EFAULT; + if (headlen - offset >= len) + return *(u8 *)(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return tmp; + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, @@ -248,21 +251,24 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be16 tmp, *ptr; + __be16 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return get_unaligned_be16(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be16_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be16(ptr); + if (offset < 0) { + if (offset >= SKF_NET_OFF) + offset += skb_network_offset(skb) - SKF_NET_OFF; + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) + offset += skb_mac_offset(skb) - SKF_LL_OFF; + else + return -EFAULT; } - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be16(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be16_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, @@ -275,21 +281,24 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be32 tmp, *ptr; + __be32 tmp; const int len = sizeof(tmp); - if (likely(offset >= 0)) { - if (headlen - offset >= len) - return get_unaligned_be32(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be32_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be32(ptr); + if (offset < 0) { + if (offset >= SKF_NET_OFF) + offset += skb_network_offset(skb) - SKF_NET_OFF; + else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) + offset += skb_mac_offset(skb) - SKF_LL_OFF; + else + return -EFAULT; } - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be32(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be32_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,