diff mbox series

[bpf-next,v2,1/2] bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()

Message ID b231c7d0acacd702284158cd44734e72ef661a01.1673423199.git.william.xuanziyang@huawei.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room() | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1717 this patch: 1717
netdev/cc_maintainers success CCed 17 of 17 maintainers
netdev/build_clang success Errors and warnings before: 159 this patch: 159
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1714 this patch: 1714
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 102 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix

Commit Message

Ziyang Xuan (William) Jan. 11, 2023, 8:01 a.m. UTC
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
new IP header version after decapsulating the outer IP header.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 include/uapi/linux/bpf.h       |  8 ++++++++
 net/core/filter.c              | 26 +++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |  8 ++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn Jan. 11, 2023, 3:43 p.m. UTC | #1
On Wed, Jan 11, 2023 at 3:01 AM Ziyang Xuan
<william.xuanziyang@huawei.com> wrote:
>
> Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
> Main use case is for using cls_bpf on ingress hook to decapsulate
> IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.
>
> Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
> new IP header version after decapsulating the outer IP header.
>
> Suggested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  include/uapi/linux/bpf.h       |  8 ++++++++
>  net/core/filter.c              | 26 +++++++++++++++++++++++++-
>  tools/include/uapi/linux/bpf.h |  8 ++++++++
>  3 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 464ca3f01fe7..dde1c2ea1c84 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2644,6 +2644,12 @@ union bpf_attr {
>   *               Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
>   *               L2 type as Ethernet.
>   *
> + *              * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
> + *                **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
> + *                Indicate the new IP header version after decapsulating the
> + *                outer IP header. Mainly used in scenarios that the inner and
> + *                outer IP versions are different.
> + *

Nit (only since I have another comment below)

Indicate -> Set
[Mainly used .. that] -> [Used when]

>         if (skb_is_gso(skb)) {
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
>
> @@ -3596,6 +3609,10 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>         if (unlikely(proto != htons(ETH_P_IP) &&
>                      proto != htons(ETH_P_IPV6)))
>                 return -ENOTSUPP;
> +       if (unlikely((shrink && ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) ==
> +                     BPF_F_ADJ_ROOM_DECAP_L3_MASK)) || (!shrink &&
> +                     flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK)))
> +               return -EINVAL;
>
>         off = skb_mac_header_len(skb);
>         switch (mode) {
> @@ -3608,6 +3625,13 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>                 return -ENOTSUPP;
>         }
>
> +       if (shrink) {
> +               if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> +                       len_min = sizeof(struct ipv6hdr);
> +               else if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> +                       len_min = sizeof(struct iphdr);
> +       }
> +

How about combining this branch with the above:

  if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
    if (!shrink)
      return -EINVAL;

    switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
      case BPF_F_ADJ_ROOM_DECAP_L3_IPV4:
        len_min = sizeof(struct iphdr);
        break;
      case BPF_F_ADJ_ROOM_DECAP_L3_IPV6:
        len_min = sizeof(struct ipv6hdr);
        break;
      default:
        return -EINVAL;
    }
Martin KaFai Lau Jan. 12, 2023, 1:26 a.m. UTC | #2
On 1/11/23 12:01 AM, Ziyang Xuan wrote:
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 464ca3f01fe7..dde1c2ea1c84 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2644,6 +2644,12 @@ union bpf_attr {
>    *		  Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
>    *		  L2 type as Ethernet.
>    *
> + *              * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
> + *                **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
> + *                Indicate the new IP header version after decapsulating the
> + *                outer IP header. Mainly used in scenarios that the inner and
> + *                outer IP versions are different.
> + *

selftests/bpf failed to compile. It is probably because there is leading spaces 
instead of using tabs: 
https://github.com/kernel-patches/bpf/actions/runs/3890850490/jobs/6640395038#step:11:112

   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/bpf-helpers.rst:1112: 
(WARNING/2) Bullet list ends without a blank line; unexpected unindent.
   make[1]: *** [Makefile.docs:76: 
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/bpf-helpers.7] Error 12
   make: *** [Makefile:259: docs] Error 2
Ziyang Xuan (William) Jan. 12, 2023, 7:15 a.m. UTC | #3
> On Wed, Jan 11, 2023 at 3:01 AM Ziyang Xuan
> <william.xuanziyang@huawei.com> wrote:
>>
>> Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
>> Main use case is for using cls_bpf on ingress hook to decapsulate
>> IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.
>>
>> Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
>> new IP header version after decapsulating the outer IP header.
>>
>> Suggested-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
>> ---
>>  include/uapi/linux/bpf.h       |  8 ++++++++
>>  net/core/filter.c              | 26 +++++++++++++++++++++++++-
>>  tools/include/uapi/linux/bpf.h |  8 ++++++++
>>  3 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 464ca3f01fe7..dde1c2ea1c84 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2644,6 +2644,12 @@ union bpf_attr {
>>   *               Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
>>   *               L2 type as Ethernet.
>>   *
>> + *              * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
>> + *                **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
>> + *                Indicate the new IP header version after decapsulating the
>> + *                outer IP header. Mainly used in scenarios that the inner and
>> + *                outer IP versions are different.
>> + *
> 
> Nit (only since I have another comment below)
> 
> Indicate -> Set
Sorry, I think "Indicate" maybe more suitable. Because the new IP header is original inner
IP header, it's not be changed. The flags assist the kernel to better complete specific tasks.
I think "Set" has a meaning of change.

> [Mainly used .. that] -> [Used when]
This looks good to me. Thanks!

> 
>>         if (skb_is_gso(skb)) {
>>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
>>
>> @@ -3596,6 +3609,10 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>>         if (unlikely(proto != htons(ETH_P_IP) &&
>>                      proto != htons(ETH_P_IPV6)))
>>                 return -ENOTSUPP;
>> +       if (unlikely((shrink && ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) ==
>> +                     BPF_F_ADJ_ROOM_DECAP_L3_MASK)) || (!shrink &&
>> +                     flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK)))
>> +               return -EINVAL;
>>
>>         off = skb_mac_header_len(skb);
>>         switch (mode) {
>> @@ -3608,6 +3625,13 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>>                 return -ENOTSUPP;
>>         }
>>
>> +       if (shrink) {
>> +               if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
>> +                       len_min = sizeof(struct ipv6hdr);
>> +               else if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
>> +                       len_min = sizeof(struct iphdr);
>> +       }
>> +
> 
> How about combining this branch with the above:
> 
>   if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
>     if (!shrink)
>       return -EINVAL;
> 
>     switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
>       case BPF_F_ADJ_ROOM_DECAP_L3_IPV4:
>         len_min = sizeof(struct iphdr);
>         break;
>       case BPF_F_ADJ_ROOM_DECAP_L3_IPV6:
>         len_min = sizeof(struct ipv6hdr);
>         break;
>       default:
>         return -EINVAL;
>     }
> This looks good to me. Thanks!

>
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 464ca3f01fe7..dde1c2ea1c84 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2644,6 +2644,12 @@  union bpf_attr {
  *		  Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
  *		  L2 type as Ethernet.
  *
+ *              * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
+ *                **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
+ *                Indicate the new IP header version after decapsulating the
+ *                outer IP header. Mainly used in scenarios that the inner and
+ *                outer IP versions are different.
+ *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
  * 		previously done by the verifier are invalidated and must be
@@ -5803,6 +5809,8 @@  enum {
 	BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
 	BPF_F_ADJ_ROOM_NO_CSUM_RESET	= (1ULL << 5),
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
+	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
+	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index 43cc1fe58a2c..5fb113953f80 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3381,13 +3381,17 @@  static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 #define BPF_F_ADJ_ROOM_ENCAP_L3_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \
 					 BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
 
+#define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
+					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+
 #define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
 					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2( \
-					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
+					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
+					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
 
 static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 			    u64 flags)
@@ -3501,6 +3505,7 @@  static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	int ret;
 
 	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
 			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
 		return -EINVAL;
 
@@ -3519,6 +3524,14 @@  static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	if (unlikely(ret < 0))
 		return ret;
 
+	/* Match skb->protocol to new outer l3 protocol */
+	if (skb->protocol == htons(ETH_P_IP) &&
+	    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+		skb->protocol = htons(ETH_P_IPV6);
+	else if (skb->protocol == htons(ETH_P_IPV6) &&
+		 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
+		skb->protocol = htons(ETH_P_IP);
+
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 
@@ -3596,6 +3609,10 @@  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	if (unlikely(proto != htons(ETH_P_IP) &&
 		     proto != htons(ETH_P_IPV6)))
 		return -ENOTSUPP;
+	if (unlikely((shrink && ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) ==
+		      BPF_F_ADJ_ROOM_DECAP_L3_MASK)) || (!shrink &&
+		      flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK)))
+		return -EINVAL;
 
 	off = skb_mac_header_len(skb);
 	switch (mode) {
@@ -3608,6 +3625,13 @@  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 		return -ENOTSUPP;
 	}
 
+	if (shrink) {
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+			len_min = sizeof(struct ipv6hdr);
+		else if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
+			len_min = sizeof(struct iphdr);
+	}
+
 	len_cur = skb->len - skb_network_offset(skb);
 	if ((shrink && (len_diff_abs >= len_cur ||
 			len_cur - len_diff_abs < len_min)) ||
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 464ca3f01fe7..22672e5c8466 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2644,6 +2644,12 @@  union bpf_attr {
  *		  Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the
  *		  L2 type as Ethernet.
  *
+ *              * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
+ *                **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
+ *                Indicate the new IP header version after decapsulating the
+ *                outer IP header. Mainly used in scenarios that the inner and
+ *                outer IP versions are different.
+ *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
  * 		previously done by the verifier are invalidated and must be
@@ -5803,6 +5809,8 @@  enum {
 	BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
 	BPF_F_ADJ_ROOM_NO_CSUM_RESET	= (1ULL << 5),
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
+	BPF_F_ADJ_ROOM_DECAP_L3_IPV4    = (1ULL << 7),
+	BPF_F_ADJ_ROOM_DECAP_L3_IPV6    = (1ULL << 8),
 };
 
 enum {