diff mbox series

[bpf-next,1/2] bpf: Support getting tunnel flags

Message ID 20220831144010.174110-1-shmulik.ladkani@gmail.com (mailing list archive)
State Accepted
Commit 44c51472bef83bb70b43e2f4b7a592096f32a855
Delegated to: BPF
Headers show
Series [bpf-next,1/2] bpf: Support getting tunnel flags | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1697 this patch: 1697
netdev/cc_maintainers warning 13 maintainers not CCed: edumazet@google.com john.fastabend@gmail.com jolsa@kernel.org pabeni@redhat.com song@kernel.org yhs@fb.com netdev@vger.kernel.org haoluo@google.com martin.lau@linux.dev kuba@kernel.org kpsingh@kernel.org davem@davemloft.net sdf@google.com
netdev/build_clang success Errors and warnings before: 173 this patch: 173
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1691 this patch: 1691
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Shmulik Ladkani <shmulik@metanetworks.com>' != 'Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix

Commit Message

Shmulik Ladkani Aug. 31, 2022, 2:40 p.m. UTC
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
(id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
tun_flags to the bpf program.

It makes sense to expose tun_flags to the bpf program.

Assume for example multiple GRE tunnels maintained on a single GRE
interface in collect_md mode. The program expects origins to initiate
over GRE, however different origins use different GRE characteristics
(e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
some do not, etc..).

A bpf program getting tun_flags can therefore remember the relevant
flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote.
In the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
to correctly reply to the remote, using similar characteristics, based on
the stored tunnel flags.

Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key.
If specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.

Decided to use the existing unused 'tunnel_ext' as the storage for the
'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 include/uapi/linux/bpf.h       | 10 +++++++++-
 net/core/filter.c              |  8 ++++++--
 tools/include/uapi/linux/bpf.h | 10 +++++++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

Comments

Daniel Borkmann Aug. 31, 2022, 8:46 p.m. UTC | #1
On 8/31/22 4:40 PM, Shmulik Ladkani wrote:
> Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
> (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
> tun_flags to the bpf program.
> 
> It makes sense to expose tun_flags to the bpf program.
> 
> Assume for example multiple GRE tunnels maintained on a single GRE
> interface in collect_md mode. The program expects origins to initiate
> over GRE, however different origins use different GRE characteristics
> (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
> some do not, etc..).
> 
> A bpf program getting tun_flags can therefore remember the relevant
> flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote.
> In the reply path, the program can use 'bpf_skb_set_tunnel_key' in order
> to correctly reply to the remote, using similar characteristics, based on
> the stored tunnel flags.
> 
> Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key.
> If specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags.
> 
> Decided to use the existing unused 'tunnel_ext' as the storage for the
> 'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

The set looks good to me, one comment below..

> ---
>   include/uapi/linux/bpf.h       | 10 +++++++++-
>   net/core/filter.c              |  8 ++++++--
>   tools/include/uapi/linux/bpf.h | 10 +++++++++-
>   3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 962960a98835..837c0f9b7fdd 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5659,6 +5659,11 @@ enum {
>   	BPF_F_SEQ_NUMBER		= (1ULL << 3),
>   };
>   
> +/* BPF_FUNC_skb_get_tunnel_key flags. */
> +enum {
> +	BPF_F_TUNINFO_FLAGS		= (1ULL << 4),
> +};
> +
>   /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
>    * BPF_FUNC_perf_event_read_value flags.
>    */
> @@ -5848,7 +5853,10 @@ struct bpf_tunnel_key {
>   	};
>   	__u8 tunnel_tos;
>   	__u8 tunnel_ttl;
> -	__u16 tunnel_ext;	/* Padding, future use. */
> +	union {
> +		__u16 tunnel_ext;	/* compat */
> +		__be16 tunnel_flags;
> +	};
>   	__u32 tunnel_label;
>   	union {
>   		__u32 local_ipv4;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 63e25d8ce501..74e2a4a0d747 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4488,7 +4488,8 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key
>   	void *to_orig = to;
>   	int err;
>   
> -	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6)))) {
> +	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6 |
> +					 BPF_F_TUNINFO_FLAGS)))) {
>   		err = -EINVAL;
>   		goto err_clear;
>   	}
> @@ -4520,7 +4521,10 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key
>   	to->tunnel_id = be64_to_cpu(info->key.tun_id);
>   	to->tunnel_tos = info->key.tos;
>   	to->tunnel_ttl = info->key.ttl;
> -	to->tunnel_ext = 0;
> +	if (flags & BPF_F_TUNINFO_FLAGS)
> +		to->tunnel_flags = info->key.tun_flags;
> +	else
> +		to->tunnel_ext = 0;

The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g.
BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on
those flags we set:

   [...]
   info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE;
   if (flags & BPF_F_DONT_FRAGMENT)
           info->key.tun_flags |= TUNNEL_DONT_FRAGMENT;
   if (flags & BPF_F_ZERO_CSUM_TX)
           info->key.tun_flags &= ~TUNNEL_CSUM;
   if (flags & BPF_F_SEQ_NUMBER)
           info->key.tun_flags |= TUNNEL_SEQ;
   [...]

Should we similarly only expose those which are interesting/relevant to BPF
program authors as a __u16 tunnel_flags and not the whole set? Which ones
do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the
TUNNEL_OPTIONS_PRESENT?

Thanks,
Daniel
Shmulik Ladkani Sept. 1, 2022, 6:10 a.m. UTC | #2
On Wed, 31 Aug 2022 22:46:15 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g.
> BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on
> those flags we set:
> 
>    [...]
>    info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE;
>    if (flags & BPF_F_DONT_FRAGMENT)
>            info->key.tun_flags |= TUNNEL_DONT_FRAGMENT;
>    if (flags & BPF_F_ZERO_CSUM_TX)
>            info->key.tun_flags &= ~TUNNEL_CSUM;
>    if (flags & BPF_F_SEQ_NUMBER)
>            info->key.tun_flags |= TUNNEL_SEQ;
>    [...]
> 
> Should we similarly only expose those which are interesting/relevant to BPF
> program authors as a __u16 tunnel_flags and not the whole set? Which ones
> do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the
> TUNNEL_OPTIONS_PRESENT?

Indeed, I noticed this and considered various approaches:

1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
and place into the new 'tunnel_flags' field.
This has 2 drawbacks:
 - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
   e.g. BPF_F_ZERO_CSUM_TX.
   I find it awkward that it is "returned" into tunnel_flags from a
   *get_tunnel_key* call.
 - Not all "interesting" TUNNEL_xxx flags can be mapped to existing
   BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
   flags just for purposes of the returned tunnel_flags.

2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
   "interesting" flags.
   That's ok, but the drawback is that what's "intersting" for my
   usecase might be limiting for other usecases.

Therefore I decided to expose what's in key.tun_flags *as is*, which seems
most flexible. The bpf user can just choose to ingore bits he's not
interested in. The TUNNEL_xxx are uapi, so no harm exposing them back in
the get_tunnel_key call.

WDYT?

Best,
Shmulik
Daniel Borkmann Sept. 2, 2022, 1:26 p.m. UTC | #3
On 9/1/22 8:10 AM, Shmulik Ladkani wrote:
> On Wed, 31 Aug 2022 22:46:15 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
>> The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g.
>> BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on
>> those flags we set:
>>
>>     [...]
>>     info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE;
>>     if (flags & BPF_F_DONT_FRAGMENT)
>>             info->key.tun_flags |= TUNNEL_DONT_FRAGMENT;
>>     if (flags & BPF_F_ZERO_CSUM_TX)
>>             info->key.tun_flags &= ~TUNNEL_CSUM;
>>     if (flags & BPF_F_SEQ_NUMBER)
>>             info->key.tun_flags |= TUNNEL_SEQ;
>>     [...]
>>
>> Should we similarly only expose those which are interesting/relevant to BPF
>> program authors as a __u16 tunnel_flags and not the whole set? Which ones
>> do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the
>> TUNNEL_OPTIONS_PRESENT?
> 
> Indeed, I noticed this and considered various approaches:
> 
> 1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy
> and place into the new 'tunnel_flags' field.
> This has 2 drawbacks:
>   - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space,
>     e.g. BPF_F_ZERO_CSUM_TX.
>     I find it awkward that it is "returned" into tunnel_flags from a
>     *get_tunnel_key* call.
>   - Not all "interesting" TUNNEL_xxx flags can be mapped to existing
>     BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy
>     flags just for purposes of the returned tunnel_flags.
> 
> 2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only
>     "interesting" flags.
>     That's ok, but the drawback is that what's "intersting" for my
>     usecase might be limiting for other usecases.
> 
> Therefore I decided to expose what's in key.tun_flags *as is*, which seems
> most flexible. The bpf user can just choose to ingore bits he's not
> interested in. The TUNNEL_xxx are uapi, so no harm exposing them back in
> the get_tunnel_key call.
> 
> WDYT?

Yes, the argumentation sounds reasonable to me. I've added this note to the
commit message to reflect the design decision, and applied it to bpf-next.

Thanks Shmulik!
patchwork-bot+netdevbpf@kernel.org Sept. 2, 2022, 1:30 p.m. UTC | #4
Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Wed, 31 Aug 2022 17:40:09 +0300 you wrote:
> Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters
> (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's
> tun_flags to the bpf program.
> 
> It makes sense to expose tun_flags to the bpf program.
> 
> Assume for example multiple GRE tunnels maintained on a single GRE
> interface in collect_md mode. The program expects origins to initiate
> over GRE, however different origins use different GRE characteristics
> (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key,
> some do not, etc..).
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/2] bpf: Support getting tunnel flags
    https://git.kernel.org/bpf/bpf-next/c/44c51472bef8
  - [bpf-next,2/2] selftests/bpf: Amend test_tunnel to exercise BPF_F_TUNINFO_FLAGS
    https://git.kernel.org/bpf/bpf-next/c/8cc61b7a6416

You are awesome, thank you!
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 962960a98835..837c0f9b7fdd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5659,6 +5659,11 @@  enum {
 	BPF_F_SEQ_NUMBER		= (1ULL << 3),
 };
 
+/* BPF_FUNC_skb_get_tunnel_key flags. */
+enum {
+	BPF_F_TUNINFO_FLAGS		= (1ULL << 4),
+};
+
 /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
  * BPF_FUNC_perf_event_read_value flags.
  */
@@ -5848,7 +5853,10 @@  struct bpf_tunnel_key {
 	};
 	__u8 tunnel_tos;
 	__u8 tunnel_ttl;
-	__u16 tunnel_ext;	/* Padding, future use. */
+	union {
+		__u16 tunnel_ext;	/* compat */
+		__be16 tunnel_flags;
+	};
 	__u32 tunnel_label;
 	union {
 		__u32 local_ipv4;
diff --git a/net/core/filter.c b/net/core/filter.c
index 63e25d8ce501..74e2a4a0d747 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4488,7 +4488,8 @@  BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key
 	void *to_orig = to;
 	int err;
 
-	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6)))) {
+	if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6 |
+					 BPF_F_TUNINFO_FLAGS)))) {
 		err = -EINVAL;
 		goto err_clear;
 	}
@@ -4520,7 +4521,10 @@  BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key
 	to->tunnel_id = be64_to_cpu(info->key.tun_id);
 	to->tunnel_tos = info->key.tos;
 	to->tunnel_ttl = info->key.ttl;
-	to->tunnel_ext = 0;
+	if (flags & BPF_F_TUNINFO_FLAGS)
+		to->tunnel_flags = info->key.tun_flags;
+	else
+		to->tunnel_ext = 0;
 
 	if (flags & BPF_F_TUNINFO_IPV6) {
 		memcpy(to->remote_ipv6, &info->key.u.ipv6.src,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f4ba82a1eace..793103b10eab 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5659,6 +5659,11 @@  enum {
 	BPF_F_SEQ_NUMBER		= (1ULL << 3),
 };
 
+/* BPF_FUNC_skb_get_tunnel_key flags. */
+enum {
+	BPF_F_TUNINFO_FLAGS		= (1ULL << 4),
+};
+
 /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
  * BPF_FUNC_perf_event_read_value flags.
  */
@@ -5848,7 +5853,10 @@  struct bpf_tunnel_key {
 	};
 	__u8 tunnel_tos;
 	__u8 tunnel_ttl;
-	__u16 tunnel_ext;	/* Padding, future use. */
+	union {
+		__u16 tunnel_ext;	/* compat */
+		__be16 tunnel_flags;
+	};
 	__u32 tunnel_label;
 	union {
 		__u32 local_ipv4;