Message ID | 20220831144010.174110-1-shmulik.ladkani@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 44c51472bef83bb70b43e2f4b7a592096f32a855 |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,1/2] bpf: Support getting tunnel flags | expand |
On 8/31/22 4:40 PM, Shmulik Ladkani wrote: > Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters > (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's > tun_flags to the bpf program. > > It makes sense to expose tun_flags to the bpf program. > > Assume for example multiple GRE tunnels maintained on a single GRE > interface in collect_md mode. The program expects origins to initiate > over GRE, however different origins use different GRE characteristics > (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key, > some do not, etc..). > > A bpf program getting tun_flags can therefore remember the relevant > flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. > In the reply path, the program can use 'bpf_skb_set_tunnel_key' in order > to correctly reply to the remote, using similar characteristics, based on > the stored tunnel flags. > > Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. > If specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags. > > Decided to use the existing unused 'tunnel_ext' as the storage for the > 'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout. > > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> The set looks good to me, one comment below.. > --- > include/uapi/linux/bpf.h | 10 +++++++++- > net/core/filter.c | 8 ++++++-- > tools/include/uapi/linux/bpf.h | 10 +++++++++- > 3 files changed, 24 insertions(+), 4 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 962960a98835..837c0f9b7fdd 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -5659,6 +5659,11 @@ enum { > BPF_F_SEQ_NUMBER = (1ULL << 3), > }; > > +/* BPF_FUNC_skb_get_tunnel_key flags. */ > +enum { > + BPF_F_TUNINFO_FLAGS = (1ULL << 4), > +}; > + > /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and > * BPF_FUNC_perf_event_read_value flags. > */ > @@ -5848,7 +5853,10 @@ struct bpf_tunnel_key { > }; > __u8 tunnel_tos; > __u8 tunnel_ttl; > - __u16 tunnel_ext; /* Padding, future use. */ > + union { > + __u16 tunnel_ext; /* compat */ > + __be16 tunnel_flags; > + }; > __u32 tunnel_label; > union { > __u32 local_ipv4; > diff --git a/net/core/filter.c b/net/core/filter.c > index 63e25d8ce501..74e2a4a0d747 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -4488,7 +4488,8 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key > void *to_orig = to; > int err; > > - if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6)))) { > + if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6 | > + BPF_F_TUNINFO_FLAGS)))) { > err = -EINVAL; > goto err_clear; > } > @@ -4520,7 +4521,10 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key > to->tunnel_id = be64_to_cpu(info->key.tun_id); > to->tunnel_tos = info->key.tos; > to->tunnel_ttl = info->key.ttl; > - to->tunnel_ext = 0; > + if (flags & BPF_F_TUNINFO_FLAGS) > + to->tunnel_flags = info->key.tun_flags; > + else > + to->tunnel_ext = 0; The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g. BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on those flags we set: [...] info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE; if (flags & BPF_F_DONT_FRAGMENT) info->key.tun_flags |= TUNNEL_DONT_FRAGMENT; if (flags & BPF_F_ZERO_CSUM_TX) info->key.tun_flags &= ~TUNNEL_CSUM; if (flags & BPF_F_SEQ_NUMBER) info->key.tun_flags |= TUNNEL_SEQ; [...] Should we similarly only expose those which are interesting/relevant to BPF program authors as a __u16 tunnel_flags and not the whole set? Which ones do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the TUNNEL_OPTIONS_PRESENT? Thanks, Daniel
On Wed, 31 Aug 2022 22:46:15 +0200 Daniel Borkmann <daniel@iogearbox.net> wrote: > The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g. > BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on > those flags we set: > > [...] > info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE; > if (flags & BPF_F_DONT_FRAGMENT) > info->key.tun_flags |= TUNNEL_DONT_FRAGMENT; > if (flags & BPF_F_ZERO_CSUM_TX) > info->key.tun_flags &= ~TUNNEL_CSUM; > if (flags & BPF_F_SEQ_NUMBER) > info->key.tun_flags |= TUNNEL_SEQ; > [...] > > Should we similarly only expose those which are interesting/relevant to BPF > program authors as a __u16 tunnel_flags and not the whole set? Which ones > do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the > TUNNEL_OPTIONS_PRESENT? Indeed, I noticed this and considered various approaches: 1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy and place into the new 'tunnel_flags' field. This has 2 drawbacks: - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space, e.g. BPF_F_ZERO_CSUM_TX. I find it awkward that it is "returned" into tunnel_flags from a *get_tunnel_key* call. - Not all "interesting" TUNNEL_xxx flags can be mapped to existing BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy flags just for purposes of the returned tunnel_flags. 2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only "interesting" flags. That's ok, but the drawback is that what's "intersting" for my usecase might be limiting for other usecases. Therefore I decided to expose what's in key.tun_flags *as is*, which seems most flexible. The bpf user can just choose to ingore bits he's not interested in. The TUNNEL_xxx are uapi, so no harm exposing them back in the get_tunnel_key call. WDYT? Best, Shmulik
On 9/1/22 8:10 AM, Shmulik Ladkani wrote: > On Wed, 31 Aug 2022 22:46:15 +0200 > Daniel Borkmann <daniel@iogearbox.net> wrote: > >> The bpf_skb_set_tunnel_key() helper has a number of flags we pass in, e.g. >> BPF_F_ZERO_CSUM_TX, BPF_F_DONT_FRAGMENT, BPF_F_SEQ_NUMBER, and then based on >> those flags we set: >> >> [...] >> info->key.tun_flags = TUNNEL_KEY | TUNNEL_CSUM | TUNNEL_NOCACHE; >> if (flags & BPF_F_DONT_FRAGMENT) >> info->key.tun_flags |= TUNNEL_DONT_FRAGMENT; >> if (flags & BPF_F_ZERO_CSUM_TX) >> info->key.tun_flags &= ~TUNNEL_CSUM; >> if (flags & BPF_F_SEQ_NUMBER) >> info->key.tun_flags |= TUNNEL_SEQ; >> [...] >> >> Should we similarly only expose those which are interesting/relevant to BPF >> program authors as a __u16 tunnel_flags and not the whole set? Which ones >> do you have a need for? TUNNEL_SEQ, TUNNEL_CSUM, TUNNEL_KEY, and then the >> TUNNEL_OPTIONS_PRESENT? > > Indeed, I noticed this and considered various approaches: > > 1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy > and place into the new 'tunnel_flags' field. > This has 2 drawbacks: > - The BPF_F_yyy flags are from *set_tunnel_key* enumeration space, > e.g. BPF_F_ZERO_CSUM_TX. > I find it awkward that it is "returned" into tunnel_flags from a > *get_tunnel_key* call. > - Not all "interesting" TUNNEL_xxx flags can be mapped to existing > BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy > flags just for purposes of the returned tunnel_flags. > > 2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only > "interesting" flags. > That's ok, but the drawback is that what's "intersting" for my > usecase might be limiting for other usecases. > > Therefore I decided to expose what's in key.tun_flags *as is*, which seems > most flexible. The bpf user can just choose to ingore bits he's not > interested in. The TUNNEL_xxx are uapi, so no harm exposing them back in > the get_tunnel_key call. > > WDYT? Yes, the argumentation sounds reasonable to me. I've added this note to the commit message to reflect the design decision, and applied it to bpf-next. Thanks Shmulik!
Hello: This series was applied to bpf/bpf-next.git (master) by Daniel Borkmann <daniel@iogearbox.net>: On Wed, 31 Aug 2022 17:40:09 +0300 you wrote: > Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters > (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's > tun_flags to the bpf program. > > It makes sense to expose tun_flags to the bpf program. > > Assume for example multiple GRE tunnels maintained on a single GRE > interface in collect_md mode. The program expects origins to initiate > over GRE, however different origins use different GRE characteristics > (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key, > some do not, etc..). > > [...] Here is the summary with links: - [bpf-next,1/2] bpf: Support getting tunnel flags https://git.kernel.org/bpf/bpf-next/c/44c51472bef8 - [bpf-next,2/2] selftests/bpf: Amend test_tunnel to exercise BPF_F_TUNINFO_FLAGS https://git.kernel.org/bpf/bpf-next/c/8cc61b7a6416 You are awesome, thank you!
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 962960a98835..837c0f9b7fdd 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5659,6 +5659,11 @@ enum { BPF_F_SEQ_NUMBER = (1ULL << 3), }; +/* BPF_FUNC_skb_get_tunnel_key flags. */ +enum { + BPF_F_TUNINFO_FLAGS = (1ULL << 4), +}; + /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and * BPF_FUNC_perf_event_read_value flags. */ @@ -5848,7 +5853,10 @@ struct bpf_tunnel_key { }; __u8 tunnel_tos; __u8 tunnel_ttl; - __u16 tunnel_ext; /* Padding, future use. */ + union { + __u16 tunnel_ext; /* compat */ + __be16 tunnel_flags; + }; __u32 tunnel_label; union { __u32 local_ipv4; diff --git a/net/core/filter.c b/net/core/filter.c index 63e25d8ce501..74e2a4a0d747 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4488,7 +4488,8 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key void *to_orig = to; int err; - if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6)))) { + if (unlikely(!info || (flags & ~(BPF_F_TUNINFO_IPV6 | + BPF_F_TUNINFO_FLAGS)))) { err = -EINVAL; goto err_clear; } @@ -4520,7 +4521,10 @@ BPF_CALL_4(bpf_skb_get_tunnel_key, struct sk_buff *, skb, struct bpf_tunnel_key to->tunnel_id = be64_to_cpu(info->key.tun_id); to->tunnel_tos = info->key.tos; to->tunnel_ttl = info->key.ttl; - to->tunnel_ext = 0; + if (flags & BPF_F_TUNINFO_FLAGS) + to->tunnel_flags = info->key.tun_flags; + else + to->tunnel_ext = 0; if (flags & BPF_F_TUNINFO_IPV6) { memcpy(to->remote_ipv6, &info->key.u.ipv6.src, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index f4ba82a1eace..793103b10eab 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5659,6 +5659,11 @@ enum { BPF_F_SEQ_NUMBER = (1ULL << 3), }; +/* BPF_FUNC_skb_get_tunnel_key flags. */ +enum { + BPF_F_TUNINFO_FLAGS = (1ULL << 4), +}; + /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and * BPF_FUNC_perf_event_read_value flags. */ @@ -5848,7 +5853,10 @@ struct bpf_tunnel_key { }; __u8 tunnel_tos; __u8 tunnel_ttl; - __u16 tunnel_ext; /* Padding, future use. */ + union { + __u16 tunnel_ext; /* compat */ + __be16 tunnel_flags; + }; __u32 tunnel_label; union { __u32 local_ipv4;
Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's tun_flags to the bpf program. It makes sense to expose tun_flags to the bpf program. Assume for example multiple GRE tunnels maintained on a single GRE interface in collect_md mode. The program expects origins to initiate over GRE, however different origins use different GRE characteristics (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key, some do not, etc..). A bpf program getting tun_flags can therefore remember the relevant flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In the reply path, the program can use 'bpf_skb_set_tunnel_key' in order to correctly reply to the remote, using similar characteristics, based on the stored tunnel flags. Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags. Decided to use the existing unused 'tunnel_ext' as the storage for the 'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> --- include/uapi/linux/bpf.h | 10 +++++++++- net/core/filter.c | 8 ++++++-- tools/include/uapi/linux/bpf.h | 10 +++++++++- 3 files changed, 24 insertions(+), 4 deletions(-)