Message ID | 20220824044117.137658-3-shmulik.ladkani@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | bpf: Support setting variable-length tunnel options | expand |
On Tue, Aug 23, 2022 at 9:41 PM Shmulik Ladkani <shmulik@metanetworks.com> wrote: > > Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given > an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer > size (ARG_CONST_SIZE). > > However, in certain cases we wish to set tunnel options of dynamic > length. > > For example, we have an ebpf program that gets geneve options on > incoming packets, stores them into a map (using a key representing > the incoming flow), and later needs to assign *same* options to > reply packets (belonging to same flow). > > This is currently imposssible without knowing sender's exact geneve > options length, which unfortunately is dymamic. > > Introduce 'bpf_skb_set_tunnel_opt_dynptr'. > > This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic > pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the > options buffer, and 'len', the byte length of options data caller wishes > to copy into ip_tunnnel_info. > 'len' must never exceed the dynptr's internal size, o/w EINVAL is > returned. > > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> > --- > v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function > v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com> > --- > include/uapi/linux/bpf.h | 12 ++++++++++++ > net/core/filter.c | 36 ++++++++++++++++++++++++++++++++-- > tools/include/uapi/linux/bpf.h | 12 ++++++++++++ > 3 files changed, 58 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 644600dbb114..c7b313e30635 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -5367,6 +5367,17 @@ union bpf_attr { > * Return > * Current *ktime*. > * > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) why can't we rely on dynptr's len instead of specifying extra one here? dynptr is a range of memory, so just specify that you take that entire range? And then we'll have (or partially already have) generic dynptr helpers to adjust internal dynptr offset and len. > + * Description > + * Set tunnel options metadata for the packet associated to *skb* > + * to the variable length *len* bytes of option data pointed to > + * by the *opt* dynptr. > + * > + * See also the description of the **bpf_skb_get_tunnel_opt**\ () > + * helper for additional information. > + * Return > + * 0 on success, or a negative error in case of failure. > + * > */ [...]
On Thu, 25 Aug 2022 11:20:31 -0700 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) > > why can't we rely on dynptr's len instead of specifying extra one > here? dynptr is a range of memory, so just specify that you take that > entire range? > > And then we'll have (or partially already have) generic dynptr helpers > to adjust internal dynptr offset and len. Alright. For my usecase I need to use *part* of the tunnel options that were previously stored as a dynptr. Therefore I need to introduce a new bpf helper that adjusts the dynptr. How about this suggestion (sketch, not yet tried): // adjusts the dynptr to point to *len* bytes starting from the // specified *offset* BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len) { int err; u32 size; if (!ptr->data) return -EINVAL; err = bpf_dynptr_check_off_len(ptr, offset, len); if (err) return err; ptr->offset += offset; size = bpf_dynptr_get_size(ptr) - len; ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size; return 0; }
On Tue, 30 Aug 2022 23:02:57 +0300 Shmulik Ladkani <shmulik@metanetworks.com> wrote: > On Thu, 25 Aug 2022 11:20:31 -0700 > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) > > > > why can't we rely on dynptr's len instead of specifying extra one > > here? dynptr is a range of memory, so just specify that you take that > > entire range? > > > > And then we'll have (or partially already have) generic dynptr helpers > > to adjust internal dynptr offset and len. > > Alright. > > For my usecase I need to use *part* of the tunnel options that were > previously stored as a dynptr. > > Therefore I need to introduce a new bpf helper that adjusts the dynptr. > > How about this suggestion (sketch, not yet tried): > > // adjusts the dynptr to point to *len* bytes starting from the > // specified *offset* > > BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len) > { > int err; > u32 size; > > if (!ptr->data) > return -EINVAL; > > err = bpf_dynptr_check_off_len(ptr, offset, len); > if (err) > return err; > > ptr->offset += offset; > size = bpf_dynptr_get_size(ptr) - len; > ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size; > return 0; > } > Correction, meant: ptr->offset += offset; ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len;
On Tue, Aug 30, 2022 at 1:13 PM Shmulik Ladkani <shmulik@metanetworks.com> wrote: > > On Tue, 30 Aug 2022 23:02:57 +0300 > Shmulik Ladkani <shmulik@metanetworks.com> wrote: > > > On Thu, 25 Aug 2022 11:20:31 -0700 > > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > > > > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) > > > > > > why can't we rely on dynptr's len instead of specifying extra one > > > here? dynptr is a range of memory, so just specify that you take that > > > entire range? > > > > > > And then we'll have (or partially already have) generic dynptr helpers > > > to adjust internal dynptr offset and len. > > > > Alright. > > > > For my usecase I need to use *part* of the tunnel options that were > > previously stored as a dynptr. > > > > Therefore I need to introduce a new bpf helper that adjusts the dynptr. > > > > How about this suggestion (sketch, not yet tried): > > > > // adjusts the dynptr to point to *len* bytes starting from the > > // specified *offset* > > > > BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len) > > { > > int err; > > u32 size; > > > > if (!ptr->data) > > return -EINVAL; > > > > err = bpf_dynptr_check_off_len(ptr, offset, len); > > if (err) > > return err; > > > > ptr->offset += offset; > > size = bpf_dynptr_get_size(ptr) - len; > > ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size; > > return 0; > > } > > > > Correction, meant: > ptr->offset += offset; > ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len; > I have a patchset for dynptr convenience helpers that I plan to tidy up and push out later this week or mid-next week. it includes "bpf_dynptr_advance()" and "bpf_dynptr_trim()", which will let you adjust the dynptr ("_advance()" advances the offset, "_trim()" trims the size)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 644600dbb114..c7b313e30635 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5367,6 +5367,17 @@ union bpf_attr { * Return * Current *ktime*. * + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) + * Description + * Set tunnel options metadata for the packet associated to *skb* + * to the variable length *len* bytes of option data pointed to + * by the *opt* dynptr. + * + * See also the description of the **bpf_skb_get_tunnel_opt**\ () + * helper for additional information. + * Return + * 0 on success, or a negative error in case of failure. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5578,6 +5589,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ + FN(skb_set_tunnel_opt_dynptr), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/net/core/filter.c b/net/core/filter.c index 63e25d8ce501..1407c87ba6ac 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4669,8 +4669,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = { .arg4_type = ARG_ANYTHING, }; -BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb, - const u8 *, from, u32, size) +static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size) { struct ip_tunnel_info *info = skb_tunnel_info(skb); const struct metadata_dst *md = this_cpu_ptr(md_dst); @@ -4685,6 +4684,26 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb, return 0; } +BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb, + const u8 *, from, u32, size) +{ + return __bpf_skb_set_tunopt(skb, from, size); +} + +BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb, + struct bpf_dynptr_kern *, ptr, u32, len) +{ + const u8 *from; + u32 avail; + + from = bpf_dynptr_get_data(ptr, &avail); + if (unlikely(!from)) + return -EFAULT; + if (unlikely(len > avail)) + return -EINVAL; + return __bpf_skb_set_tunopt(skb, from, len); +} + static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = { .func = bpf_skb_set_tunnel_opt, .gpl_only = false, @@ -4694,6 +4713,15 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = { .arg3_type = ARG_CONST_SIZE, }; +static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = { + .func = bpf_skb_set_tunnel_opt_dynptr, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL, + .arg3_type = ARG_ANYTHING, +}; + static const struct bpf_func_proto * bpf_get_skb_set_tunnel_proto(enum bpf_func_id which) { @@ -4714,6 +4742,8 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which) return &bpf_skb_set_tunnel_key_proto; case BPF_FUNC_skb_set_tunnel_opt: return &bpf_skb_set_tunnel_opt_proto; + case BPF_FUNC_skb_set_tunnel_opt_dynptr: + return &bpf_skb_set_tunnel_opt_dynptr_proto; default: return NULL; } @@ -7808,6 +7838,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_skb_get_tunnel_opt: return &bpf_skb_get_tunnel_opt_proto; case BPF_FUNC_skb_set_tunnel_opt: + case BPF_FUNC_skb_set_tunnel_opt_dynptr: return bpf_get_skb_set_tunnel_proto(func_id); case BPF_FUNC_redirect: return &bpf_redirect_proto; @@ -8155,6 +8186,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_skb_get_tunnel_opt: return &bpf_skb_get_tunnel_opt_proto; case BPF_FUNC_skb_set_tunnel_opt: + case BPF_FUNC_skb_set_tunnel_opt_dynptr: return bpf_get_skb_set_tunnel_proto(func_id); case BPF_FUNC_redirect: return &bpf_redirect_proto; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 4fb685591035..6a032a8a5fef 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5367,6 +5367,17 @@ union bpf_attr { * Return * Current *ktime*. * + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len) + * Description + * Set tunnel options metadata for the packet associated to *skb* + * to the variable length *len* bytes of option data pointed to + * by the *opt* dynptr. + * + * See also the description of the **bpf_skb_get_tunnel_opt**\ () + * helper for additional information. + * Return + * 0 on success, or a negative error in case of failure. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5578,6 +5589,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ + FN(skb_set_tunnel_opt_dynptr), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper
Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer size (ARG_CONST_SIZE). However, in certain cases we wish to set tunnel options of dynamic length. For example, we have an ebpf program that gets geneve options on incoming packets, stores them into a map (using a key representing the incoming flow), and later needs to assign *same* options to reply packets (belonging to same flow). This is currently imposssible without knowing sender's exact geneve options length, which unfortunately is dymamic. Introduce 'bpf_skb_set_tunnel_opt_dynptr'. This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the options buffer, and 'len', the byte length of options data caller wishes to copy into ip_tunnnel_info. 'len' must never exceed the dynptr's internal size, o/w EINVAL is returned. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> --- v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com> --- include/uapi/linux/bpf.h | 12 ++++++++++++ net/core/filter.c | 36 ++++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 12 ++++++++++++ 3 files changed, 58 insertions(+), 2 deletions(-)