diff mbox series

[v5,bpf-next,2/4] bpf: Support setting variable-length tunnel options

Message ID 20220824044117.137658-3-shmulik.ladkani@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series bpf: Support setting variable-length tunnel options | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 105798 this patch: 105798
netdev/cc_maintainers warning 12 maintainers not CCed: edumazet@google.com jolsa@kernel.org pabeni@redhat.com song@kernel.org yhs@fb.com netdev@vger.kernel.org haoluo@google.com martin.lau@linux.dev kuba@kernel.org kpsingh@kernel.org davem@davemloft.net sdf@google.com
netdev/build_clang success Errors and warnings before: 173 this patch: 173
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 107672 this patch: 107672
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Shmulik Ladkani <shmulik@metanetworks.com>' != 'Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>' WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix

Commit Message

Shmulik Ladkani Aug. 24, 2022, 4:41 a.m. UTC
Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
size (ARG_CONST_SIZE).

However, in certain cases we wish to set tunnel options of dynamic
length.

For example, we have an ebpf program that gets geneve options on
incoming packets, stores them into a map (using a key representing
the incoming flow), and later needs to assign *same* options to
reply packets (belonging to same flow).

This is currently imposssible without knowing sender's exact geneve
options length, which unfortunately is dymamic.

Introduce 'bpf_skb_set_tunnel_opt_dynptr'.

This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the
options buffer, and 'len', the byte length of options data caller wishes
to copy into ip_tunnnel_info.
'len' must never exceed the dynptr's internal size, o/w EINVAL is
returned.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h       | 12 ++++++++++++
 net/core/filter.c              | 36 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 12 ++++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

Comments

Andrii Nakryiko Aug. 25, 2022, 6:20 p.m. UTC | #1
On Tue, Aug 23, 2022 at 9:41 PM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
> an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
> size (ARG_CONST_SIZE).
>
> However, in certain cases we wish to set tunnel options of dynamic
> length.
>
> For example, we have an ebpf program that gets geneve options on
> incoming packets, stores them into a map (using a key representing
> the incoming flow), and later needs to assign *same* options to
> reply packets (belonging to same flow).
>
> This is currently imposssible without knowing sender's exact geneve
> options length, which unfortunately is dymamic.
>
> Introduce 'bpf_skb_set_tunnel_opt_dynptr'.
>
> This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
> pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the
> options buffer, and 'len', the byte length of options data caller wishes
> to copy into ip_tunnnel_info.
> 'len' must never exceed the dynptr's internal size, o/w EINVAL is
> returned.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
> v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 12 ++++++++++++
>  net/core/filter.c              | 36 ++++++++++++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 12 ++++++++++++
>  3 files changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 644600dbb114..c7b313e30635 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5367,6 +5367,17 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)

why can't we rely on dynptr's len instead of specifying extra one
here? dynptr is a range of memory, so just specify that you take that
entire range?

And then we'll have (or partially already have) generic dynptr helpers
to adjust internal dynptr offset and len.

> + *     Description
> + *             Set tunnel options metadata for the packet associated to *skb*
> + *             to the variable length *len* bytes of option data pointed to
> + *             by the *opt* dynptr.
> + *
> + *             See also the description of the **bpf_skb_get_tunnel_opt**\ ()
> + *             helper for additional information.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
>   */

[...]
Shmulik Ladkani Aug. 30, 2022, 8:02 p.m. UTC | #2
On Thu, 25 Aug 2022 11:20:31 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)  
> 
> why can't we rely on dynptr's len instead of specifying extra one
> here? dynptr is a range of memory, so just specify that you take that
> entire range?
> 
> And then we'll have (or partially already have) generic dynptr helpers
> to adjust internal dynptr offset and len.

Alright.

For my usecase I need to use *part* of the tunnel options that were
previously stored as a dynptr.

Therefore I need to introduce a new bpf helper that adjusts the dynptr.

How about this suggestion (sketch, not yet tried):

// adjusts the dynptr to point to *len* bytes starting from the
// specified *offset*

BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
{
	int err;
        u32 size;

	if (!ptr->data)
		return -EINVAL;

	err = bpf_dynptr_check_off_len(ptr, offset, len);
	if (err)
		return err;

	ptr->offset += offset;
	size = bpf_dynptr_get_size(ptr) - len;
	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
	return 0;
}
Shmulik Ladkani Aug. 30, 2022, 8:13 p.m. UTC | #3
On Tue, 30 Aug 2022 23:02:57 +0300
Shmulik Ladkani <shmulik@metanetworks.com> wrote:

> On Thu, 25 Aug 2022 11:20:31 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)    
> > 
> > why can't we rely on dynptr's len instead of specifying extra one
> > here? dynptr is a range of memory, so just specify that you take that
> > entire range?
> > 
> > And then we'll have (or partially already have) generic dynptr helpers
> > to adjust internal dynptr offset and len.  
> 
> Alright.
> 
> For my usecase I need to use *part* of the tunnel options that were
> previously stored as a dynptr.
> 
> Therefore I need to introduce a new bpf helper that adjusts the dynptr.
> 
> How about this suggestion (sketch, not yet tried):
> 
> // adjusts the dynptr to point to *len* bytes starting from the
> // specified *offset*
> 
> BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
> {
> 	int err;
>         u32 size;
> 
> 	if (!ptr->data)
> 		return -EINVAL;
> 
> 	err = bpf_dynptr_check_off_len(ptr, offset, len);
> 	if (err)
> 		return err;
> 
> 	ptr->offset += offset;
> 	size = bpf_dynptr_get_size(ptr) - len;
> 	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
> 	return 0;
> }
> 

Correction, meant:
 	ptr->offset += offset;
 	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len;
Joanne Koong Aug. 30, 2022, 11:35 p.m. UTC | #4
On Tue, Aug 30, 2022 at 1:13 PM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> On Tue, 30 Aug 2022 23:02:57 +0300
> Shmulik Ladkani <shmulik@metanetworks.com> wrote:
>
> > On Thu, 25 Aug 2022 11:20:31 -0700
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
> > >
> > > why can't we rely on dynptr's len instead of specifying extra one
> > > here? dynptr is a range of memory, so just specify that you take that
> > > entire range?
> > >
> > > And then we'll have (or partially already have) generic dynptr helpers
> > > to adjust internal dynptr offset and len.
> >
> > Alright.
> >
> > For my usecase I need to use *part* of the tunnel options that were
> > previously stored as a dynptr.
> >
> > Therefore I need to introduce a new bpf helper that adjusts the dynptr.
> >
> > How about this suggestion (sketch, not yet tried):
> >
> > // adjusts the dynptr to point to *len* bytes starting from the
> > // specified *offset*
> >
> > BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
> > {
> >       int err;
> >         u32 size;
> >
> >       if (!ptr->data)
> >               return -EINVAL;
> >
> >       err = bpf_dynptr_check_off_len(ptr, offset, len);
> >       if (err)
> >               return err;
> >
> >       ptr->offset += offset;
> >       size = bpf_dynptr_get_size(ptr) - len;
> >       ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
> >       return 0;
> > }
> >
>
> Correction, meant:
>         ptr->offset += offset;
>         ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len;
>

I have a patchset for dynptr convenience helpers that I plan to tidy
up and push out later this week or mid-next week. it includes
"bpf_dynptr_advance()" and "bpf_dynptr_trim()", which will let you
adjust the dynptr ("_advance()" advances the offset, "_trim()" trims
the size)
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 644600dbb114..c7b313e30635 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5367,6 +5367,17 @@  union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the variable length *len* bytes of option data pointed to
+ *		by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5578,6 +5589,7 @@  union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 63e25d8ce501..1407c87ba6ac 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4669,8 +4669,7 @@  static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
-	   const u8 *, from, u32, size)
+static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	const struct metadata_dst *md = this_cpu_ptr(md_dst);
@@ -4685,6 +4684,26 @@  BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
 	return 0;
 }
 
+BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
+	   const u8 *, from, u32, size)
+{
+	return __bpf_skb_set_tunopt(skb, from, size);
+}
+
+BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
+	   struct bpf_dynptr_kern *, ptr, u32, len)
+{
+	const u8 *from;
+	u32 avail;
+
+	from = bpf_dynptr_get_data(ptr, &avail);
+	if (unlikely(!from))
+		return -EFAULT;
+	if (unlikely(len > avail))
+		return -EINVAL;
+	return __bpf_skb_set_tunopt(skb, from, len);
+}
+
 static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.func		= bpf_skb_set_tunnel_opt,
 	.gpl_only	= false,
@@ -4694,6 +4713,15 @@  static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
+static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = {
+	.func		= bpf_skb_set_tunnel_opt_dynptr,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 {
@@ -4714,6 +4742,8 @@  bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 		return &bpf_skb_set_tunnel_key_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
 		return &bpf_skb_set_tunnel_opt_proto;
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
+		return &bpf_skb_set_tunnel_opt_dynptr_proto;
 	default:
 		return NULL;
 	}
@@ -7808,6 +7838,7 @@  tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
@@ -8155,6 +8186,7 @@  lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4fb685591035..6a032a8a5fef 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5367,6 +5367,17 @@  union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the variable length *len* bytes of option data pointed to
+ *		by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5578,6 +5589,7 @@  union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper