diff mbox series

[v1,bpf-next,06/11] bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT.

Message ID 20201201144418.35045-7-kuniyu@amazon.co.jp (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Socket migration for SO_REUSEPORT. | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 15755 this patch: 15755
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 15667 this patch: 15667
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Iwashima, Kuniyuki Dec. 1, 2020, 2:44 p.m. UTC
This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
check if the attached eBPF program is capable of migrating sockets.

When the eBPF program is attached, the kernel runs it for socket migration
only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
The kernel will change the behaviour depending on the returned value:

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fall back to the random selection
  - SK_DROP, cancel the migration

Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 include/uapi/linux/bpf.h       | 2 ++
 kernel/bpf/syscall.c           | 8 ++++++++
 tools/include/uapi/linux/bpf.h | 2 ++
 3 files changed, 12 insertions(+)

Comments

Andrii Nakryiko Dec. 2, 2020, 2:04 a.m. UTC | #1
On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>
> This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> check if the attached eBPF program is capable of migrating sockets.
>
> When the eBPF program is attached, the kernel runs it for socket migration
> only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> The kernel will change the behaviour depending on the returned value:
>
>   - SK_PASS with selected_sk, select it as a new listener
>   - SK_PASS with selected_sk NULL, fall back to the random selection
>   - SK_DROP, cancel the migration
>
> Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> Suggested-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> ---
>  include/uapi/linux/bpf.h       | 2 ++
>  kernel/bpf/syscall.c           | 8 ++++++++
>  tools/include/uapi/linux/bpf.h | 2 ++
>  3 files changed, 12 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 85278deff439..cfc207ae7782 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -241,6 +241,8 @@ enum bpf_attach_type {
>         BPF_XDP_CPUMAP,
>         BPF_SK_LOOKUP,
>         BPF_XDP,
> +       BPF_SK_REUSEPORT_SELECT,
> +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f3fe9f53f93c..a0796a8de5ea 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
>                 if (expected_attach_type == BPF_SK_LOOKUP)
>                         return 0;
>                 return -EINVAL;
> +       case BPF_PROG_TYPE_SK_REUSEPORT:
> +               switch (expected_attach_type) {
> +               case BPF_SK_REUSEPORT_SELECT:
> +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> +                       return 0;
> +               default:
> +                       return -EINVAL;
> +               }

this is a kernel regression, previously expected_attach_type wasn't
enforced, so user-space could have provided any number without an
error.

>         case BPF_PROG_TYPE_EXT:
>                 if (expected_attach_type)
>                         return -EINVAL;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 85278deff439..cfc207ae7782 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -241,6 +241,8 @@ enum bpf_attach_type {
>         BPF_XDP_CPUMAP,
>         BPF_SK_LOOKUP,
>         BPF_XDP,
> +       BPF_SK_REUSEPORT_SELECT,
> +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> --
> 2.17.2 (Apple Git-113)
>
Martin KaFai Lau Dec. 2, 2020, 7:19 p.m. UTC | #2
On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> >
> > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > check if the attached eBPF program is capable of migrating sockets.
> >
> > When the eBPF program is attached, the kernel runs it for socket migration
> > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > The kernel will change the behaviour depending on the returned value:
> >
> >   - SK_PASS with selected_sk, select it as a new listener
> >   - SK_PASS with selected_sk NULL, fall back to the random selection
> >   - SK_DROP, cancel the migration
> >
> > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > ---
> >  include/uapi/linux/bpf.h       | 2 ++
> >  kernel/bpf/syscall.c           | 8 ++++++++
> >  tools/include/uapi/linux/bpf.h | 2 ++
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 85278deff439..cfc207ae7782 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> >         BPF_XDP_CPUMAP,
> >         BPF_SK_LOOKUP,
> >         BPF_XDP,
> > +       BPF_SK_REUSEPORT_SELECT,
> > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index f3fe9f53f93c..a0796a8de5ea 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> >                 if (expected_attach_type == BPF_SK_LOOKUP)
> >                         return 0;
> >                 return -EINVAL;
> > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > +               switch (expected_attach_type) {
> > +               case BPF_SK_REUSEPORT_SELECT:
> > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > +                       return 0;
> > +               default:
> > +                       return -EINVAL;
> > +               }
> 
> this is a kernel regression, previously expected_attach_type wasn't
> enforced, so user-space could have provided any number without an
> error.
I also think this change alone will break things like when the usual
attr->expected_attach_type == 0 case.  At least changes is needed in
bpf_prog_load_fixup_attach_type() which is also handling a
similar situation for BPF_PROG_TYPE_CGROUP_SOCK.

I now think there is no need to expose new bpf_attach_type to the UAPI.
Since the prog->expected_attach_type is not used, it can be cleared at load time
and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
internally at filter.[c|h]) in the is_valid_access() when "migration"
is accessed.  When "migration" is accessed, the bpf prog can handle
migration (and the original not-migration) case.

> 
> >         case BPF_PROG_TYPE_EXT:
> >                 if (expected_attach_type)
> >                         return -EINVAL;
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 85278deff439..cfc207ae7782 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> >         BPF_XDP_CPUMAP,
> >         BPF_SK_LOOKUP,
> >         BPF_XDP,
> > +       BPF_SK_REUSEPORT_SELECT,
> > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > --
> > 2.17.2 (Apple Git-113)
> >
Martin KaFai Lau Dec. 3, 2020, 4:24 a.m. UTC | #3
On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:
> On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > >
> > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > > check if the attached eBPF program is capable of migrating sockets.
> > >
> > > When the eBPF program is attached, the kernel runs it for socket migration
> > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > > The kernel will change the behaviour depending on the returned value:
> > >
> > >   - SK_PASS with selected_sk, select it as a new listener
> > >   - SK_PASS with selected_sk NULL, fall back to the random selection
> > >   - SK_DROP, cancel the migration
> > >
> > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > > ---
> > >  include/uapi/linux/bpf.h       | 2 ++
> > >  kernel/bpf/syscall.c           | 8 ++++++++
> > >  tools/include/uapi/linux/bpf.h | 2 ++
> > >  3 files changed, 12 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 85278deff439..cfc207ae7782 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > >         BPF_XDP_CPUMAP,
> > >         BPF_SK_LOOKUP,
> > >         BPF_XDP,
> > > +       BPF_SK_REUSEPORT_SELECT,
> > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > >         __MAX_BPF_ATTACH_TYPE
> > >  };
> > >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index f3fe9f53f93c..a0796a8de5ea 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> > >                 if (expected_attach_type == BPF_SK_LOOKUP)
> > >                         return 0;
> > >                 return -EINVAL;
> > > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > > +               switch (expected_attach_type) {
> > > +               case BPF_SK_REUSEPORT_SELECT:
> > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > > +                       return 0;
> > > +               default:
> > > +                       return -EINVAL;
> > > +               }
> > 
> > this is a kernel regression, previously expected_attach_type wasn't
> > enforced, so user-space could have provided any number without an
> > error.
> I also think this change alone will break things like when the usual
> attr->expected_attach_type == 0 case.  At least changes is needed in
> bpf_prog_load_fixup_attach_type() which is also handling a
> similar situation for BPF_PROG_TYPE_CGROUP_SOCK.
> 
> I now think there is no need to expose new bpf_attach_type to the UAPI.
> Since the prog->expected_attach_type is not used, it can be cleared at load time
> and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
> internally at filter.[c|h]) in the is_valid_access() when "migration"
> is accessed.  When "migration" is accessed, the bpf prog can handle
> migration (and the original not-migration) case.
Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.
I think there will be cases that bpf prog wants to do both
without accessing any field from sk_reuseport_md.

Lets go back to the discussion on using a similar
idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().
I am not aware there is loader setting a random number
in expected_attach_type, so the chance of breaking
is very low.  There was a similar discussion earlier [0].

[0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/

> 
> > 
> > >         case BPF_PROG_TYPE_EXT:
> > >                 if (expected_attach_type)
> > >                         return -EINVAL;
> > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > index 85278deff439..cfc207ae7782 100644
> > > --- a/tools/include/uapi/linux/bpf.h
> > > +++ b/tools/include/uapi/linux/bpf.h
> > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > >         BPF_XDP_CPUMAP,
> > >         BPF_SK_LOOKUP,
> > >         BPF_XDP,
> > > +       BPF_SK_REUSEPORT_SELECT,
> > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > >         __MAX_BPF_ATTACH_TYPE
> > >  };
> > >
> > > --
> > > 2.17.2 (Apple Git-113)
> > >
Iwashima, Kuniyuki Dec. 3, 2020, 2:16 p.m. UTC | #4
From:   Martin KaFai Lau <kafai@fb.com>
Date:   Wed, 2 Dec 2020 20:24:02 -0800
> On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:
> > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > > >
> > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > > > check if the attached eBPF program is capable of migrating sockets.
> > > >
> > > > When the eBPF program is attached, the kernel runs it for socket migration
> > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > > > The kernel will change the behaviour depending on the returned value:
> > > >
> > > >   - SK_PASS with selected_sk, select it as a new listener
> > > >   - SK_PASS with selected_sk NULL, fall back to the random selection
> > > >   - SK_DROP, cancel the migration
> > > >
> > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > > > ---
> > > >  include/uapi/linux/bpf.h       | 2 ++
> > > >  kernel/bpf/syscall.c           | 8 ++++++++
> > > >  tools/include/uapi/linux/bpf.h | 2 ++
> > > >  3 files changed, 12 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index 85278deff439..cfc207ae7782 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > > >         BPF_XDP_CPUMAP,
> > > >         BPF_SK_LOOKUP,
> > > >         BPF_XDP,
> > > > +       BPF_SK_REUSEPORT_SELECT,
> > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > > >         __MAX_BPF_ATTACH_TYPE
> > > >  };
> > > >
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index f3fe9f53f93c..a0796a8de5ea 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> > > >                 if (expected_attach_type == BPF_SK_LOOKUP)
> > > >                         return 0;
> > > >                 return -EINVAL;
> > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > > > +               switch (expected_attach_type) {
> > > > +               case BPF_SK_REUSEPORT_SELECT:
> > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > > > +                       return 0;
> > > > +               default:
> > > > +                       return -EINVAL;
> > > > +               }
> > > 
> > > this is a kernel regression, previously expected_attach_type wasn't
> > > enforced, so user-space could have provided any number without an
> > > error.
> > I also think this change alone will break things like when the usual
> > attr->expected_attach_type == 0 case.  At least changes is needed in
> > bpf_prog_load_fixup_attach_type() which is also handling a
> > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.
> > 
> > I now think there is no need to expose new bpf_attach_type to the UAPI.
> > Since the prog->expected_attach_type is not used, it can be cleared at load time
> > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
> > internally at filter.[c|h]) in the is_valid_access() when "migration"
> > is accessed.  When "migration" is accessed, the bpf prog can handle
> > migration (and the original not-migration) case.
> Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.
> I think there will be cases that bpf prog wants to do both
> without accessing any field from sk_reuseport_md.
> 
> Lets go back to the discussion on using a similar
> idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().
> I am not aware there is loader setting a random number
> in expected_attach_type, so the chance of breaking
> is very low.  There was a similar discussion earlier [0].
> 
> [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/

Thank you for the idea and reference.

I will remove the change in bpf_prog_load_check_attach() and set the
default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()
for backward compatibility if expected_attach_type is 0.


> > > >         case BPF_PROG_TYPE_EXT:
> > > >                 if (expected_attach_type)
> > > >                         return -EINVAL;
> > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > > index 85278deff439..cfc207ae7782 100644
> > > > --- a/tools/include/uapi/linux/bpf.h
> > > > +++ b/tools/include/uapi/linux/bpf.h
> > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > > >         BPF_XDP_CPUMAP,
> > > >         BPF_SK_LOOKUP,
> > > >         BPF_XDP,
> > > > +       BPF_SK_REUSEPORT_SELECT,
> > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > > >         __MAX_BPF_ATTACH_TYPE
> > > >  };
> > > >
> > > > --
> > > > 2.17.2 (Apple Git-113)
Martin KaFai Lau Dec. 4, 2020, 5:56 a.m. UTC | #5
On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote:
> From:   Martin KaFai Lau <kafai@fb.com>
> Date:   Wed, 2 Dec 2020 20:24:02 -0800
> > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:
> > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > > > >
> > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > > > > check if the attached eBPF program is capable of migrating sockets.
> > > > >
> > > > > When the eBPF program is attached, the kernel runs it for socket migration
> > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > > > > The kernel will change the behaviour depending on the returned value:
> > > > >
> > > > >   - SK_PASS with selected_sk, select it as a new listener
> > > > >   - SK_PASS with selected_sk NULL, fall back to the random selection
> > > > >   - SK_DROP, cancel the migration
> > > > >
> > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > > > > ---
> > > > >  include/uapi/linux/bpf.h       | 2 ++
> > > > >  kernel/bpf/syscall.c           | 8 ++++++++
> > > > >  tools/include/uapi/linux/bpf.h | 2 ++
> > > > >  3 files changed, 12 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 85278deff439..cfc207ae7782 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > > > >         BPF_XDP_CPUMAP,
> > > > >         BPF_SK_LOOKUP,
> > > > >         BPF_XDP,
> > > > > +       BPF_SK_REUSEPORT_SELECT,
> > > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > > > >         __MAX_BPF_ATTACH_TYPE
> > > > >  };
> > > > >
> > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > index f3fe9f53f93c..a0796a8de5ea 100644
> > > > > --- a/kernel/bpf/syscall.c
> > > > > +++ b/kernel/bpf/syscall.c
> > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> > > > >                 if (expected_attach_type == BPF_SK_LOOKUP)
> > > > >                         return 0;
> > > > >                 return -EINVAL;
> > > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > > > > +               switch (expected_attach_type) {
> > > > > +               case BPF_SK_REUSEPORT_SELECT:
> > > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > > > > +                       return 0;
> > > > > +               default:
> > > > > +                       return -EINVAL;
> > > > > +               }
> > > > 
> > > > this is a kernel regression, previously expected_attach_type wasn't
> > > > enforced, so user-space could have provided any number without an
> > > > error.
> > > I also think this change alone will break things like when the usual
> > > attr->expected_attach_type == 0 case.  At least changes is needed in
> > > bpf_prog_load_fixup_attach_type() which is also handling a
> > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.
> > > 
> > > I now think there is no need to expose new bpf_attach_type to the UAPI.
> > > Since the prog->expected_attach_type is not used, it can be cleared at load time
> > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
> > > internally at filter.[c|h]) in the is_valid_access() when "migration"
> > > is accessed.  When "migration" is accessed, the bpf prog can handle
> > > migration (and the original not-migration) case.
> > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.
> > I think there will be cases that bpf prog wants to do both
> > without accessing any field from sk_reuseport_md.
> > 
> > Lets go back to the discussion on using a similar
> > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().
> > I am not aware there is loader setting a random number
> > in expected_attach_type, so the chance of breaking
> > is very low.  There was a similar discussion earlier [0].
> > 
> > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/
> 
> Thank you for the idea and reference.
> 
> I will remove the change in bpf_prog_load_check_attach() and set the
> default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()
> for backward compatibility if expected_attach_type is 0.
check_attach_type() can be kept.  You can refer to
commit aac3fc320d94 for a similar situation.
Iwashima, Kuniyuki Dec. 6, 2020, 4:32 a.m. UTC | #6
I'm sending this mail just for logging because I failed to send mails only
to LKML, netdev, and bpf yesterday.


From:   Martin KaFai Lau <kafai@fb.com>
Date:   Thu, 3 Dec 2020 21:56:53 -0800
> On Thu, Dec 03, 2020 at 11:16:08PM +0900, Kuniyuki Iwashima wrote:
> > From:   Martin KaFai Lau <kafai@fb.com>
> > Date:   Wed, 2 Dec 2020 20:24:02 -0800
> > > On Wed, Dec 02, 2020 at 11:19:02AM -0800, Martin KaFai Lau wrote:
> > > > On Tue, Dec 01, 2020 at 06:04:50PM -0800, Andrii Nakryiko wrote:
> > > > > On Tue, Dec 1, 2020 at 6:49 AM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > > > > >
> > > > > > This commit adds new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT to
> > > > > > check if the attached eBPF program is capable of migrating sockets.
> > > > > >
> > > > > > When the eBPF program is attached, the kernel runs it for socket migration
> > > > > > only if the expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.
> > > > > > The kernel will change the behaviour depending on the returned value:
> > > > > >
> > > > > >   - SK_PASS with selected_sk, select it as a new listener
> > > > > >   - SK_PASS with selected_sk NULL, fall back to the random selection
> > > > > >   - SK_DROP, cancel the migration
> > > > > >
> > > > > > Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
> > > > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> > > > > > ---
> > > > > >  include/uapi/linux/bpf.h       | 2 ++
> > > > > >  kernel/bpf/syscall.c           | 8 ++++++++
> > > > > >  tools/include/uapi/linux/bpf.h | 2 ++
> > > > > >  3 files changed, 12 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 85278deff439..cfc207ae7782 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -241,6 +241,8 @@ enum bpf_attach_type {
> > > > > >         BPF_XDP_CPUMAP,
> > > > > >         BPF_SK_LOOKUP,
> > > > > >         BPF_XDP,
> > > > > > +       BPF_SK_REUSEPORT_SELECT,
> > > > > > +       BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> > > > > >         __MAX_BPF_ATTACH_TYPE
> > > > > >  };
> > > > > >
> > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > index f3fe9f53f93c..a0796a8de5ea 100644
> > > > > > --- a/kernel/bpf/syscall.c
> > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > @@ -2036,6 +2036,14 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> > > > > >                 if (expected_attach_type == BPF_SK_LOOKUP)
> > > > > >                         return 0;
> > > > > >                 return -EINVAL;
> > > > > > +       case BPF_PROG_TYPE_SK_REUSEPORT:
> > > > > > +               switch (expected_attach_type) {
> > > > > > +               case BPF_SK_REUSEPORT_SELECT:
> > > > > > +               case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
> > > > > > +                       return 0;
> > > > > > +               default:
> > > > > > +                       return -EINVAL;
> > > > > > +               }
> > > > > 
> > > > > this is a kernel regression, previously expected_attach_type wasn't
> > > > > enforced, so user-space could have provided any number without an
> > > > > error.
> > > > I also think this change alone will break things like when the usual
> > > > attr->expected_attach_type == 0 case.  At least changes is needed in
> > > > bpf_prog_load_fixup_attach_type() which is also handling a
> > > > similar situation for BPF_PROG_TYPE_CGROUP_SOCK.
> > > > 
> > > > I now think there is no need to expose new bpf_attach_type to the UAPI.
> > > > Since the prog->expected_attach_type is not used, it can be cleared at load time
> > > > and then only set to BPF_SK_REUSEPORT_SELECT_OR_MIGRATE (probably defined
> > > > internally at filter.[c|h]) in the is_valid_access() when "migration"
> > > > is accessed.  When "migration" is accessed, the bpf prog can handle
> > > > migration (and the original not-migration) case.
> > > Scrap this internal only BPF_SK_REUSEPORT_SELECT_OR_MIGRATE idea.
> > > I think there will be cases that bpf prog wants to do both
> > > without accessing any field from sk_reuseport_md.
> > > 
> > > Lets go back to the discussion on using a similar
> > > idea as BPF_PROG_TYPE_CGROUP_SOCK in bpf_prog_load_fixup_attach_type().
> > > I am not aware there is loader setting a random number
> > > in expected_attach_type, so the chance of breaking
> > > is very low.  There was a similar discussion earlier [0].
> > > 
> > > [0]: https://lore.kernel.org/netdev/20200126045443.f47dzxdglazzchfm@ast-mbp/
> > 
> > Thank you for the idea and reference.
> > 
> > I will remove the change in bpf_prog_load_check_attach() and set the
> > default value (BPF_SK_REUSEPORT_SELECT) in bpf_prog_load_fixup_attach_type()
> > for backward compatibility if expected_attach_type is 0.
> check_attach_type() can be kept.  You can refer to
> commit aac3fc320d94 for a similar situation.

I confirmed bpf_prog_load_fixup_attach_type() is called just before
bpf_prog_load_check_attach(), so I will add the fixup code to this patch.
Thank you.
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 85278deff439..cfc207ae7782 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -241,6 +241,8 @@  enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_REUSEPORT_SELECT,
+	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f3fe9f53f93c..a0796a8de5ea 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2036,6 +2036,14 @@  bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		if (expected_attach_type == BPF_SK_LOOKUP)
 			return 0;
 		return -EINVAL;
+	case BPF_PROG_TYPE_SK_REUSEPORT:
+		switch (expected_attach_type) {
+		case BPF_SK_REUSEPORT_SELECT:
+		case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:
+			return 0;
+		default:
+			return -EINVAL;
+		}
 	case BPF_PROG_TYPE_EXT:
 		if (expected_attach_type)
 			return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 85278deff439..cfc207ae7782 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -241,6 +241,8 @@  enum bpf_attach_type {
 	BPF_XDP_CPUMAP,
 	BPF_SK_LOOKUP,
 	BPF_XDP,
+	BPF_SK_REUSEPORT_SELECT,
+	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	__MAX_BPF_ATTACH_TYPE
 };