diff mbox series

[RFC,v2] Documentation/bpf: Add a description of "stable kfuncs"

Message ID 20230117212731.442859-1-toke@redhat.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series [RFC,v2] Documentation/bpf: Add a description of "stable kfuncs" | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Toke Høiland-Jørgensen Jan. 17, 2023, 9:27 p.m. UTC
Following up on the discussion at the BPF office hours, this patch adds a
description of the (new) concept of "stable kfuncs", which are kfuncs that
offer a "more stable" interface than what we have now, but is still not
part of UAPI.

This is mostly meant as a straw man proposal to focus discussions around
stability guarantees. From the discussion, it seemed clear that there were
at least some people (myself included) who felt that there needs to be some
way to export functionality that we consider "stable" (in the sense of
"applications can rely on its continuing existence").

One option is to keep BPF helpers as the stable interface and implement
some technical solution for moving functionality from kfuncs to helpers
once it has stood the test of time and we're comfortable committing to it
as a stable API. Another is to freeze the helper definitions, and instead
use kfuncs for this purpose as well, by marking a subset of them as
"stable" in some way. Or we can do both and have multiple levels of
"stable", I suppose.

This patch is an attempt to describe what the "stable kfuncs" idea might
look like, as well as to formulate some criteria for what we mean by
"stable", and describe an explicit deprecation procedure. Feel free to
critique any part of this (including rejecting the notion entirely).

Some people mentioned (in the office hours) that should we decide to go in
this direction, there's some work that needs to be done in libbpf (and
probably the kernel too?) to bring the kfunc developer experience up to par
with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
them discoverable), and having CO-RE support for using them, etc. I kinda
consider that orthogonal to what's described here, but I do think we should
fix those issues before implementing the procedures described here.

v2:
- Incorporate Daniel's changes

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
 1 file changed, 81 insertions(+), 6 deletions(-)

Comments

Alexei Starovoitov Jan. 17, 2023, 11:19 p.m. UTC | #1
On Tue, Jan 17, 2023 at 2:20 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Jan 17, 2023 at 2:04 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Stanislav Fomichev <sdf@google.com> writes:
> >
> > > On Tue, Jan 17, 2023 at 1:27 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >>
> > >> Following up on the discussion at the BPF office hours, this patch adds a
> > >> description of the (new) concept of "stable kfuncs", which are kfuncs that
> > >> offer a "more stable" interface than what we have now, but is still not
> > >> part of UAPI.
> > >>
> > >> This is mostly meant as a straw man proposal to focus discussions around
> > >> stability guarantees. From the discussion, it seemed clear that there were
> > >> at least some people (myself included) who felt that there needs to be some
> > >> way to export functionality that we consider "stable" (in the sense of
> > >> "applications can rely on its continuing existence").
> > >>
> > >> One option is to keep BPF helpers as the stable interface and implement
> > >> some technical solution for moving functionality from kfuncs to helpers
> > >> once it has stood the test of time and we're comfortable committing to it
> > >> as a stable API. Another is to freeze the helper definitions, and instead
> > >> use kfuncs for this purpose as well, by marking a subset of them as
> > >> "stable" in some way. Or we can do both and have multiple levels of
> > >> "stable", I suppose.
> > >>
> > >> This patch is an attempt to describe what the "stable kfuncs" idea might
> > >> look like, as well as to formulate some criteria for what we mean by
> > >> "stable", and describe an explicit deprecation procedure. Feel free to
> > >> critique any part of this (including rejecting the notion entirely).
> > >>
> > >> Some people mentioned (in the office hours) that should we decide to go in
> > >> this direction, there's some work that needs to be done in libbpf (and
> > >> probably the kernel too?) to bring the kfunc developer experience up to par
> > >> with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
> > >> them discoverable), and having CO-RE support for using them, etc. I kinda
> > >> consider that orthogonal to what's described here, but I do think we should
> > >> fix those issues before implementing the procedures described here.
> > >>
> > >> v2:
> > >> - Incorporate Daniel's changes
> > >>
> > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > >> ---
> > >>  Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
> > >>  1 file changed, 81 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > >> index 9fd7fb539f85..dd40a4ee35f2 100644
> > >> --- a/Documentation/bpf/kfuncs.rst
> > >> +++ b/Documentation/bpf/kfuncs.rst
> > >> @@ -7,9 +7,9 @@ BPF Kernel Functions (kfuncs)
> > >>
> > >>  BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> > >>  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> > >> -kfuncs do not have a stable interface and can change from one kernel release to
> > >> -another. Hence, BPF programs need to be updated in response to changes in the
> > >> -kernel.
> > >> +kfuncs by default do not have a stable interface and can change from one kernel
> > >> +release to another. Hence, BPF programs may need to be updated in response to
> > >> +changes in the kernel. See :ref:`BPF_kfunc_stability`.
> > >>
> > >>  2. Defining a kfunc
> > >>  ===================
> > >> @@ -223,14 +223,89 @@ type. An example is shown below::
> > >>          }
> > >>          late_initcall(init_subsystem);
> > >>
> > >> -3. Core kfuncs
> > >> +
> > >> +.. _BPF_kfunc_stability:
> > >> +
> > >> +3. API (in)stability of kfuncs
> > >> +==============================
> > >> +
> > >> +By default, kfuncs exported to BPF programs are considered a kernel-internal
> > >> +interface that can change between kernel versions. This means that BPF programs
> > >> +using kfuncs may need to adapt to changes between kernel versions. In the
> > >> +extreme case that could also include removal of a kfunc. In other words, kfuncs
> > >> +are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
> > >> +being similar to internal kernel API functions exported using the
> > >
> > > [..]
> > >
> > >> +``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
> > >> +initially start out as kfuncs.
> > >
> > > To clarify, as part of this proposal, are we making a decision here
> > > that we ban new helpers going forward?
> >
> > Good question! That is one of the things I'm hoping we can clear up by
> > this discussing. I don't have a strong opinion on the matter myself, as
> > long as there is *some* way to mark a subset of helpers/kfuncs as
> > "stable"...
>
> Might be worth it to capitalize in this case to indicate that it's a
> MUST from the RFC world? (or go with SHOULD otherwise).
> I'm fine either way. The only thing that stops me from fully embracing
> MUST is the kfunc requirement on the explicit jit support; I'm not
> sure why it exists and at this point I'm too afraid to ask. But having
> MUST here might give us motivation to address the shortcomings...

Did you do:
git grep bpf_jit_supports_kfunc_call
and didn't find your favorite architecture there and
didn't find it in the upcoming patches for riscv and arm32?
If you care about kfuncs on arm32 please help reviewing posted patches.
Stanislav Fomichev Jan. 18, 2023, 2 a.m. UTC | #2
On Tue, Jan 17, 2023 at 3:19 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 2:20 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 2:04 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Stanislav Fomichev <sdf@google.com> writes:
> > >
> > > > On Tue, Jan 17, 2023 at 1:27 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > >>
> > > >> Following up on the discussion at the BPF office hours, this patch adds a
> > > >> description of the (new) concept of "stable kfuncs", which are kfuncs that
> > > >> offer a "more stable" interface than what we have now, but is still not
> > > >> part of UAPI.
> > > >>
> > > >> This is mostly meant as a straw man proposal to focus discussions around
> > > >> stability guarantees. From the discussion, it seemed clear that there were
> > > >> at least some people (myself included) who felt that there needs to be some
> > > >> way to export functionality that we consider "stable" (in the sense of
> > > >> "applications can rely on its continuing existence").
> > > >>
> > > >> One option is to keep BPF helpers as the stable interface and implement
> > > >> some technical solution for moving functionality from kfuncs to helpers
> > > >> once it has stood the test of time and we're comfortable committing to it
> > > >> as a stable API. Another is to freeze the helper definitions, and instead
> > > >> use kfuncs for this purpose as well, by marking a subset of them as
> > > >> "stable" in some way. Or we can do both and have multiple levels of
> > > >> "stable", I suppose.
> > > >>
> > > >> This patch is an attempt to describe what the "stable kfuncs" idea might
> > > >> look like, as well as to formulate some criteria for what we mean by
> > > >> "stable", and describe an explicit deprecation procedure. Feel free to
> > > >> critique any part of this (including rejecting the notion entirely).
> > > >>
> > > >> Some people mentioned (in the office hours) that should we decide to go in
> > > >> this direction, there's some work that needs to be done in libbpf (and
> > > >> probably the kernel too?) to bring the kfunc developer experience up to par
> > > >> with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
> > > >> them discoverable), and having CO-RE support for using them, etc. I kinda
> > > >> consider that orthogonal to what's described here, but I do think we should
> > > >> fix those issues before implementing the procedures described here.
> > > >>
> > > >> v2:
> > > >> - Incorporate Daniel's changes
> > > >>
> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > >> ---
> > > >>  Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
> > > >>  1 file changed, 81 insertions(+), 6 deletions(-)
> > > >>
> > > >> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > > >> index 9fd7fb539f85..dd40a4ee35f2 100644
> > > >> --- a/Documentation/bpf/kfuncs.rst
> > > >> +++ b/Documentation/bpf/kfuncs.rst
> > > >> @@ -7,9 +7,9 @@ BPF Kernel Functions (kfuncs)
> > > >>
> > > >>  BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> > > >>  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> > > >> -kfuncs do not have a stable interface and can change from one kernel release to
> > > >> -another. Hence, BPF programs need to be updated in response to changes in the
> > > >> -kernel.
> > > >> +kfuncs by default do not have a stable interface and can change from one kernel
> > > >> +release to another. Hence, BPF programs may need to be updated in response to
> > > >> +changes in the kernel. See :ref:`BPF_kfunc_stability`.
> > > >>
> > > >>  2. Defining a kfunc
> > > >>  ===================
> > > >> @@ -223,14 +223,89 @@ type. An example is shown below::
> > > >>          }
> > > >>          late_initcall(init_subsystem);
> > > >>
> > > >> -3. Core kfuncs
> > > >> +
> > > >> +.. _BPF_kfunc_stability:
> > > >> +
> > > >> +3. API (in)stability of kfuncs
> > > >> +==============================
> > > >> +
> > > >> +By default, kfuncs exported to BPF programs are considered a kernel-internal
> > > >> +interface that can change between kernel versions. This means that BPF programs
> > > >> +using kfuncs may need to adapt to changes between kernel versions. In the
> > > >> +extreme case that could also include removal of a kfunc. In other words, kfuncs
> > > >> +are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
> > > >> +being similar to internal kernel API functions exported using the
> > > >
> > > > [..]
> > > >
> > > >> +``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
> > > >> +initially start out as kfuncs.
> > > >
> > > > To clarify, as part of this proposal, are we making a decision here
> > > > that we ban new helpers going forward?
> > >
> > > Good question! That is one of the things I'm hoping we can clear up by
> > > this discussing. I don't have a strong opinion on the matter myself, as
> > > long as there is *some* way to mark a subset of helpers/kfuncs as
> > > "stable"...
> >
> > Might be worth it to capitalize in this case to indicate that it's a
> > MUST from the RFC world? (or go with SHOULD otherwise).
> > I'm fine either way. The only thing that stops me from fully embracing
> > MUST is the kfunc requirement on the explicit jit support; I'm not
> > sure why it exists and at this point I'm too afraid to ask. But having
> > MUST here might give us motivation to address the shortcomings...
>
> Did you do:
> git grep bpf_jit_supports_kfunc_call
> and didn't find your favorite architecture there and
> didn't find it in the upcoming patches for riscv and arm32?
> If you care about kfuncs on arm32 please help reviewing posted patches.

Exactly why I'm going to support whatever decision is being made here.
Just trying to clarify what that decision is.
Daniel Borkmann Jan. 18, 2023, 10:48 a.m. UTC | #3
On 1/18/23 3:00 AM, Stanislav Fomichev wrote:
> On Tue, Jan 17, 2023 at 3:19 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Tue, Jan 17, 2023 at 2:20 PM Stanislav Fomichev <sdf@google.com> wrote:
>>> On Tue, Jan 17, 2023 at 2:04 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> Stanislav Fomichev <sdf@google.com> writes:
>>>>> On Tue, Jan 17, 2023 at 1:27 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>
>>>>>> Following up on the discussion at the BPF office hours, this patch adds a
>>>>>> description of the (new) concept of "stable kfuncs", which are kfuncs that
>>>>>> offer a "more stable" interface than what we have now, but is still not
>>>>>> part of UAPI.
>>>>>>
>>>>>> This is mostly meant as a straw man proposal to focus discussions around
>>>>>> stability guarantees. From the discussion, it seemed clear that there were
>>>>>> at least some people (myself included) who felt that there needs to be some
>>>>>> way to export functionality that we consider "stable" (in the sense of
>>>>>> "applications can rely on its continuing existence").
>>>>>>
>>>>>> One option is to keep BPF helpers as the stable interface and implement
>>>>>> some technical solution for moving functionality from kfuncs to helpers
>>>>>> once it has stood the test of time and we're comfortable committing to it
>>>>>> as a stable API. Another is to freeze the helper definitions, and instead
>>>>>> use kfuncs for this purpose as well, by marking a subset of them as
>>>>>> "stable" in some way. Or we can do both and have multiple levels of
>>>>>> "stable", I suppose.
>>>>>>
>>>>>> This patch is an attempt to describe what the "stable kfuncs" idea might
>>>>>> look like, as well as to formulate some criteria for what we mean by
>>>>>> "stable", and describe an explicit deprecation procedure. Feel free to
>>>>>> critique any part of this (including rejecting the notion entirely).
>>>>>>
>>>>>> Some people mentioned (in the office hours) that should we decide to go in
>>>>>> this direction, there's some work that needs to be done in libbpf (and
>>>>>> probably the kernel too?) to bring the kfunc developer experience up to par
>>>>>> with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
>>>>>> them discoverable), and having CO-RE support for using them, etc. I kinda
>>>>>> consider that orthogonal to what's described here, but I do think we should
>>>>>> fix those issues before implementing the procedures described here.
>>>>>>
>>>>>> v2:
>>>>>> - Incorporate Daniel's changes
>>>>>>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>> ---
>>>>>>   Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
>>>>>>   1 file changed, 81 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
>>>>>> index 9fd7fb539f85..dd40a4ee35f2 100644
>>>>>> --- a/Documentation/bpf/kfuncs.rst
>>>>>> +++ b/Documentation/bpf/kfuncs.rst
>>>>>> @@ -7,9 +7,9 @@ BPF Kernel Functions (kfuncs)
>>>>>>
>>>>>>   BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
>>>>>>   kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
>>>>>> -kfuncs do not have a stable interface and can change from one kernel release to
>>>>>> -another. Hence, BPF programs need to be updated in response to changes in the
>>>>>> -kernel.
>>>>>> +kfuncs by default do not have a stable interface and can change from one kernel
>>>>>> +release to another. Hence, BPF programs may need to be updated in response to
>>>>>> +changes in the kernel. See :ref:`BPF_kfunc_stability`.
>>>>>>
>>>>>>   2. Defining a kfunc
>>>>>>   ===================
>>>>>> @@ -223,14 +223,89 @@ type. An example is shown below::
>>>>>>           }
>>>>>>           late_initcall(init_subsystem);
>>>>>>
>>>>>> -3. Core kfuncs
>>>>>> +
>>>>>> +.. _BPF_kfunc_stability:

small nit: please also link from Documentation/bpf/bpf_design_QA.rst, so these sections
here are easier to find.

>>>>>> +3. API (in)stability of kfuncs
>>>>>> +==============================
>>>>>> +
>>>>>> +By default, kfuncs exported to BPF programs are considered a kernel-internal
>>>>>> +interface that can change between kernel versions. This means that BPF programs
>>>>>> +using kfuncs may need to adapt to changes between kernel versions. In the
>>>>>> +extreme case that could also include removal of a kfunc. In other words, kfuncs
>>>>>> +are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
>>>>>> +being similar to internal kernel API functions exported using the
>>>>>
>>>>> [..]
>>>>>
>>>>>> +``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
>>>>>> +initially start out as kfuncs.
>>>>>
>>>>> To clarify, as part of this proposal, are we making a decision here
>>>>> that we ban new helpers going forward?
>>>>
>>>> Good question! That is one of the things I'm hoping we can clear up by
>>>> this discussing. I don't have a strong opinion on the matter myself, as
>>>> long as there is *some* way to mark a subset of helpers/kfuncs as
>>>> "stable"...
>>>
>>> Might be worth it to capitalize in this case to indicate that it's a
>>> MUST from the RFC world? (or go with SHOULD otherwise).
>>> I'm fine either way. The only thing that stops me from fully embracing
>>> MUST is the kfunc requirement on the explicit jit support; I'm not
>>> sure why it exists and at this point I'm too afraid to ask. But having
>>> MUST here might give us motivation to address the shortcomings...
>>
>> Did you do:
>> git grep bpf_jit_supports_kfunc_call
>> and didn't find your favorite architecture there and
>> didn't find it in the upcoming patches for riscv and arm32?
>> If you care about kfuncs on arm32 please help reviewing posted patches.
> 
> Exactly why I'm going to support whatever decision is being made here.
> Just trying to clarify what that decision is.

My $0.02 is that I don't think we need to make a hard-cut ban as part of this.
The 'All new BPF kernel helper-like functionality must initially start out as
kfuncs.' is pretty clear where things would need to start out with, and we could
leave the option on the table if really needed to go BPF helper route when
promoting kfunc to stable at the same time. I had that in the text suggestion
earlier, it's more corner case and maybe we'll never need it but we also don't
drive ourselves into a corner where we close the door on it. Lets let the infra
around kfuncs evolve further first.

Thanks,
Daniel
David Vernet Jan. 18, 2023, 3:53 p.m. UTC | #4
On Tue, Jan 17, 2023 at 10:27:31PM +0100, Toke Høiland-Jørgensen wrote:
> Following up on the discussion at the BPF office hours, this patch adds a
> description of the (new) concept of "stable kfuncs", which are kfuncs that
> offer a "more stable" interface than what we have now, but is still not
> part of UAPI.
> 
> This is mostly meant as a straw man proposal to focus discussions around
> stability guarantees. From the discussion, it seemed clear that there were
> at least some people (myself included) who felt that there needs to be some
> way to export functionality that we consider "stable" (in the sense of
> "applications can rely on its continuing existence").
> 
> One option is to keep BPF helpers as the stable interface and implement
> some technical solution for moving functionality from kfuncs to helpers
> once it has stood the test of time and we're comfortable committing to it
> as a stable API. Another is to freeze the helper definitions, and instead
> use kfuncs for this purpose as well, by marking a subset of them as
> "stable" in some way. Or we can do both and have multiple levels of
> "stable", I suppose.
> 
> This patch is an attempt to describe what the "stable kfuncs" idea might
> look like, as well as to formulate some criteria for what we mean by
> "stable", and describe an explicit deprecation procedure. Feel free to
> critique any part of this (including rejecting the notion entirely).
> 
> Some people mentioned (in the office hours) that should we decide to go in
> this direction, there's some work that needs to be done in libbpf (and
> probably the kernel too?) to bring the kfunc developer experience up to par
> with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
> them discoverable), and having CO-RE support for using them, etc. I kinda
> consider that orthogonal to what's described here, but I do think we should
> fix those issues before implementing the procedures described here.
> 
> v2:
> - Incorporate Daniel's changes
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Hi Toke,

Thanks a lot for writing this up. Left some thoughts and comments below.

> ---
>  Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
>  1 file changed, 81 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 9fd7fb539f85..dd40a4ee35f2 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -7,9 +7,9 @@ BPF Kernel Functions (kfuncs)
>  
>  BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
>  kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> -kfuncs do not have a stable interface and can change from one kernel release to
> -another. Hence, BPF programs need to be updated in response to changes in the
> -kernel.
> +kfuncs by default do not have a stable interface and can change from one kernel
> +release to another. Hence, BPF programs may need to be updated in response to
> +changes in the kernel. See :ref:`BPF_kfunc_stability`.
>  
>  2. Defining a kfunc
>  ===================
> @@ -223,14 +223,89 @@ type. An example is shown below::
>          }
>          late_initcall(init_subsystem);
>  
> -3. Core kfuncs
> +
> +.. _BPF_kfunc_stability:
> +
> +3. API (in)stability of kfuncs
> +==============================
> +
> +By default, kfuncs exported to BPF programs are considered a kernel-internal
> +interface that can change between kernel versions. This means that BPF programs
> +using kfuncs may need to adapt to changes between kernel versions. In the
> +extreme case that could also include removal of a kfunc. In other words, kfuncs
> +are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
> +being similar to internal kernel API functions exported using the
> +``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
> +initially start out as kfuncs.
> +
> +3.1 Promotion to "stable" kfuncs
> +--------------------------------
> +
> +While kfuncs are by default considered unstable as described above, some kfuncs
> +may warrant a stronger stability guarantee and can be marked as *stable*. The
> +decision to move a kfunc to *stable* is taken on a case-by-case basis and must
> +clear a high bar, taking into account the functions' usefulness under

s/functions'/function's

> +longer-term production deployment without any unforeseen API issues or
> +limitations. In general, it is not expected that every kfunc will turn into a
> +stable one - think of it as an exception rather than the norm.
> +
> +Those kfuncs which have been promoted to stable are then marked using the
> +``KF_STABLE`` tag. The process for requesting a kfunc be marked as stable

I'm not 100% convinced that KF_STABLE and KF_DEPRECATED flags are the
correct API to go with here for signaling stability and deprecation, for
a couple of reasons:

1. We're already requiring that we document in the kernel docs when a
   stable, deprecated kfunc is slated to be removed, so I think an
   argument could be made to make this a documentation convention rather
   than a programmatic + documentation convention. A user that sees that
   a kfunc is deprecated will end up having to go to the documentation
   anyways to see when it will be removed.
2. Long term, I don't think KF_STABLE or KF_DEPRECATED will be
   expressive enough for us. For example, we could do something similar
   to what Maxim suggested in [0] wherein we have API version numbers
   for kfuncs which are interpreted by libbpf, allowing us to warn for
   deprecated kfuncs, enble symbol versioning by linking against
   different versions of the kfunc in the kernel, etc.

[0]: https://lore.kernel.org/all/Y78+iIqqpyufjWOv@mail.gmail.com/

So my take would be, let's just stick with a documentation convention
for now. Maybe we could also add a "deprecated" BTF tag that would at
least allow the verifier to warn the user on clang builds (until gcc
supports it) when they link against a deprecated kfunc. What do you
think?

> +consists of submitting a patch to the bpf@vger.kernel.org mailing list adding
> +the ``KF_STABLE`` tag to that kfunc's definition. The patch description must
> +include the rationale for why the kfunc should be promoted to stable, including
> +references to existing production uses, etc. The patch will be considered the
> +same was as any other patch, and ultimately the decision on whether a kfunc
> +should be promoted to stable is taken by the BPF maintainers.
> +
> +Stable kfuncs provide the following stability guarantees:
> +
> +1. Stable kfuncs will not change their function signature or functionality in a
> +   way that may cause incompatibilities for BPF programs calling the function.
> +
> +2. The BPF community will make every reasonable effort to keep stable kfuncs
> +   around as long as they continue to be useful to real-world BPF applications.

I think we should try to avoid setting subjective expectations such as
"every reasonable effort". I think it's useful to paint a picture of the
typical workflow / culture for stability, but in terms of _guarantees_,
I think we should only be speaking in terms of deprecation timelines.
For example, points (2) and (3) could possibly be rephrased as:

2. The BPF community generally makes an effort to keep stable kfuncs
around as long as they continue to be useful to real-world applications.
Stable kfuncs will maintain their signature and behavior for as long as
they're defined, and up through and possibly exceeding a deprecation
window that is outlined in more detail below.

> +
> +3. Should a stable kfunc turn out to be no longer useful, the BPF community may
> +   decide to eventually remove it. In this case, before being removed that kfunc
> +   will go through a deprecation procedure as outlined below.

Echoing the point above, I also think we should be a bit cautious in
providing specific reasons as to why a stable kfunc would be removed.
There may be cases where something is useful, but it turns out that the
complexity is causing bugs, etc, and the value just isn't worth the
cost. Hopefully the kfunc wouldn't get to the stable point by the time
we realized there's a better alternative, but it's possible (perhaps
bpf_loop() is a good example?)

Also, "useful" is subjective, so it may surprise / confuse users if they
we deprecate something that's still useful to them, such as
bpf_get_current_task(). Note that we may choose to keep a kfunc around
if a user tells us they need it, but the point is that I don't think we
want to give readers the idea that a kfunc being useful automatically
means it won't be removed.

> +
> +3.2 Deprecation of kfuncs
> +-------------------------
> +
> +As described above, the community will make every reasonable effort to keep
> +kfuncs available through future kernel versions once they are marked as stable.
> +However, it may happen case that BPF development moves in an unforeseen

'it may happen in some cases'?

> +direction so that even a stable kfunc ceases to be useful for program
> +development.
> +
> +In this case, stable kfuncs can be marked as *deprecated* using the
> +``KF_DEPRECATED`` tag. Such a deprecation request cannot be arbitrary and must
> +explain why a given stable kfunc should be deprecated. Once a kfunc is marked as
> +deprecated, the following procedure will be followed for removal:

See above -- not sure I'm convinced yet that KF_STABLE | KF_DEPRECATED
are the right APIs for the job. As another observation to that point, I
don't think it ever make sense to specify a kfunc as KF_DEPRECATED
without also KF_STABLE, which IMO is a sign that we need something
different than KF flags to express what we're going for here.

> +
> +1. A kfunc marked as deprecated will be kept in the kernel for a conservatively
> +   chosen period of time after it was first marked as deprecated (usually
> +   corresponding to a span of multiple years).

I have two suggestions for this paragraph:

1. I think we should consider specifying a concrete minimum time period
   here. Unfortunately it's difficult to choose one given that we
   haven't yet gone through a deprecation story and so we have little
   experience to draw from in choosing it. On the other hand, I think it
   would be good for us to have a more fully-defined policy to point to
   as that seems to have been a big point of confusion in prior
   discussions, and we can always update the time period later.

2. We should probably specify that the clock for a kfunc being
   considered deprecated starts at the release date of the first kernel
   version in which it was marked as deprecated. 'First marked as
   deprecated' is a bit too open for interpretation IMO, as it could
   correspond to when the commit was first merged to Linus' tree, etc.

Assuming others agree that we're ready to specify a minimum deprecation
time, I think 1 year makes sense as a starting point, though I expect it
will increase as we stabilize and finish some remaining kfunc items like
symbol versioning. 1 year isn't super aggressive in that it gives us the
flexibility to change things if we decide that we want a different
deprecation story (e.g. if and when we support symboled versions for
kfuncs), but it's still a reasonably long period of time in that we're
agreeing to a cost for the sake of stability. Wdyt?

> +
> +2. Deprecated functions will be documented in the kernel docs along with their
> +   remaining lifespan and including a recommendation for new functionality that
> +   can replace the usage of the deprecated function (or an explanation for why
> +   no such replacement exists).
> +
> +3. After the deprecation period, the kfunc will be removed and the function name
> +   will be marked as invalid inside the kernel (to ensure that no new kfunc is
> +   accidentally introduced with the same name in the future). After this
> +   happens, BPF programs calling the kfunc will be rejected by the verifier.

Do we need to mark the function name as invalid? Wouldn't removing it be
sufficient for the verifier to reject the program? If we wanted to have
some better UX for this case, ideally we could rely on libbpf to help
us.

In general, I think what you have in this paragraph is good and makes
sense given the current state of kfuncs not having symbol versioning.
That being said, once we do have something like symbol versioning
support, etc, I don't think we'd need to ban the name from ever being
used again. My thinking here is that there may be times where we need to
e.g. add a or remove a field to or from a kfunc, change some semantics,
etc, and adding an entirely new name like kfunc_name_v2() would be kind
of unfortunate. It's certainly warranted in the current state of things,
but if and when we have symbol version support I think it would be nice
to relax this restriction.

> +
> +4. Core kfuncs
>  ==============
>  
>  The BPF subsystem provides a number of "core" kfuncs that are potentially
>  applicable to a wide variety of different possible use cases and programs.
>  Those kfuncs are documented here.
>  
> -3.1 struct task_struct * kfuncs
> +4.1 struct task_struct * kfuncs
>  -------------------------------
>  
>  There are a number of kfuncs that allow ``struct task_struct *`` objects to be
> @@ -306,7 +381,7 @@ Here is an example of it being used:
>  		return 0;
>  	}
>  
> -3.2 struct cgroup * kfuncs
> +4.2 struct cgroup * kfuncs
>  --------------------------
>  
>  ``struct cgroup *`` objects also have acquire and release functions:
> -- 
> 2.39.0
>
David Vernet Jan. 18, 2023, 4:53 p.m. UTC | #5
On Wed, Jan 18, 2023 at 11:48:59AM +0100, Daniel Borkmann wrote:
> On 1/18/23 3:00 AM, Stanislav Fomichev wrote:
> > On Tue, Jan 17, 2023 at 3:19 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > On Tue, Jan 17, 2023 at 2:20 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > On Tue, Jan 17, 2023 at 2:04 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > > > Stanislav Fomichev <sdf@google.com> writes:
> > > > > > On Tue, Jan 17, 2023 at 1:27 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > > > > > 
> > > > > > > Following up on the discussion at the BPF office hours, this patch adds a
> > > > > > > description of the (new) concept of "stable kfuncs", which are kfuncs that
> > > > > > > offer a "more stable" interface than what we have now, but is still not
> > > > > > > part of UAPI.
> > > > > > > 
> > > > > > > This is mostly meant as a straw man proposal to focus discussions around
> > > > > > > stability guarantees. From the discussion, it seemed clear that there were
> > > > > > > at least some people (myself included) who felt that there needs to be some
> > > > > > > way to export functionality that we consider "stable" (in the sense of
> > > > > > > "applications can rely on its continuing existence").
> > > > > > > 
> > > > > > > One option is to keep BPF helpers as the stable interface and implement
> > > > > > > some technical solution for moving functionality from kfuncs to helpers
> > > > > > > once it has stood the test of time and we're comfortable committing to it
> > > > > > > as a stable API. Another is to freeze the helper definitions, and instead
> > > > > > > use kfuncs for this purpose as well, by marking a subset of them as
> > > > > > > "stable" in some way. Or we can do both and have multiple levels of
> > > > > > > "stable", I suppose.
> > > > > > > 
> > > > > > > This patch is an attempt to describe what the "stable kfuncs" idea might
> > > > > > > look like, as well as to formulate some criteria for what we mean by
> > > > > > > "stable", and describe an explicit deprecation procedure. Feel free to
> > > > > > > critique any part of this (including rejecting the notion entirely).
> > > > > > > 
> > > > > > > Some people mentioned (in the office hours) that should we decide to go in
> > > > > > > this direction, there's some work that needs to be done in libbpf (and
> > > > > > > probably the kernel too?) to bring the kfunc developer experience up to par
> > > > > > > with helpers. Things like exporting kfunc definitions to vmlinux.h (to make
> > > > > > > them discoverable), and having CO-RE support for using them, etc. I kinda
> > > > > > > consider that orthogonal to what's described here, but I do think we should
> > > > > > > fix those issues before implementing the procedures described here.
> > > > > > > 
> > > > > > > v2:
> > > > > > > - Incorporate Daniel's changes
> > > > > > > 
> > > > > > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > > > > ---
> > > > > > >   Documentation/bpf/kfuncs.rst | 87 +++++++++++++++++++++++++++++++++---
> > > > > > >   1 file changed, 81 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > > > > > > index 9fd7fb539f85..dd40a4ee35f2 100644
> > > > > > > --- a/Documentation/bpf/kfuncs.rst
> > > > > > > +++ b/Documentation/bpf/kfuncs.rst
> > > > > > > @@ -7,9 +7,9 @@ BPF Kernel Functions (kfuncs)
> > > > > > > 
> > > > > > >   BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
> > > > > > >   kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
> > > > > > > -kfuncs do not have a stable interface and can change from one kernel release to
> > > > > > > -another. Hence, BPF programs need to be updated in response to changes in the
> > > > > > > -kernel.
> > > > > > > +kfuncs by default do not have a stable interface and can change from one kernel
> > > > > > > +release to another. Hence, BPF programs may need to be updated in response to
> > > > > > > +changes in the kernel. See :ref:`BPF_kfunc_stability`.
> > > > > > > 
> > > > > > >   2. Defining a kfunc
> > > > > > >   ===================
> > > > > > > @@ -223,14 +223,89 @@ type. An example is shown below::
> > > > > > >           }
> > > > > > >           late_initcall(init_subsystem);
> > > > > > > 
> > > > > > > -3. Core kfuncs
> > > > > > > +
> > > > > > > +.. _BPF_kfunc_stability:
> 
> small nit: please also link from Documentation/bpf/bpf_design_QA.rst, so these sections
> here are easier to find.
> 
> > > > > > > +3. API (in)stability of kfuncs
> > > > > > > +==============================
> > > > > > > +
> > > > > > > +By default, kfuncs exported to BPF programs are considered a kernel-internal
> > > > > > > +interface that can change between kernel versions. This means that BPF programs
> > > > > > > +using kfuncs may need to adapt to changes between kernel versions. In the
> > > > > > > +extreme case that could also include removal of a kfunc. In other words, kfuncs
> > > > > > > +are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
> > > > > > > +being similar to internal kernel API functions exported using the
> > > > > > 
> > > > > > [..]
> > > > > > 
> > > > > > > +``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
> > > > > > > +initially start out as kfuncs.
> > > > > > 
> > > > > > To clarify, as part of this proposal, are we making a decision here
> > > > > > that we ban new helpers going forward?
> > > > > 
> > > > > Good question! That is one of the things I'm hoping we can clear up by
> > > > > this discussing. I don't have a strong opinion on the matter myself, as
> > > > > long as there is *some* way to mark a subset of helpers/kfuncs as
> > > > > "stable"...
> > > > 
> > > > Might be worth it to capitalize in this case to indicate that it's a
> > > > MUST from the RFC world? (or go with SHOULD otherwise).
> > > > I'm fine either way. The only thing that stops me from fully embracing
> > > > MUST is the kfunc requirement on the explicit jit support; I'm not
> > > > sure why it exists and at this point I'm too afraid to ask. But having
> > > > MUST here might give us motivation to address the shortcomings...
> > > 
> > > Did you do:
> > > git grep bpf_jit_supports_kfunc_call
> > > and didn't find your favorite architecture there and
> > > didn't find it in the upcoming patches for riscv and arm32?
> > > If you care about kfuncs on arm32 please help reviewing posted patches.
> > 
> > Exactly why I'm going to support whatever decision is being made here.
> > Just trying to clarify what that decision is.
> 
> My $0.02 is that I don't think we need to make a hard-cut ban as part of this.
> The 'All new BPF kernel helper-like functionality must initially start out as
> kfuncs.' is pretty clear where things would need to start out with, and we could
> leave the option on the table if really needed to go BPF helper route when
> promoting kfunc to stable at the same time. I had that in the text suggestion
> earlier, it's more corner case and maybe we'll never need it but we also don't
> drive ourselves into a corner where we close the door on it. Lets let the infra
> around kfuncs evolve further first.

I think that's reasonable, though I also think it would be good for us
to be concrete about what we mean by "if really needed to go BPF helper
route". One of Andrii's main points (hopefully I'm not misrepresenting
anything) was that having things as kfuncs requires JIT support, which
means that architectures which don't yet have JIT support wouldn't be
able to reap the benefits of whatever functionality is added with
kfuncs. On the other hand, Alexei pointed out in [0] that riscv and
arm32 support is coming for JIT.

[0]: https://lore.kernel.org/all/CAADnVQKy1QzM+wg1BxfYA30QsTaM4M5RRCi+VHN6A7ah2BeZZw@mail.gmail.com/

I propose that we also specify that wanting the feature to be present on
non-JIT / non-BTF kernels is not a sufficient reason for making them a
helper. Not because there are no tradeoffs in doing so, but rather
because:

1. I think we just need to make a decision and be consistent here to
   avoid more lengthy debates.

2. I think that if something is really useful on an architecture, people
   will add JIT support for it. An argument could always be made that we
   should be able to rely only on the interpreter for new architectures
   that are added, etc. As more time passes, BPF sans JIT (i.e.
   interpreter BPF) will be less and less useful, and will diverge more
   and more from JIT-BPF. It's really inevitable anyways given the
   direction that things are going, and IMO we should just embrace that
   and focus on enabling JIT / modern BPF on useful architectures rather
   than adding things to helpers for the sake of those platforms.

Thanks,
David
Alexei Starovoitov Jan. 19, 2023, 4:32 a.m. UTC | #6
On Wed, Jan 18, 2023 at 11:48:59AM +0100, Daniel Borkmann wrote:
> 
> My $0.02 is that I don't think we need to make a hard-cut ban as part of this.

The hard-cut is easier to enforce otherwise every developer will be arguing that
their new feature is special and it requires a new discussion.
This thread has been going for too long. We need to finish it now and
don't come back to it again every now and then.

imo this is the summary of the thread:

bpf folks fall into two categories: kernel maintainers and bpf developers/users.
- developers add new bpf features. They obviously want to use them and want bpf users
to know that the feature they added is not going to disappear in the next kernel.
They want stability.
- maintainers want to make sure that the kernel development doesn't suffer because
developers keep adding new apis. They want freedom to innovate and change apis.
Maintainers also know that developers make mistakes and might leave the community.
The kernel is huge and core infra changes all the time.
bpf apis must never be a reason not to change something in the kernel.

Freedom to change and stability just don't overlap. 
These two camps can never agree on what is more important.
But we can make them co-exist.

The bpf developers adding new kfunc should assume that it's stable and proceed
to use it in bpf progs and production applications.
The bpf maintainers will keep this stability promise. They obviously will not
reap it out of the kernel on the whim, but they will nuke it if this kfunc
will be in the way of the kernel innovation.
The longer the kfunc is present the harder it will be for maintainers to justify
removing it. The developers have to stick around and demonstrate that their
kfunc is actually being used. The better developers do it the bigger the effort
maintainers will put into keeping the kfunc perfectly intact.

Some kfunc might be perfect on the first try and it will be stable from the
first kernel release it appeared in.
Other kfuncs might be questionable. Like what happened with conntrack kfunc.
It looked good first, but then the same developers who added it came back to change it.
The approach of 'assume stable, but fix it if you like' worked in this case.

Take bpf_obj_new kfunc. I think it's great and I hope the interface will stick.
But we have an option to change it.

Take Andrii's upcoming 'open coded iterators'. The concept and api look great.
I hope we will do it right the first time and
bpf progs will start to use it immediately.
In such case why would anyone think of changing it?
If api works well and progs are using we will keep it this way.

But imagine we decide to replace the verifier with something better.
It will give us much better flexibility, but sadly bounded loops and
iterators will be in the way.
What we most likely going to do in such case we'll keep two verifiers for
several years and deprecate the old one along with kfuncs that we couldn't keep.
That would be a scheme for deprecation of kfunc.
Maybe we will use KF_DEPRECATE mechansim in such case or something else.
I think we need to cross that bridge when we get there.

Introducing KF_STABLE and KF_DEPRECATED right now looks premature.
We can discuss it, but adding it to a doc and committing to it is too early.
We don't have any kfuncs to mark as KF_STABLE or as KF_DEPRECATED.
No one presented any data on usage of existing kfuncs.
So we're not going to change or remove any one of them.
bpf developers and users should assume that all kfuncs are stable and use them.
When somebody comes to argue that a particular kfunc needs to change
the developer who added that kfunc better to be around to argue that the kfunc is
perfect the way it is. If developer is gone the maintainers will make a call.
It's a self regulating system.
kfuncs will be stable if developers/users are around.
Yet the maintainers will have a freedom to change if absolutely necessary.

Back to deprecation...
I think KF_DEPRECATED is a good idea.
When kfunc will be auto emitted into vmlinux.h (or whatever other file)
or shipped in libbpf header we can emit
__attribute__((deprecated("reason", "replacement")));
to that header file (so it's seen during bpf prog build) and
start dmesg warn on them in the verifier.
Kernel splats do get noticed. The users would have to act quickly.

As far as KF_STABLE... I think it hurts the system in the long run.
The developer can argue at one point in time that kfunc has to be KF_STABLE.
The patch will be applied, but the developer is off the hook and can disappear.
The maintainers would have to argue on behalf of the developer
and keep maintaining it? The maintainers won't have a signal whether
kfunc is still useful after initial KF_STABLE patch.

I think it's more important to decide how we document kfuncs and
how equivalent of bpf_helper_defs.h can be done.

> The 'All new BPF kernel helper-like functionality must initially start out as
> kfuncs.' is pretty clear where things would need to start out with, and we could
> leave the option on the table if really needed to go BPF helper route when
> promoting kfunc to stable at the same time. I had that in the text suggestion
> earlier, it's more corner case and maybe we'll never need it but we also don't
> drive ourselves into a corner where we close the door on it. Lets let the infra
> around kfuncs evolve further first.

Going kfunc->helper for stability was discussed already. It probably got lost
in the noise. The summary was that it's not an option for the following reason:
kfuncs and helpers are done through different mechanisms on prog and kernel side.
The prog either sees = (void *)1 hack or normal call to extern func.
The generated code is different.
Say, we convert a kfunc to helper. Immediately the existing bpf prog that uses 
that kfunc will fail to load. That's the opposite of stability.
We're going to require the developer to demonstrate the real world use of kfunc
before promoting to stable, but with such 'promotion' we will break bpf progs.
Say, we keep kfunc and introduce a new helper that does exactly the same.
But it won't help bpf prog. The prog was using kfunc in production,
why would it do some kind of CO-RE thing to compile 'call foo' differently
depending whether 'foo' is kfunc or helper in the given kernel. And so on.
Andrii Nakryiko Jan. 24, 2023, 5:17 p.m. UTC | #7
On Wed, Jan 18, 2023 at 8:32 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 11:48:59AM +0100, Daniel Borkmann wrote:
> >
> > My $0.02 is that I don't think we need to make a hard-cut ban as part of this.
>
> The hard-cut is easier to enforce otherwise every developer will be arguing that
> their new feature is special and it requires a new discussion.
> This thread has been going for too long. We need to finish it now and
> don't come back to it again every now and then.

I wish that we could grant exception at least to complete dynptr
basics (bpf_dynptr_is_null, bpf_dynptr_get_size,
bpf_dynptr_{clone,trim,advance}) so that it is consistently provided
as a unified set of helpers. Similarly, for open coded loop iterator
(3 helpers), I believe it would be better for BPF ecosystem overall to
work on any BPF-enabled architecture and configuration (no matter JIT
or not, BTF of not, etc), just due to generality and unassuming nature
of this functionality.

But it is what it is, let's move on.

>
> imo this is the summary of the thread:
>
> bpf folks fall into two categories: kernel maintainers and bpf developers/users.
> - developers add new bpf features. They obviously want to use them and want bpf users
> to know that the feature they added is not going to disappear in the next kernel.
> They want stability.
> - maintainers want to make sure that the kernel development doesn't suffer because
> developers keep adding new apis. They want freedom to innovate and change apis.
> Maintainers also know that developers make mistakes and might leave the community.
> The kernel is huge and core infra changes all the time.
> bpf apis must never be a reason not to change something in the kernel.
>
> Freedom to change and stability just don't overlap.
> These two camps can never agree on what is more important.
> But we can make them co-exist.
>
> The bpf developers adding new kfunc should assume that it's stable and proceed
> to use it in bpf progs and production applications.

It's unclear what this means for library developers (libbpf, libxdp,
and others), but I guess we'll find out with time.

> The bpf maintainers will keep this stability promise. They obviously will not
> reap it out of the kernel on the whim, but they will nuke it if this kfunc
> will be in the way of the kernel innovation.
> The longer the kfunc is present the harder it will be for maintainers to justify
> removing it. The developers have to stick around and demonstrate that their
> kfunc is actually being used. The better developers do it the bigger the effort
> maintainers will put into keeping the kfunc perfectly intact.
>

[...]

>
> > The 'All new BPF kernel helper-like functionality must initially start out as
> > kfuncs.' is pretty clear where things would need to start out with, and we could
> > leave the option on the table if really needed to go BPF helper route when
> > promoting kfunc to stable at the same time. I had that in the text suggestion
> > earlier, it's more corner case and maybe we'll never need it but we also don't
> > drive ourselves into a corner where we close the door on it. Lets let the infra
> > around kfuncs evolve further first.
>
> Going kfunc->helper for stability was discussed already. It probably got lost
> in the noise. The summary was that it's not an option for the following reason:
> kfuncs and helpers are done through different mechanisms on prog and kernel side.
> The prog either sees = (void *)1 hack or normal call to extern func.
> The generated code is different.
> Say, we convert a kfunc to helper. Immediately the existing bpf prog that uses
> that kfunc will fail to load. That's the opposite of stability.
> We're going to require the developer to demonstrate the real world use of kfunc
> before promoting to stable, but with such 'promotion' we will break bpf progs.
> Say, we keep kfunc and introduce a new helper that does exactly the same.
> But it won't help bpf prog. The prog was using kfunc in production,
> why would it do some kind of CO-RE thing to compile 'call foo' differently
> depending whether 'foo' is kfunc or helper in the given kernel. And so on.

Correct. If we'd anticipated promotion of kfunc to helper (but it
doesn't seem like we do), we'd need to have kfunc with a different
name from its corresponding helper's name to avoid massive pain for
users. So if we were to add bpf_dynptr_is_null() as kfunc with an eye
for it to be stabilized as helper, I'd vote to add it as something
like bpf_kf_dynptr_is_null() or something, and then eventually add
bpf_dynptr_is_null(). That will at least less painful abstraction in
user's BPF code (and libbpf's helpers, if any).
Andrii Nakryiko Jan. 24, 2023, 7:54 p.m. UTC | #8
On Tue, Jan 24, 2023 at 9:17 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 8:32 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jan 18, 2023 at 11:48:59AM +0100, Daniel Borkmann wrote:
> > >
> > > My $0.02 is that I don't think we need to make a hard-cut ban as part of this.
> >
> > The hard-cut is easier to enforce otherwise every developer will be arguing that
> > their new feature is special and it requires a new discussion.
> > This thread has been going for too long. We need to finish it now and
> > don't come back to it again every now and then.
>
> I wish that we could grant exception at least to complete dynptr
> basics (bpf_dynptr_is_null, bpf_dynptr_get_size,
> bpf_dynptr_{clone,trim,advance}) so that it is consistently provided
> as a unified set of helpers. Similarly, for open coded loop iterator
> (3 helpers), I believe it would be better for BPF ecosystem overall to
> work on any BPF-enabled architecture and configuration (no matter JIT
> or not, BTF of not, etc), just due to generality and unassuming nature
> of this functionality.
>
> But it is what it is, let's move on.

Just to expand a bit on the above and make it clearer. I don't like a
hard-cut ban on helpers, but I'll disagree and commit and will move
open-coded iterators to kfuncs. And whoever is waiting on the helpers
vs kfuncs decision should stop waiting and use kfuncs.


[...]
Toke Høiland-Jørgensen Jan. 25, 2023, 1:18 a.m. UTC | #9
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> The bpf developers adding new kfunc should assume that it's stable and proceed
> to use it in bpf progs and production applications.

"Assume all kfuncs are stable" is fine by me, but that is emphatically
not what we have been saying thus far, quite the opposite...

> The bpf maintainers will keep this stability promise. They obviously will not
> reap it out of the kernel on the whim, but they will nuke it if this kfunc
> will be in the way of the kernel innovation.

...and it is contradicted by this last bit. I mean "it's stable, but
we'll remove it if it's in the way" is not, well, stable.

[...]

> bpf developers and users should assume that all kfuncs are stable and use them.
> When somebody comes to argue that a particular kfunc needs to change
> the developer who added that kfunc better to be around to argue that the kfunc is
> perfect the way it is. If developer is gone the maintainers will make a call.
> It's a self regulating system.
> kfuncs will be stable if developers/users are around.
> Yet the maintainers will have a freedom to change if absolutely necessary.

This assumes users (i.e., BPF program authors) are around during the
development phase, which they are generally not. Except for the users
who are also BPF devs, but that's a minority (if not now, hopefully in
the future). So I really think we need to document some expectations
here.

For instance, what happens if we change a kfunc, and a user shows up
during the -rc phase saying it broke their application? Are we going to
revert that change?

> Back to deprecation...
> I think KF_DEPRECATED is a good idea.
> When kfunc will be auto emitted into vmlinux.h (or whatever other file)
> or shipped in libbpf header we can emit
> __attribute__((deprecated("reason", "replacement")));
> to that header file (so it's seen during bpf prog build) and
> start dmesg warn on them in the verifier.
> Kernel splats do get noticed. The users would have to act quickly.

So how about documenting that bit? Something like:

"We promise that kfuncs will not be removed without going through a
deprecation phase. The length of the deprecation will be proportional to
how long that kfunc has existed in the kernel, but will be no shorter
than XX kernel releases." ?

> As far as KF_STABLE... I think it hurts the system in the long run.
> The developer can argue at one point in time that kfunc has to be KF_STABLE.
> The patch will be applied, but the developer is off the hook and can disappear.
> The maintainers would have to argue on behalf of the developer
> and keep maintaining it? The maintainers won't have a signal whether
> kfunc is still useful after initial KF_STABLE patch.

Doing the above wrt deprecation without having an explicit stable tag
would be OK with me.

> I think it's more important to decide how we document kfuncs and
> how equivalent of bpf_helper_defs.h can be done.

I agree we (also) need to do this. As well as have some support for
querying for them from userspace on a running kernel (for CO-RE
purposes).

-Toke
Alexei Starovoitov Jan. 26, 2023, 5:34 a.m. UTC | #10
On Tue, Jan 24, 2023 at 5:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > The bpf developers adding new kfunc should assume that it's stable and proceed
> > to use it in bpf progs and production applications.
>
> "Assume all kfuncs are stable" is fine by me, but that is emphatically
> not what we have been saying thus far, quite the opposite...
>
> > The bpf maintainers will keep this stability promise. They obviously will not
> > reap it out of the kernel on the whim, but they will nuke it if this kfunc
> > will be in the way of the kernel innovation.
>
> ...and it is contradicted by this last bit. I mean "it's stable, but
> we'll remove it if it's in the way" is not, well, stable.

Schrodinger's kfuncs :)

> [...]
>
> > bpf developers and users should assume that all kfuncs are stable and use them.
> > When somebody comes to argue that a particular kfunc needs to change
> > the developer who added that kfunc better to be around to argue that the kfunc is
> > perfect the way it is. If developer is gone the maintainers will make a call.
> > It's a self regulating system.
> > kfuncs will be stable if developers/users are around.
> > Yet the maintainers will have a freedom to change if absolutely necessary.
>
> This assumes users (i.e., BPF program authors) are around during the
> development phase, which they are generally not. Except for the users
> who are also BPF devs, but that's a minority (if not now, hopefully in
> the future). So I really think we need to document some expectations
> here.
>
> For instance, what happens if we change a kfunc, and a user shows up
> during the -rc phase saying it broke their application? Are we going to
> revert that change?

It depends.
Obviously we're not going to change/remove/deprecate kfuncs unless
there is a need to do so.
And when we do we will consider all users.

> > Back to deprecation...
> > I think KF_DEPRECATED is a good idea.
> > When kfunc will be auto emitted into vmlinux.h (or whatever other file)
> > or shipped in libbpf header we can emit
> > __attribute__((deprecated("reason", "replacement")));
> > to that header file (so it's seen during bpf prog build) and
> > start dmesg warn on them in the verifier.
> > Kernel splats do get noticed. The users would have to act quickly.
>
> So how about documenting that bit? Something like:
>
> "We promise that kfuncs will not be removed without going through a
> deprecation phase. The length of the deprecation will be proportional to
> how long that kfunc has existed in the kernel, but will be no shorter
> than XX kernel releases." ?

That's not something we can promise.
Take conntrack kfuncs. If netfilter folks decide to sunset
conntrack tomorrow we won't be standing in their way.

On the other side the dynptr kfuncs are going to stay as-is for
foreseeable future because they don't rely on other kernel
subsystems to do the job.
Both cases may still change if users themselves
(after using it in prod) come back with reasons to change it.

In the past the kernel devs were dictating the helper uapi to
users and users had no option, but to shut up and use what's available.
Now they will be able to use new apis and request changes.
At that point it will be a set of users X vs a set of users Y.
If ten users say that this kfunc sucks while one user
wants to keep it as-is we will introduce another kfunc and
will start deprecating the one that lost the vote.
The deprecation time window will depend on case by case
considering maintenance cost, etc.

> > As far as KF_STABLE... I think it hurts the system in the long run.
> > The developer can argue at one point in time that kfunc has to be KF_STABLE.
> > The patch will be applied, but the developer is off the hook and can disappear.
> > The maintainers would have to argue on behalf of the developer
> > and keep maintaining it? The maintainers won't have a signal whether
> > kfunc is still useful after initial KF_STABLE patch.
>
> Doing the above wrt deprecation without having an explicit stable tag
> would be OK with me.
>
> > I think it's more important to decide how we document kfuncs and
> > how equivalent of bpf_helper_defs.h can be done.
>
> I agree we (also) need to do this. As well as have some support for
> querying for them from userspace on a running kernel (for CO-RE
> purposes).

Of course. That has been brought up many times.
Just do it. Whoever has time to do it.
Toke Høiland-Jørgensen Jan. 27, 2023, 5:50 p.m. UTC | #11
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Jan 24, 2023 at 5:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > The bpf developers adding new kfunc should assume that it's stable and proceed
>> > to use it in bpf progs and production applications.
>>
>> "Assume all kfuncs are stable" is fine by me, but that is emphatically
>> not what we have been saying thus far, quite the opposite...
>>
>> > The bpf maintainers will keep this stability promise. They obviously will not
>> > reap it out of the kernel on the whim, but they will nuke it if this kfunc
>> > will be in the way of the kernel innovation.
>>
>> ...and it is contradicted by this last bit. I mean "it's stable, but
>> we'll remove it if it's in the way" is not, well, stable.
>
> Schrodinger's kfuncs :)

Heh, yeah. There's a reason it's called Schrodinger's *uncertainty*
principle, though. Documenting it is about removing some of that
uncertainty so users of can actually know what to expect.

Otherwise, there is a subset of potential users who will shy away from
using kfuncs because it's perceived as "totally unstable, may change at
any time". Which is a shame, because there are many such users who could
benefit from using BPF.

So in other words, even if we don't commit to a stability promise I
think it's worth documenting expectations as precisely as we can.

[...]

>> > Back to deprecation...
>> > I think KF_DEPRECATED is a good idea.
>> > When kfunc will be auto emitted into vmlinux.h (or whatever other file)
>> > or shipped in libbpf header we can emit
>> > __attribute__((deprecated("reason", "replacement")));
>> > to that header file (so it's seen during bpf prog build) and
>> > start dmesg warn on them in the verifier.
>> > Kernel splats do get noticed. The users would have to act quickly.
>>
>> So how about documenting that bit? Something like:
>>
>> "We promise that kfuncs will not be removed without going through a
>> deprecation phase. The length of the deprecation will be proportional to
>> how long that kfunc has existed in the kernel, but will be no shorter
>> than XX kernel releases." ?
>
> That's not something we can promise.
> Take conntrack kfuncs. If netfilter folks decide to sunset
> conntrack tomorrow we won't be standing in their way.

Well, we could do one of two things:

- Make a promise to commit to the deprecation procedure and tell
  subsystems not to add kfuncs unless they are OK with that (getting
  suitable ACKs for the existing users first, of course)

or

- Document that core-BPF kfuncs won't go away without a deprecation
  procedure, and have each subsystem using them document their own
  policies.

I believe the latter is more in line what you and others have set as an
expectation when discussing this previously?

> On the other side the dynptr kfuncs are going to stay as-is for
> foreseeable future because they don't rely on other kernel
> subsystems to do the job.
> Both cases may still change if users themselves
> (after using it in prod) come back with reasons to change it.
>
> In the past the kernel devs were dictating the helper uapi to
> users and users had no option, but to shut up and use what's available.
> Now they will be able to use new apis and request changes.
> At that point it will be a set of users X vs a set of users Y.
> If ten users say that this kfunc sucks while one user
> wants to keep it as-is we will introduce another kfunc and
> will start deprecating the one that lost the vote.

"Lost the vote"? This seems like a can of worms (who can vote? how are
we counting them? etc). I think what you really mean here is something
like "maintainers will take into consideration the opinions of users of
the API and make a call as to whether the benefits of changing a kfunc
outweighs the costs"?

> The deprecation time window will depend on case by case
> considering maintenance cost, etc.

Right, that's not too far from what I proposed above. I still think it
would be useful to commit to a minimum number of releases, though.
Again, to set expectations.

-Toke
diff mbox series

Patch

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 9fd7fb539f85..dd40a4ee35f2 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -7,9 +7,9 @@  BPF Kernel Functions (kfuncs)
 
 BPF Kernel Functions or more commonly known as kfuncs are functions in the Linux
 kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
-kfuncs do not have a stable interface and can change from one kernel release to
-another. Hence, BPF programs need to be updated in response to changes in the
-kernel.
+kfuncs by default do not have a stable interface and can change from one kernel
+release to another. Hence, BPF programs may need to be updated in response to
+changes in the kernel. See :ref:`BPF_kfunc_stability`.
 
 2. Defining a kfunc
 ===================
@@ -223,14 +223,89 @@  type. An example is shown below::
         }
         late_initcall(init_subsystem);
 
-3. Core kfuncs
+
+.. _BPF_kfunc_stability:
+
+3. API (in)stability of kfuncs
+==============================
+
+By default, kfuncs exported to BPF programs are considered a kernel-internal
+interface that can change between kernel versions. This means that BPF programs
+using kfuncs may need to adapt to changes between kernel versions. In the
+extreme case that could also include removal of a kfunc. In other words, kfuncs
+are _not_ part of the kernel UAPI! Rather, these kfuncs can be thought of as
+being similar to internal kernel API functions exported using the
+``EXPORT_SYMBOL_GPL`` macro. All new BPF kernel helper-like functionality must
+initially start out as kfuncs.
+
+3.1 Promotion to "stable" kfuncs
+--------------------------------
+
+While kfuncs are by default considered unstable as described above, some kfuncs
+may warrant a stronger stability guarantee and can be marked as *stable*. The
+decision to move a kfunc to *stable* is taken on a case-by-case basis and must
+clear a high bar, taking into account the functions' usefulness under
+longer-term production deployment without any unforeseen API issues or
+limitations. In general, it is not expected that every kfunc will turn into a
+stable one - think of it as an exception rather than the norm.
+
+Those kfuncs which have been promoted to stable are then marked using the
+``KF_STABLE`` tag. The process for requesting a kfunc be marked as stable
+consists of submitting a patch to the bpf@vger.kernel.org mailing list adding
+the ``KF_STABLE`` tag to that kfunc's definition. The patch description must
+include the rationale for why the kfunc should be promoted to stable, including
+references to existing production uses, etc. The patch will be considered the
+same was as any other patch, and ultimately the decision on whether a kfunc
+should be promoted to stable is taken by the BPF maintainers.
+
+Stable kfuncs provide the following stability guarantees:
+
+1. Stable kfuncs will not change their function signature or functionality in a
+   way that may cause incompatibilities for BPF programs calling the function.
+
+2. The BPF community will make every reasonable effort to keep stable kfuncs
+   around as long as they continue to be useful to real-world BPF applications.
+
+3. Should a stable kfunc turn out to be no longer useful, the BPF community may
+   decide to eventually remove it. In this case, before being removed that kfunc
+   will go through a deprecation procedure as outlined below.
+
+3.2 Deprecation of kfuncs
+-------------------------
+
+As described above, the community will make every reasonable effort to keep
+kfuncs available through future kernel versions once they are marked as stable.
+However, it may happen case that BPF development moves in an unforeseen
+direction so that even a stable kfunc ceases to be useful for program
+development.
+
+In this case, stable kfuncs can be marked as *deprecated* using the
+``KF_DEPRECATED`` tag. Such a deprecation request cannot be arbitrary and must
+explain why a given stable kfunc should be deprecated. Once a kfunc is marked as
+deprecated, the following procedure will be followed for removal:
+
+1. A kfunc marked as deprecated will be kept in the kernel for a conservatively
+   chosen period of time after it was first marked as deprecated (usually
+   corresponding to a span of multiple years).
+
+2. Deprecated functions will be documented in the kernel docs along with their
+   remaining lifespan and including a recommendation for new functionality that
+   can replace the usage of the deprecated function (or an explanation for why
+   no such replacement exists).
+
+3. After the deprecation period, the kfunc will be removed and the function name
+   will be marked as invalid inside the kernel (to ensure that no new kfunc is
+   accidentally introduced with the same name in the future). After this
+   happens, BPF programs calling the kfunc will be rejected by the verifier.
+
+4. Core kfuncs
 ==============
 
 The BPF subsystem provides a number of "core" kfuncs that are potentially
 applicable to a wide variety of different possible use cases and programs.
 Those kfuncs are documented here.
 
-3.1 struct task_struct * kfuncs
+4.1 struct task_struct * kfuncs
 -------------------------------
 
 There are a number of kfuncs that allow ``struct task_struct *`` objects to be
@@ -306,7 +381,7 @@  Here is an example of it being used:
 		return 0;
 	}
 
-3.2 struct cgroup * kfuncs
+4.2 struct cgroup * kfuncs
 --------------------------
 
 ``struct cgroup *`` objects also have acquire and release functions: